Overview
Skills
Job Details
Role : Senior Azure SRE With System Admin & HPC Experience
Location : Seattle, WA Onsite
Duration : Full Time
pranayatburgeonitsdotcom
Mandate Skills : System Admin, HPC (Need at least 3+ years exp) & Scripting (PowerShell, Azure Bash, Go or Python)
Job Description
We are seeking a highly skilled Service Engineer having 7 to 10 Years of experience with strong infrastructure expertise and a deep understanding of High-Performance Computing (HPC) environments. The ideal candidate will have proven experience in managing complex systems, driving third-party application integration, and ensuring seamless operations across distributed computing platforms. This role requires exceptional problem-solving ability, strong communication skills, and a proactive approach to onboarding and supporting HPC workloads.
Required Skills & Qualifications
- Development Experience: Minimum 3 years in software development (PowerShell, Azure Bash, Go or Python).
- Administration Experience: 7+ years in system administration with a focus on infrastructure in PLM and HPC environments.
- Very Strong Communication (Written and Oral), Collaboration, driving skills.
- Proactively managing the backlog and bringing efficiencies through solution design and development.
Preferred Attributes
- Passion for HPC technologies and infrastructure optimization.
- Ability to learn and adapt to new tools and frameworks quickly.
- Strong analytical and problem-solving mindset.
- Collaborative approach with cross-functional teams.
Key Responsibilities
- Drive integration and operational support for third-party applications on HPC and PLM infrastructure.
- Manage and maintain Windows Server and Linux OS environments with VM and VMSS.
- Configure and optimize HPC schedulers such as Windows HPC Pack, OpenPBS, Slurm, or similar.
- Support distributed computing workloads, including MPI-based applications.
- Troubleshoot and optimize networking components, focusing on TCP/IP, name resolution, and RDMA.
- Implement and manage Azure Cycle Cloud and Azure Batch for HPC workload orchestration.
- Collaborate with application owners to understand infrastructure dependencies and optimize performance.
- Ensure compliance with security and operational best practices across all systems