Position Title: Infrastructure Specialist Cloud Engineer (HPC SME)
Location: Remote
Duration: 6+ Month Contract
Job Description:
Seeking an Infrastructure Specialist Cloud Engineer (SCE) with deep High Performance Computing (HPC) expertise to design, optimize, and operate large-scale, performance-critical workloads on Cloud. This role focuses on GPU/TPU-accelerated infrastructure, high-throughput storage, low-latency networking, and secure AI/HPC platforms.
Key Responsibilities:
Design and operate HPC and AI infrastructure on Cloud, optimized for performance and scale
Build and manage GPU/TPU-enabled GKE clusters for AI training and HPC workloads
Optimize storage and data pipelines for high-throughput, low-latency access
Support GKE-based migrations of HPC workloads from on-prem or other clouds
Implement CI/CD and Infrastructure as Code for HPC platforms
Ensure secure compute patterns for sensitive and regulated workloads
Collaborate with engineering teams on performance tuning and architecture reviews
Required Skills:
Strong experience in High Performance Computing (HPC) environments
Hands-on expertise with Cloud infrastructure (Compute, Networking, Storage)
Advanced experience with GKE for large-scale, performance-sensitive workloads
Experience with GPU/TPU-accelerated platforms for AI or scientific computing
Nice to Have:
Experience with high-throughput storage and parallel file systems
Networking optimization for low-latency, high-bandwidth workloads
Exposure to Vertex AI or AI training infrastructure
HPC experience across on-prem, hybrid, or multi-cloud environments
Experience:
7+ years in infrastructure, platform engineering, or HPC environments
Prior experience supporting Cloud or client engagements preferred