Senior Kubernetes Platform Engineer

Hybrid in Charlotte, NC, US • Posted 8 hours ago • Updated 8 hours ago
Contract Independent
Contract W2
No Travel Required
Hybrid
$50 - $55/hr

Job Details

Skills

  • Kubernetes
  • Machine Learning (ML)
  • Network Security
  • Terraform
  • Python
  • Generative Artificial Intelligence (AI)
  • Engineering Design
  • Continuous Integration
  • Computer Networking
  • Cloud Computing
  • Artificial Intelligence
  • Amazon Web Services
  • Amazon EFS
  • DevOps
  • Gap Analysis
  • Google Cloud Platform
  • Infrastructure Architecture
  • Lifecycle Management
  • Linux
  • Network
  • Storage
  • Workflow
  • Open Source
  • Microservices

Summary

Senior Kubernetes Platform Engineer
 
ML / GenAI Infrastructure | Terraform | Cloud-Native
 
In-person interview is non-negotiable.
 
Location: Charlotte, NC (On-site/Hybrid)
Employment Type: Contract-to-Hire
Experience: 7–12 years (5+ years hands-on Kubernetes)
Industry: Enterprise AI / Cloud Infrastructure
 
 
About the Role
 
We are looking for a Senior Kubernetes Platform Engineer to design, build, and operate mission-critical Kubernetes infrastructure that powers large-scale Machine Learning (ML) and Generative AI (GenAI) workloads.
 
This is not a standard Kubernetes admin role. You will act as a subject-matter expert, driving architecture decisions across scheduling, networking, security, storage, and multi-tenancy. You will work closely with ML engineers, researchers, and application teams to build scalable, GPU-optimized platforms that accelerate AI innovation.
 
 
Key Responsibilities
 
Kubernetes Platform Engineering
• Design, deploy, and manage multi-cluster Kubernetes environments (EKS, GKE, AKS)
• Build advanced Kubernetes components including CRDs, Operators, admission webhooks, and custom schedulers
• Optimize Kubernetes for GPU workloads (NVIDIA device plugins, MIG, time-slicing)
• Implement autoscaling solutions (HPA, VPA, KEDA, Cluster Autoscaler)
• Enforce security using RBAC, OPA/Gatekeeper, and Pod Security Standards
• Manage a service mesh (Istio / Linkerd) for secure, observable microservices
• Configure networking (Cilium, Calico), ingress controllers, and network policies
• Lead cluster lifecycle management (upgrades, backups, disaster recovery)
• Package platform components using Helm and Kustomize
 
 
ML / GenAI Infrastructure
• Design ML pipelines using Kubeflow, Argo Workflows, or Ray
• Build scalable model serving platforms (KServe, Triton, TorchServe, vLLM)
• Optimize distributed compute using Ray on Kubernetes
• Design storage solutions for ML datasets and artifacts (EFS, GCS, NFS, etc.)
• Enable GPU-backed environments (JupyterHub, Kubeflow Notebooks)
• Deploy and manage vector databases for RAG applications
• Optimize LLM inference (batching, caching, multi-GPU scaling)
 
 
Infrastructure as Code (Terraform)
• Develop and maintain reusable Terraform modules for cloud infrastructure
• Implement remote state management and multi-environment workflows
• Enforce best practices: versioning, drift detection, policy-as-code
• Integrate Terraform into CI/CD pipelines and GitOps workflows
• Use tools like Atlantis or Terraform Cloud for automated deployments
 
 
Observability, Security & Reliability
• Build the observability stack (Prometheus, Grafana, Loki, Jaeger/Tempo)
• Implement audit logging and runtime security (Falco, SIEM integration)
• Define SLOs/SLIs and maintain platform reliability
• Perform GPU capacity planning and cost optimization
• Lead incident response and post-mortem analysis
 
 
Required Skills & Technologies
• Kubernetes (Expert level)
• Terraform (Advanced)
• Helm / Kustomize
• AWS / Google Cloud Platform / Azure (EKS, GKE, AKS)
• Istio / Linkerd
• Argo Workflows / Kubeflow / Ray
• KServe / Triton
• Prometheus / Grafana
• Cilium / Calico
• OPA / Gatekeeper
• NVIDIA GPU Operator
• Docker / containerd
• GitOps tools (Argo CD / Flux)
• Python / Go / Bash
• Linux systems and networking
 
 
Required Qualifications
• 7+ years in cloud/platform engineering
• 5+ years hands-on Kubernetes in production
• Deep understanding of Kubernetes internals (control plane, CNI, CSI, etc.)
• Experience running GPU-based ML/AI workloads at scale
• Strong Terraform expertise (modules, CI/CD, multi-cloud)
• Experience with ML orchestration tools (Kubeflow, Argo, or Ray)
• Proficiency in at least one programming language (Python, Go, or Bash)
• Experience with GitOps and secure container practices
 
 
Certifications & Preferred Qualifications
• CKA (Certified Kubernetes Administrator): required
• CKS (Certified Kubernetes Security Specialist): preferred
• CKAD (Certified Kubernetes Application Developer) certification
• Cloud DevOps certifications (AWS / Google Cloud Platform)
• Terraform certification
• Experience with Crossplane or multi-cluster management
• Familiarity with eBPF tools (Hubble, Pixie)
• Contributions to CNCF projects or the open-source Kubernetes ecosystem
 
 
What You’ll Deliver (First 90 Days)
• Day 30: Audit existing Kubernetes clusters and deliver a gap analysis
• Day 60: Implement Terraform-managed clusters with security and observability
• Day 90: Deploy a production-ready model-serving platform with SLO dashboards
 
 
Who You Are
• A systems thinker with a strong platform mindset
• Proactive and automation-driven
• Comfortable working cross-functionally with ML and engineering teams
• Influential communicator who can drive architecture decisions
• Security-focused and reliability-driven
 
 
Why Join Us
 
This role is ideal for engineers passionate about Kubernetes and AI infrastructure who want to build the backbone of next-generation enterprise AI platforms.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: PTP3cveBqlyXCso
  • Position Id: 8927720

Similar Jobs

  • Remote or Hybrid in Charlotte, North Carolina • Today • Contract • 65
  • Fayetteville, North Carolina • Today • Full-time • USD 86,800.00 - 198,000.00 per year
  • Charlotte, North Carolina • Yesterday • Contract, Third Party • $60 - $70
  • Charlotte, North Carolina • Today • Contract • $69.5 - $76.16