Senior Kubernetes Platform Engineer

Hybrid in Charlotte, NC, US • Posted 8 hours ago • Updated 8 hours ago

Contract Independent

Contract W2

No Travel Required

Hybrid

$50 - $55/hr

Job Details

Skills

Kubernetes
Machine Learning (ML)
Network Security
Terraform
Python
Generative Artificial Intelligence (AI)
Engineering Design
Continuous Integration
Computer Networking
Cloud Computing
Artificial Intelligence
Amazon Web Services
Amazon EFS
DevOps
Gap Analysis
Google Cloud Platform
Infrastructure Architecture
Lifecycle Management
Linux
Network
Storage
Workflow
Open Source
Microservices

Summary

Senior Kubernetes Platform Engineer

ML / GenAI Infrastructure | Terraform | Cloud-Native

In-Person Interview Is Non-Negotiable

Location: Charlotte, NC - On-Site/Hybrid

Employment Type: Contract-to-Hire

Experience: 7–12 Years (5+ years hands-on Kubernetes)

Industry: Enterprise AI / Cloud Infrastructure

⸻

About the Role

We are looking for a Senior Kubernetes Platform Engineer to design, build, and operate mission-critical Kubernetes infrastructure that powers large-scale Machine Learning (ML) and Generative AI (GenAI) workloads.

This is not a standard Kubernetes admin role — you will act as a subject matter expert, driving architecture decisions across scheduling, networking, security, storage, and multi-tenancy. You will work closely with ML engineers, researchers, and application teams to build scalable, GPU-optimized platforms that accelerate AI innovation.

⸻

Key Responsibilities

Kubernetes Platform Engineering

• Design, deploy, and manage multi-cluster Kubernetes environments (EKS, GKE, AKS)

• Build advanced Kubernetes components including CRDs, Operators, admission webhooks, and custom schedulers

• Optimize Kubernetes for GPU workloads (NVIDIA device plugins, MIG, time-slicing)

• Implement autoscaling solutions (HPA, VPA, KEDA, Cluster Autoscaler)

• Enforce security using RBAC, OPA/Gatekeeper, and Pod Security Standards

• Manage service mesh (Istio / Linkerd) for secure and observable microservices

• Configure networking (Cilium, Calico), ingress controllers, and network policies

• Lead cluster lifecycle management (upgrades, backups, disaster recovery)

• Package platform components using Helm and Kustomize

⸻

ML / GenAI Infrastructure

• Design ML pipelines using Kubeflow, Argo Workflows, or Ray

• Build scalable model serving platforms (KServe, Triton, TorchServe, vLLM)

• Optimize distributed compute using Ray on Kubernetes

• Design storage solutions for ML datasets and artifacts (EFS, GCS, NFS, etc.)

• Enable GPU-backed environments (JupyterHub, Kubeflow Notebooks)

• Deploy and manage vector databases for RAG applications

• Optimize LLM inference (batching, caching, multi-GPU scaling)

⸻

Infrastructure as Code (Terraform)

• Develop and maintain reusable Terraform modules for cloud infrastructure

• Implement remote state management and multi-environment workflows

• Enforce best practices: versioning, drift detection, policy-as-code

• Integrate Terraform into CI/CD pipelines and GitOps workflows

• Use tools like Atlantis or Terraform Cloud for automated deployments

⸻

Observability, Security & Reliability

• Build an observability stack (Prometheus, Grafana, Loki, Jaeger/Tempo)

• Implement audit logging and runtime security (Falco, SIEM integration)

• Define SLOs/SLIs and maintain platform reliability

• Perform GPU capacity planning and cost optimization

• Lead incident response and post-mortem analysis

⸻

Required Skills & Technologies

• Kubernetes (Expert level)

• Terraform (Advanced)

• Helm / Kustomize

• AWS / Google Cloud Platform / Azure (EKS, GKE, AKS)

• Istio / Linkerd

• Argo Workflows / Kubeflow / Ray

• KServe / Triton

• Prometheus / Grafana

• Cilium / Calico

• OPA / Gatekeeper

• NVIDIA GPU Operator

• Docker / containerd

• GitOps tools (ArgoCD / Flux)

• Python / Go / Bash

• Linux systems and networking

⸻

Required Qualifications

• 7+ years in cloud/platform engineering

• 5+ years hands-on Kubernetes in production

• Deep understanding of Kubernetes internals (control plane, CNI, CSI, etc.)

• Experience running GPU-based ML/AI workloads at scale

• Strong Terraform expertise (modules, CI/CD, multi-cloud)

• Experience with ML orchestration tools (Kubeflow, Argo, or Ray)

• Proficiency in at least one programming language (Python, Go, or Bash)

• Experience with GitOps and secure container practices

• CKA (Certified Kubernetes Administrator) certification

⸻

Preferred Qualifications

• CKS (Certified Kubernetes Security Specialist) certification

• CKAD certification

• Cloud DevOps certifications (AWS / Google Cloud Platform)

• Terraform certification

• Experience with Crossplane or multi-cluster management

• Familiarity with eBPF tools (Hubble, Pixie)

• Contributions to CNCF or open-source Kubernetes ecosystem

⸻

What You’ll Deliver (First 90 Days)

• Day 30: Audit existing Kubernetes clusters and deliver a gap analysis

• Day 60: Implement Terraform-managed clusters with security and observability

• Day 90: Deploy a production-ready model serving platform with SLO dashboards

⸻

Who You Are

• A systems thinker with a strong platform mindset

• Proactive and automation-driven

• Comfortable working cross-functionally with ML and engineering teams

• Influential communicator who can drive architecture decisions

• Security-focused and reliability-driven

⸻

Why Join Us

This role is ideal for engineers passionate about Kubernetes and AI infrastructure who want to build the backbone of next-generation enterprise AI platforms.

Dice Id: PTP3cveBqlyXCso
Position Id: 8927720
