Job Details
Role: Senior Kubernetes Engineer
Location: Remote (United States)
Job Description:
Job Summary:
We're hiring a Senior Kubernetes Engineer to design, secure, and operate enterprise-grade, multi-cloud Kubernetes across major providers. You'll enable a compliant, multi-tenant platform with "secure-by-default" controls, integrate cluster operations into Git-based delivery, and support a growing internal developer portal/software catalog and a unified repo for platform assets. The ideal candidate combines deep Kubernetes expertise with a strong focus on the security, reliability, and audit readiness that healthcare environments require.
Key Responsibilities:
Design & Operations
- Architect, deploy, upgrade, and scale managed clusters across major clouds with HA control planes and cluster/workload autoscaling.
- Operate private clusters: restrict control-plane exposure, enforce private API access, use NAT-only egress, and implement approved private connectivity patterns.
- Engineer multiple node groups for workload separation and efficiency: heterogeneous instance families, GPU/CPU pools, spot vs. on-demand capacity, taints/tolerations, topology spread, and security-focused pools.
- Define golden patterns for services/ingress, storage classes, private networking/egress, and cloud load-balancing options.
Configuration & Delivery
- Maintain declarative manifests and templates, and structure environment overlays and reusable modules.
- Enable progressive delivery via Git-based workflows with automated policy checks and promotion gates.
Security & Compliance
- Enforce least-privilege RBAC, namespace isolation, Pod Security standards, and admission policies for image provenance, non-root, and blocked capabilities.
- Implement service-to-service encryption with workload identity, certificate issuance/rotation, and policy-based authorization.
- Apply deny-by-default network policies, strong secrets hygiene with KMS-backed encryption and rotation, and signed/scan-gated images with SBOM attestations.
- Ensure audit-ready logging across control and data planes, and route logs to central logging with detections for risky actions and configuration drift.
Observability & Resilience
- Integrate metrics, logs, traces, and events; define SLOs/error budgets; and scale on reliable signals (including custom/external metrics).
- Build self-healing runbooks, conduct chaos/resiliency drills, and implement backup/restore for cluster state and application data.
Governance & Cloud Hygiene
- Apply org guardrails, allowed-region restrictions, tagging/labeling standards, and automated conformance checks with remediation.
- Document RTO/RPO tiers, test restores and failovers, and maintain audit evidence and change traceability.
Required Skills & Qualifications:
- Kubernetes Expertise: Operating managed clusters on major clouds, scheduler and node lifecycle, cluster and workload autoscaling.
- Private Cluster Operations: Private API endpoints, restricted API access, NAT egress, bastion workflows, and private connectivity (peering/VPN/dedicated circuits).
- Multiple Node Groups: Designing heterogeneous pools, taints/tolerations, topology spread, and right-sizing for cost and performance.
- mTLS & Service Identity: Implementing workload identity, certificate issuance/rotation, policy-based service authorization, and end-to-end encryption in transit.
- Manifests & Packaging: YAML proficiency, templating/overlays, Git-based release strategies, and GitOps practices.
- Security Depth: RBAC design, Pod Security standards, admission policy engines, network policies, secrets management, image signing and vulnerability scan gates.
- Networking: CNI fundamentals, L4/L7 traffic, ingress/egress, private endpoints, and cross-cloud load-balancing options.
- Multi-Tenancy: Namespace boundaries, quotas/limits, noisy-neighbor mitigation, and sensitive-workload isolation.
- Infrastructure as Code: Managing clusters and cloud resources as code with IaC tools, policy guardrails, and drift detection.
- Observability & Troubleshooting: Metrics/logs/traces, HPA/VPA using trustworthy signals, deep debugging of runtime, DNS/CNI, scheduling, and control-plane issues.
- Compliance Mindset (Healthcare): Understanding HIPAA/HITRUST concepts, encryption at rest/in transit, least-privilege, audit evidence, and governed deployment pipelines.
- Nice to Have: Internal developer portals/service catalogs, progressive delivery, cost-aware right-sizing and capacity forecasting, DR design, and scripting in Bash/Python.