Role: Kubernetes Platform Administrator (L3)
Location: Remote, USA
JD:
Roles & Responsibilities: Kubernetes Platform Administrator (On-Premises)

The Kubernetes Platform Administrator (On-Premises) is responsible for the full lifecycle and operation of our core on-premises, vanilla Kubernetes platform. The Administrator is the technical owner of the entire infrastructure stack, managing everything from the Dell physical servers up through the application delivery and security layers. This position requires deep, hands-on expertise in complex systems integration and security hardening, and strong automation skills are mandatory.

Core Responsibilities
The Administrator will be responsible for deploying, maintaining, and operating the following critical components:
* Platform Operations: Managing the deployment, scaling, and maintenance of vanilla Kubernetes clusters. This includes overseeing the full Kubernetes upgrade path, managing the container runtime (containerd), and implementing disaster recovery with Velero.
* Automation: Developing and maintaining all infrastructure-as-code. Expert-level proficiency in Ansible, Shell scripting, and Python is mandatory for configuration management, automated deployments, and managing in-house applications.
* Security & Identity: Implementing and enforcing platform security. This involves managing cluster authentication with Dex, handling secrets with HashiCorp Vault, integrating with our KMS (VIPER), and enforcing governance through policy engines such as Gatekeeper/OPA and Pod Security Policies (PSPs).
* Networking & Load Balancing: Configuring and troubleshooting the cluster networking stack (Calico) and managing bare-metal load balancing (MetalLB).
* Storage Management: Integrating and maintaining enterprise-grade storage arrays through CSI (Container Storage Interface) drivers, specifically Dell Isilon and Infinidat storage systems.
* Observability: Maintaining the monitoring and logging systems, including the Prometheus/Grafana stack for metrics and alerting and the ELK (Elasticsearch, Logstash, Kibana) stack for centralized logging.
* Infrastructure Management: Hands-on management of the underlying physical and virtual infrastructure, including Dell physical servers, the Ubuntu OS, TPM (Trusted Platform Module) integration, and KVM virtualization.
* Advanced Capabilities: Managing specialized hardware, including provisioning and lifecycle management of GPU nodes for high-performance computing workloads.
* DevOps Workflow: Maintaining and optimizing our GitLab CI/CD pipelines and managing source control (SCM).
* Open Source Lifecycle: Owning the complete lifecycle management (patching, configuration, upgrades) of all integrated open-source components.

Essential Technical Skillset
The incumbent must demonstrate proven, hands-on expertise in the following:
* Mandatory Automation: Full mastery of Ansible, Shell scripting, and Python for infrastructure automation and management.
* Container Fluency: Deep knowledge of Docker commands for container inspection, debugging, and image troubleshooting.
* Go Language: A basic understanding of Go is necessary for reading, debugging, and reviewing open-source Kubernetes components and utility scripts written in Go.
* Security Stack: Proven operational experience with Vault, Dex, and Gatekeeper/OPA.
* Bare-Metal Focus: Demonstrated experience configuring and managing Kubernetes components such as Calico and MetalLB in a bare-metal environment.