Responsibilities:
• Support day-to-day operations of an enterprise Kubernetes platform (100+ clusters, ~50% production)
• Perform routine operational tasks including cluster maintenance, upgrades, patching, health checks, and capacity management
• Troubleshoot and resolve Kubernetes platform issues impacting cluster or application availability
• Participate in incident response, root-cause analysis, and post-incident reviews
• Act as a backup platform engineer to enable on-call rotation and reduce key-person dependency
• Provide after-hours support as part of a shared on-call rotation
• Serve as a secondary escalation point for critical production issues
• Assist internal application teams with Kubernetes-related questions and issues
• Support common Kubernetes constructs such as Pods, Deployments, Services, Ingress, ConfigMaps, and Secrets
• Help teams troubleshoot networking, DNS, ingress, certificate, and resource-related issues
• Review application configurations for Kubernetes best practices and platform alignment
• Work with integrated enterprise tools such as:
  • Ingress controllers (e.g., Contour / Envoy)
  • Logging platforms (e.g., Fluent Bit, centralized log aggregation)
  • Monitoring/observability tools (e.g., Dynatrace or similar)
  • Container registries (e.g., Harbor, JFrog)
• Help document operational procedures, runbooks, and troubleshooting guides
• Share Kubernetes knowledge and best practices with internal teams
• Assist in improving platform resiliency, operational maturity, and supportability