Staff Kubernetes Engineer
Location: Remote
We are looking for someone who is experienced in building lasting relationships and is passionate about making meaningful contributions to our team. Make no mistake: this is a challenging career path, but also a highly rewarding one. Are you up for the challenge? If so, stop reading and start applying.
We are looking for an experienced Staff Engineering Contractor to strengthen operational excellence within the Kubernetes-based Compute Platform. In this role, you will collaborate closely with internal engineering teams to address operational challenges and improve the reliability, efficiency, and usability of the platform. Strong Kubernetes operational expertise is required to proactively identify issues, recommend improvements, and implement best practices.
What You'll Do
Operational Partnership and Issue Resolution
Act as the primary operational expert, working directly with engineering teams to diagnose and resolve complex issues within the Kubernetes environment.
Continuous Improvement
Analyze recurring operational issues to identify root causes and patterns, then propose and implement strategic improvements to processes, tooling, and systems to increase reliability and efficiency.
Workload Resource Optimization
Partner with EKS technical leads to run experiments in development environments to accurately determine workload resource requirements.
Configuration Automation
Develop and implement automation for EKS workload configurations to ensure consistent operation within defined and acceptable parameters.
Alerting and Notification Standardization
Define and standardize alerts and notifications for all EKS workloads, creating a structured taxonomy based on compute patterns (for example, web services, event processing, and API servers).
Platform Governance and Standards
Participate in defining and documenting EKS standards and develop policies to enforce a consistent and opinionated operational model for the compute platform.
What We Look For
Deep Kubernetes Expertise
Extensive experience operating and managing large-scale Kubernetes clusters.
Operational Excellence
Proven ability to troubleshoot and resolve complex operational issues across Kubernetes and underlying infrastructure.
Cloud Infrastructure and Observability
Strong knowledge of cloud operations, the Kubernetes ecosystem, observability tools (such as Prometheus, Grafana, CloudWatch, and ELK), Infrastructure as Code (Terraform), and CI/CD tooling (such as Argo CD and Buildkite).
Experience with AWS or another major cloud provider is essential.
Programming Skills
Proficiency in Python, Go, or Bash.
Communication and Collaboration
Strong ability to work effectively with diverse engineering teams and communicate technical concepts clearly.
Autonomy and Documentation
Ability to independently drive operational improvements and clearly document processes for broad team adoption.