Job Description
Must Have: Rancher-managed Kubernetes clusters (RKE / RKE2); Rancher UI, APIs, and automation workflows; networking; observability stacks including Prometheus, Grafana, and centralized logging (EFK/ELK); CI/CD and GitOps workflows (Helm, Jenkins, GitHub Actions, Argo CD).
We are seeking a Senior Rancher Platform Engineer with deep, hands-on experience designing, building, and operating Rancher-managed Kubernetes platforms (RKE / RKE2) for a large enterprise initiative.
This role requires a detail-oriented, hands-on practitioner who will serve as a Rancher Subject Matter Expert (SME), providing technical leadership, architectural guidance, and operational best practices across multi-cluster, hybrid, and on-prem environments.
Key Responsibilities
- Design, deploy, and operate Rancher-managed Kubernetes clusters (RKE / RKE2) across on-prem and hybrid environments.
- Architect and maintain highly available, scalable, and secure Kubernetes platforms using Rancher best practices.
- Install, configure, upgrade, patch, and decommission Kubernetes clusters using Rancher UI, APIs, and automation workflows.
- Implement multi-cluster governance using Rancher Projects, Namespaces, RBAC, and global policies.
- Integrate Rancher with enterprise identity providers (Active Directory / LDAP / Azure AD / SSO).
- Manage and troubleshoot control plane, etcd, node, networking (CNI), and storage (CSI) issues in production clusters.
- Perform root cause analysis (RCA) for cluster outages and platform incidents, and implement preventive improvements.
- Implement and maintain observability stacks including Prometheus, Grafana, and centralized logging (EFK/ELK).
- Support application onboarding, Helm-based deployments, and standardized platform patterns for development teams.
- Collaborate closely with security, infrastructure, and application teams to enforce platform security and compliance.
- Act as a Rancher SME, providing hands-on guidance, troubleshooting support, and architectural recommendations.
- Create and maintain architecture diagrams, operational runbooks, standards, and platform documentation.
- Support OpenShift clusters as needed, primarily from an integration and interoperability perspective.
Required Skills & Experience
- Strong hands-on expertise using Rancher to manage Kubernetes clusters built with RKE and RKE2 (mandatory).
- Deep understanding of Kubernetes architecture, internals, and day-2 operations.
- Proven experience managing:
  - HA control planes and etcd
  - Node lifecycle (provisioning, scaling, replacement, decommissioning)
  - Kubernetes upgrades and patching with minimal downtime
- Experience with Kubernetes networking (Calico, Cilium, ingress controllers, load balancing).
- Experience with persistent storage and CSI drivers (NFS, cloud disks, Ceph, Longhorn, etc.).
- Hands-on experience with CI/CD and GitOps workflows (Helm, Jenkins, GitHub Actions, Argo CD).
- Experience supporting mixed Linux and Windows Kubernetes worker nodes.
- Strong troubleshooting skills with the ability to work through complex, production-impacting issues.
- Excellent communication skills, with the ability to explain technical concepts to both technical and non-technical stakeholders.
- Ability to work independently while also collaborating effectively within cross-functional teams.
- Experience with Red Hat OpenShift, particularly in environments such as banking, healthcare, and pharma.