We are seeking a Senior Rancher Platform Engineer with deep, hands-on experience designing, building, and operating Rancher-managed Kubernetes platforms (RKE / RKE2) for a large enterprise initiative.
This role requires a detail‑oriented, hands-on practitioner who will serve as a Rancher Subject Matter Expert (SME), providing technical leadership, architectural guidance, and operational
best practices across multi‑cluster, hybrid, and on‑prem environments.
Key Responsibilities
Design, deploy, and operate Rancher-managed Kubernetes clusters (RKE / RKE2) across on‑prem and hybrid environments.
Architect and maintain highly available, scalable, and secure Kubernetes platforms using Rancher best practices.
Install, configure, upgrade, patch, and decommission Kubernetes clusters using Rancher UI, APIs, and automation workflows.
Implement multi-cluster governance using Rancher Projects, Namespaces, RBAC, and global policies.
Integrate Rancher with enterprise identity providers (Active Directory / LDAP / Azure AD / SSO).
Manage and troubleshoot control plane, etcd, node, networking (CNI), and storage (CSI) issues in production clusters.
Perform root cause analysis (RCA) for cluster outages and platform incidents, and implement preventive improvements.
Implement and maintain observability stacks including Prometheus, Grafana, and centralized logging (EFK/ELK).
Support application onboarding, Helm-based deployments, and standardized platform patterns for development teams.
Collaborate closely with security, infrastructure, and application teams to enforce platform security and compliance.
Act as a Rancher SME, providing hands-on guidance, troubleshooting support, and architectural recommendations.
Create and maintain architecture diagrams, operational runbooks, standards, and platform documentation.
Support OpenShift clusters as needed, primarily from an integration and interoperability perspective.
Required Skills & Experience
- Strong hands-on expertise with Rancher managing Kubernetes clusters using RKE and RKE2 (mandatory).
- Deep understanding of Kubernetes architecture, internals, and day‑2 operations.
- Proven experience managing:
  - HA control planes and etcd
  - Node lifecycle (provisioning, scaling, replacement, decommissioning)
  - Kubernetes upgrades and patching with minimal downtime
- Experience with Kubernetes networking (Calico, Cilium, ingress controllers, load balancing).
- Experience with persistent storage and CSI drivers (NFS, cloud disks, Ceph, Longhorn, etc.).
- Hands-on experience with CI/CD and GitOps workflows (Helm, Jenkins, GitHub Actions, Argo CD).
- Experience supporting mixed Linux and Windows Kubernetes worker nodes.
- Strong troubleshooting skills with the ability to work through complex, production-impacting issues.
- Excellent communication skills, with the ability to explain technical concepts to both technical and non-technical stakeholders.
- Ability to work independently while also collaborating effectively within cross-functional teams.
- Experience with Red Hat OpenShift, particularly in regulated environments (banking, healthcare, pharma).