Job Details
Title: Site Reliability Engineer
Location: Charlotte, NC; Plano, TX; Pennington, NJ
Hybrid: 3 Days Onsite (Mandatory)
Interview: Mandatory Onsite Interview
Shift: 9AM-5PM EST. Must be flexible to work Tue-Sat or Sun-Thu (rotation every 2-3 months)
Duration: 12+ Months (Likely Extension)
Top Required Skills (Must Have)
- Red Hat OpenShift / Kubernetes (5+ years in Enterprise Environment)
- Azure / AKS (Strong Cloud Experience)
- Terraform
- Python
Job Description
The client is seeking a highly skilled Site Reliability Engineer (SRE) to join a mission-critical infrastructure engineering team supporting Kubernetes, OpenShift, cloud platforms, and automation frameworks.
You will be responsible for ensuring uptime, reliability, performance, and scalability of the container and cloud platforms. The role requires deep technical expertise across Kubernetes, cloud-native tools, Linux engineering, Terraform, Python automation, and enterprise-scale system reliability.
Responsibilities
- Maintain and support large-scale Container Platforms (OpenShift, Kubernetes, RKE, AKS, EKS, GKE) across on-prem and cloud.
- Monitor and troubleshoot performance, security, networking, and deployment issues.
- Perform incident and problem management, conducting blameless RCAs.
- Analyze and remediate vulnerabilities in container environments.
- Collaborate with engineering, cloud, SRE, and operations teams.
- Conduct deep-dive investigations into systemic reliability issues.
- Implement automated solutions using Python, Ansible, Golang, and shell scripting.
- Support CI/CD pipelines including Git, Jenkins, and GitOps frameworks.
- Manage IAM components including Active Directory, Azure AD, SSO / Ping Identity.
- Provide Linux/Windows administration support across hybrid environments.
- Use monitoring tools such as Prometheus, Dynatrace, Splunk, and Azure Monitor.
Required Skills
- 5+ years of hands-on experience with Kubernetes, OpenShift, RKE, AKS, and EKS.
- Strong knowledge of Python, Ansible, Golang, and shell scripting.
- Advanced expertise in Linux, DNS, DHCP, Kerberos, and Windows Authentication.
- Strong cloud experience: Azure, AWS, or Google Cloud Platform.
- Experience with Terraform and CI/CD (Git, Jenkins, GitOps).
- Experience with container security and vulnerability remediation, plus FinOps awareness.
- Strong troubleshooting, problem-solving, and reliability engineering skills.
- Excellent communication skills and the ability to work independently.