Overview
Hybrid
Depends on Experience
Contract - W2
Skills
SRE
Site Reliability Engineer
kubernetes
python
shell
openshift
linux
azure
Job Details
Position: Site Reliability Engineer
Location: Plano, TX / Chandler, AZ / Chicago, IL - Hybrid (3 days onsite)
Duration: Long Term
Job Summary:
Primary skills:
OpenShift, Rancher Kubernetes Engine (RKE), Python and shell scripting, Linux, and Azure Cloud.
Responsibilities
Responsible for the reliability and support of the container platform on-prem and in external clouds (Azure/AWS/Google).
Monitor and troubleshoot performance, connectivity, security, and related issues across the container platform (OpenShift), Rancher (RKE), and Azure (AKS) environments.
Perform deep dives into systemic and latent reliability issues, along with incident management and problem management.
Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
Perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes.
Own application onboarding and provide troubleshooting support throughout the lifecycle of applications on the container platform.
Identify and drive opportunities to improve automation, reduce toil, and improve operational excellence.
Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities.
Ensure resiliency during implementation and identify and fix resiliency problems by collaborating with engineering teams.
Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams.
Participate in 24x7 on-call coverage under a follow-the-sun model.
Requirements
BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
Minimum 5 years of hands-on experience supporting Kubernetes/OpenShift/RKE/EKS container platforms.
Experience with Python, Ansible, Golang, and shell scripting.
Kubernetes/OpenShift/Terraform certifications are a plus.
Strong experience in major services related to compute, storage, network, and security.
Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud-native tools like Azure Monitor and Log Analytics.
Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions.
Advanced knowledge of Linux OS, DNS, DHCP, Kerberos, and Windows Authentication.
Experience with CI/CD tools (Git, Jenkins) and the GitOps model.
Excellent understanding of Linux/Windows operating systems administration.
Experience in container security and vulnerability remediation.
Systematic problem-solving approach, sense of ownership, and drive.
Ability to juggle competing priorities and adapt to changes in project scope.
Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.
Experience in OpenShift, RKE, and CSP Kubernetes services such as AKS and EKS.
Experience in Terraform, ArgoCD, Tekton, and Knative technologies.
Experience in agile deployment methodologies (GitOps).
Knowledge of various container runtimes.
Familiarity with the operator deployment pattern.
Experience working in a highly available multi-datacenter environment.
Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.
Understanding of cost management, inventory management, and the FinOps model.
Thanks & Regards
MD TOUHEED ALAM
SR. TECHNICAL RECRUITER
PURPLE HIRES INC.
Email -