Job Summary
The Senior Site Reliability Engineer (Azure) will help build, maintain, and scale cloud-native infrastructure in a fast-paced, collaborative environment. This role works closely with development and operations teams to ensure systems are reliable, efficient, automated, and secure. Responsibilities include designing Azure cloud environments, managing Kubernetes clusters, implementing Infrastructure-as-Code through Terraform/Terragrunt, improving CI/CD pipelines, enhancing system observability, and supporting on-call and incident response activities.

Key Responsibilities
- Design, implement, and maintain Azure cloud infrastructure using best practices for scalability and reliability.
- Manage and optimize Kubernetes clusters (preferably AKS) and containerized workloads.
- Build and maintain Infrastructure-as-Code solutions using Terraform and Terragrunt.
- Develop, maintain, and enhance CI/CD pipelines using GitHub Workflows/Actions and ArgoCD.
- Support Databricks environments and associated cloud integrations.
- Implement and improve observability using tools such as Grafana, Prometheus, Loki, and Tempo.
- Automate operational tasks to improve efficiency, reduce manual work, and enhance reliability.
- Participate in on-call rotations, incident response, root-cause analysis, and remediation activities.
- Collaborate with developers to improve application performance, reliability, and adherence to SRE practices such as SLIs and SLOs.
- Identify opportunities for cost optimization, performance improvements, and infrastructure security enhancements.

Required Qualifications
- Minimum 4 years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles.
- Strong hands-on experience with Azure cloud services.
- Proficiency with Java and Infrastructure-as-Code tools including Terraform and Terragrunt.
- Strong experience with Kubernetes (preferably AKS) and container orchestration.
- Experience working with Databricks in production environments.
- Proficiency with CI/CD tooling, especially GitHub Workflows/Actions and ArgoCD.
- Strong understanding of observability tooling, including Grafana (Prometheus, Loki, Tempo preferred).
- Ability to collaborate in cross-functional environments and communicate effectively.

Preferred Qualifications
- Master's degree in Computer Science or a related field.

Education: Bachelor's Degree, Master's Degree
- Dice Id: compun
- Position Id: RefCompId:5716474
- Posted 4 hours ago