Role Overview
We are seeking a skilled Site Reliability Engineer (SRE) to join our team and help build, maintain, and scale cloud-native infrastructure in Microsoft Azure. This role partners closely with development and operations teams to ensure systems are reliable, scalable, secure, and cost-efficient.
The ideal candidate is passionate about automation, infrastructure-as-code, GitOps, and observability, and thrives in a collaborative, fast-paced environment. You will play a critical role in improving system resilience and establishing strong SRE practices from the ground up.
Key Responsibilities
- Design, implement, and manage Azure cloud infrastructure using Terraform and Terragrunt
- Maintain, operate, and optimize Kubernetes clusters on Azure Kubernetes Service (AKS)
- Build and manage CI/CD pipelines using GitHub Actions / GitHub Workflows
- Implement GitOps-based deployments using ArgoCD
- Enhance system reliability by implementing monitoring, alerting, and observability solutions using Grafana
- Automate operational tasks to reduce toil and improve team efficiency
- Participate in on-call rotations, incident response, root cause analysis, and post-mortems
- Partner with development teams to improve application performance, scalability, and resilience
- Implement and promote SRE best practices, including:
  - Service Level Indicators (SLIs)
  - Service Level Objectives (SLOs)
  - Error budgets
- Continuously improve system performance, security posture, and cloud cost efficiency
Required Skills & Qualifications
Experience
- 3+ years of experience in an SRE, DevOps, or Cloud Infrastructure role
Cloud & Infrastructure
- Strong hands-on experience with Microsoft Azure
- Infrastructure-as-Code experience using Terraform and Terragrunt
- Experience designing and managing cloud-native environments
Containers & Orchestration
- Proficiency with Kubernetes (preferably AKS)
- Experience supporting containerized workloads and orchestration patterns
- Exposure to Databricks environments is required
CI/CD & GitOps
- Experience with GitHub Actions / GitHub Workflows
- Hands-on experience with ArgoCD and GitOps-based deployment strategies
Observability
- Solid understanding of Grafana
- Experience with Prometheus is a plus
- Familiarity with Loki and Tempo is a plus
Programming
- Hands-on experience with Java in a production or platform context
Technology Doesn't Change the World, People Do.
Robert Half is the world's first and largest specialized talent solutions firm that connects highly qualified job seekers to opportunities at great companies. We offer contract, temporary and permanent placement solutions for finance and accounting, technology, marketing and creative, legal, and administrative and customer support roles.
Robert Half works to put you in the best position to succeed. We provide access to top jobs, competitive compensation and benefits, and free online training. Stay on top of every opportunity - whenever you choose - even on the go. Download the Robert Half app and get 1-tap apply, notifications of AI-matched jobs, and much more.
All applicants applying for U.S. job openings must be legally authorized to work in the United States. Benefits are available to contract/temporary professionals, including medical, vision, dental, and life and disability insurance. Hired contract/temporary professionals are also eligible to enroll in our company 401(k) plan. Visit roberthalf.gobenefits.net for more information.
© 2025 Robert Half. An Equal Opportunity Employer. M/F/Disability/Veterans.