Job Details
Role: Azure Site Reliability Engineer Lead
Location: Ann Arbor, MI 48105
Duration: Long-Term
Job Description:
Roles & Responsibilities:
Implement and manage cloud infrastructure using Azure services.
Develop and maintain scripts in Bash, PowerShell, Python, and other programming languages.
Collaborate with multiple support groups to ensure seamless service delivery.
Troubleshoot complex issues across cloud services and application stacks.
Ensure security and compliance of cloud infrastructure.
Document processes and changes, adhering to Agile development practices.
Continuously seek improvements in processes and infrastructure management.
Monitor system performance, reliability, and availability using tools like Azure Monitor, Application Insights, and Log Analytics.
Automate operational tasks and deployments using Azure DevOps, ARM templates, or Terraform.
Conduct regular performance tuning and capacity planning for cloud resources.
Participate in incident response, root cause analysis, and post-incident reviews to enhance system resilience.
Experience Required:
8 years of experience managing and maintaining production workloads on Azure.
Proven expertise in writing infrastructure-as-code using Azure Resource Manager (ARM), Bicep, or Terraform.
Experience with CI/CD pipelines, release management, and automation in Azure DevOps.
In-depth understanding of cloud security practices, including identity management and network security in Azure.
Familiarity with monitoring, logging, and alerting best practices using Azure-native and third-party tools.
Demonstrated ability to work in Agile/Scrum teams and contribute to continuous integration and delivery cycles.
Experience with incident management, root cause analysis, and improving production environments.
Education: Bachelor's degree in any discipline