Overview
Remote
Depends on Experience
Full Time
Skills
Site Reliability Engineer
SRE
DevOps
Datadog
Azure
Scripting and programming languages (e.g., Python, PowerShell, Shell scripting)
Job Details
Position: Site Reliability Engineer
100% Remote
Hiring: Contract / Full-time
Overview:
An SRE with deep expertise in Azure cloud-native observability implementation, Datadog, and production incident management. The candidate should also possess strong performance-tuning skills across the Azure cloud-native tech stack to optimize system reliability, scalability, and efficiency.
Required Skills and Qualifications:
- Strong experience with Azure cloud-native observability tools (e.g., Azure Monitor, Log Analytics, Application Insights).
- Hands-on expertise with Datadog for advanced monitoring and observability.
- Strong expertise in implementing SRE principles and best practices.
- Expertise in defining and tracking SLO / SLI / error budget dashboards.
- Proficiency in managing production systems in AWS and Azure cloud environments.
- Deep understanding of Azure cloud performance tuning and optimization techniques.
- Proven track record of leading production incident response and resolution.
- Ability to perform root cause analysis and implement corrective actions.
- Proficiency in scripting and programming languages (e.g., Python, PowerShell, Shell scripting).
- Experience with infrastructure-as-code tools (Terraform).
- Strong experience in performance and scalability optimization of cloud-native technologies.
- Knowledge of CI/CD pipelines and DevOps practices.
Soft Skills:
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.
- Ability to work effectively in a fast-paced, dynamic environment.