Azure Cloud Site Reliability Engineer (SRE)

Calgary, AB, CA • Posted 9 hours ago • Updated 9 hours ago

Contract W2

Contract Independent

12 Months

On-site

$50 - $55/hr

Job Details

Skills

SRE
Python
Ansible
Artificial Intelligence
Capacity Management
Kubernetes
DevOps
Incident Management
Continuous Integration
Continuous Delivery
Scalability
PaaS
Terraform
Workflow
Root Cause Analysis
Linux
MEAN Stack
Cloud Computing
Amazon Web Services
Windows PowerShell
Reliability Engineering
ARM
IaaS
Microsoft Azure
Scripting

Summary

Job Title: Azure Cloud Site Reliability Engineer (SRE)

Location: Calgary, AB (Onsite)

Role Overview: The SRE is responsible for the reliability, availability, and performance of Azure and AWS PaaS and IaaS workloads. The role bridges development and operations, with a focus on building automated systems that prevent failures, managing incident response, and optimizing cloud costs.

Key Responsibilities

System Reliability & Monitoring: Design, implement, and maintain comprehensive monitoring and alerting systems such as Azure Monitor, AWS CloudWatch, Application Insights, and Log Analytics.

Automation & Toil Reduction: Automate repetitive manual operations (toil) such as environment provisioning, system patching, and scaling. Use IaC tools like Terraform and Ansible to manage infrastructure.

Incident Response & Management: Manage incident response, root cause analysis (RCA), and post-mortem investigations to improve system reliability and minimize mean time to resolution (MTTR).

Cloud SRE Agent Integration: Deploy and configure Cloud SRE Agent to automate incident investigation, execute remediation steps (restart, scale, rollback), and manage routine tasks.

Capacity Planning & Scalability: Analyze usage patterns to optimize cloud resources, ensuring high availability and performance while managing costs via Azure Cost Management.

CI/CD & DevOps Collaboration: Integrate automation workflows into CI/CD pipelines (e.g., GitHub Actions or Azure Pipelines) to ensure reliable deployments.

Required Skills & Qualifications

Cloud Platforms: Expert knowledge of Microsoft Azure infrastructure services (Compute, Storage, Networking, AKS).

Scripting & Programming: Proficiency in Python, Bash, or PowerShell for building automation tools.

Infrastructure as Code (IaC): Extensive experience with Terraform and ARM templates/Bicep.

Observability Tools: Experience with Azure Monitor, Grafana, Prometheus, or Datadog.

Containers & Orchestration: Solid understanding of Kubernetes/AKS (Azure Kubernetes Service).

Operating Systems: Proficiency in Windows and Linux environments.

Azure certification is a plus.

Exposure to multi-cloud environments is a must.

Typical "Day in the Life" Activities

1. Reviewing Service Level Objectives (SLOs) and error budgets.

2. Refining auto-scaling rules for Kubernetes clusters based on traffic trends.

3. Working with developers to review service architecture and ensure fault tolerance.

4. Configuring AI-driven alert suppression to reduce alert fatigue.

5. Creating Azure Dashboards to visualize key performance indicators (KPIs).

Dice Id: 91141183
Position Id: 9000494

Contact the job poster

Amit Gandhi

Recruiter @ Della Solutions and Services Inc.
