Technology Lead | DevOps | Continuous Delivery Environment Management & Provisioning
Location: Chicago, IL (60604)
Duration: Long Term
Experience: 8-10 Years
Job Summary
We are seeking an experienced Technology Lead for DevOps and Site Reliability Engineering (SRE) to support production environments and cloud-native platforms. The ideal candidate will have strong expertise in CI/CD automation, Infrastructure as Code (IaC), Azure cloud services, Kubernetes, monitoring and observability, and production support. This role will drive environment provisioning, deployment automation, reliability engineering, and operational excellence for enterprise-scale applications.
Required Skills
DevOps & CI/CD
8+ years of experience designing and managing CI/CD pipelines
Strong expertise in deployment automation, release management, and environment provisioning
Experience with DevOps tools and continuous delivery practices
Infrastructure as Code (IaC)
Azure Cloud
Containerization & Kubernetes
Monitoring & Observability
8+ years of experience with monitoring and observability tools
3+ years of hands-on experience with:
Experience in performance monitoring, alerting, and root cause analysis
Preferred Skills
Experience with cloud-native application architectures
Knowledge of:
Understanding of SRE principles and reliability engineering best practices
Experience supporting large-scale production environments
Key Responsibilities
Production Reliability & Incident Management
Lead troubleshooting, analysis, and resolution of production issues.
Investigate unexpected system behaviors impacting service quality and availability.
Perform root cause analysis and implement preventive measures.
Performance Monitoring & Optimization
Gather, analyze, and interpret operational metrics.
Support performance tuning, capacity planning, and fault identification.
Build dashboards and monitoring solutions to improve visibility.
Automation & Platform Engineering
Automate infrastructure provisioning and operational tasks.
Manage CI/CD pipelines and deployment processes.
Ensure high availability, scalability, and reliability of cloud environments.
Proactively identify risks and implement solutions before incidents occur.