Job Title: Lead Site Reliability Engineer
Location: Memphis, TN
Duration: 6+ Months
Skills: Site Reliability Engineering (SRE) Engagement Manager with expertise in DevOps, Terraform, Cloud, and IT transformation methodologies to improve system reliability, scalability, and performance. Knowledge of Dynatrace and PagerDuty, and experience driving AI-led initiatives that bring automation and reduce effort. Must have led a team of 40+ in an onsite/offshore model.
Role Overview
We are seeking a highly experienced Site Reliability Engineering (SRE) Engagement Manager with deep expertise in DevOps, Terraform, and Cloud technologies, along with strong leadership capabilities in managing large onsite/offshore teams.
This role will drive enterprise-scale IT transformation initiatives focused on improving system reliability, scalability, performance, and automation maturity. The ideal candidate will have hands-on experience with observability and incident management platforms such as Dynatrace and PagerDuty, and a strong track record of leading AI-driven automation initiatives to reduce operational effort and improve service quality.
Key Responsibilities
Leadership & Engagement Management
- Lead and manage a 40+ member global SRE team across onsite and offshore models.
- Own end-to-end service delivery, operational excellence, and stakeholder management.
- Drive engagement governance including KPIs, SLAs, SLOs, and error budgets.
- Collaborate with executive leadership and business stakeholders on transformation roadmaps.
Site Reliability & DevOps Strategy
- Define and implement SRE best practices aligned with business objectives.
- Improve system reliability, availability, and scalability across cloud environments.
- Establish an automation-first culture and reduce manual intervention.
- Implement capacity planning and performance engineering strategies.
Cloud & Infrastructure as Code
- Architect and manage cloud-native infrastructure (AWS/Azure/Google Cloud Platform).
- Lead Infrastructure as Code (IaC) initiatives using Terraform.
- Ensure compliance, security, and resilience in cloud environments.
- Drive CI/CD pipeline optimization and DevOps maturity improvements.
Observability & Incident Management
- Implement and optimize monitoring and observability frameworks using Dynatrace.
- Manage incident response processes leveraging PagerDuty.
- Drive a root cause analysis (RCA) culture and continuous improvement programs.
- Enhance proactive detection and reduce MTTR.
AI-Driven Automation & Transformation
- Lead AI-enabled automation initiatives to reduce operational overhead.
- Implement predictive monitoring, anomaly detection, and intelligent alerting.
- Identify opportunities for process automation and self-healing systems.
- Drive measurable efficiency improvements and cost optimization.
Required Qualifications
- 12+ years of experience in IT operations, DevOps, or SRE roles.
- 5+ years of experience managing large-scale global teams (40+ members).
- Strong expertise in:
  - DevOps practices and CI/CD
  - Cloud platforms (AWS, Azure, or Google Cloud Platform)
  - Terraform (Infrastructure as Code)
  - Dynatrace (Observability)
  - PagerDuty (Incident Management)
- Experience leading IT transformation and cloud modernization programs.
- Strong stakeholder communication and executive presentation skills.
- Proven experience working in onsite/offshore delivery models.
Preferred Qualifications
- Certifications in Cloud (AWS/Azure/Google Cloud Platform), DevOps, or SRE.
- Experience implementing AIOps solutions.
- Background in performance engineering and resilience architecture.
- ITIL or Agile/Scrum certifications.
With Regards,
Dinesh B
CS Solutions, Inc
7525 Mitchell Road, Suite 106, Eden Prairie, MN 55344
Email ID -
Direct Number