Hiring: SRE Architect Lead (AIOps & Dynatrace)
Location: Atlanta, GA (candidates local to GA only)
Work Mode: Hybrid
We are looking for a highly skilled SRE Architect Lead with strong experience in AIOps, Observability, and Enterprise Reliability Engineering to join a fast-paced enterprise environment.
Key Responsibilities:
Lead SRE strategy, architecture, and reliability initiatives across large-scale distributed systems
Design and implement AIOps-driven monitoring and incident management solutions
Build proactive observability frameworks using Dynatrace and related monitoring platforms
Drive automation, self-healing, root cause analysis, and performance optimization initiatives
Collaborate with DevOps, Cloud, Platform Engineering, and Application teams
Improve system availability, scalability, resiliency, and operational excellence
Define SLOs, SLIs, SLAs, reliability metrics, and operational best practices
Lead production incident management, problem management, and postmortem processes
Mentor engineering teams on SRE practices and operational maturity
Required Skills:
Strong experience in Site Reliability Engineering (SRE) Architecture & Leadership
Hands-on expertise with Dynatrace (Monitoring, APM, Observability, Dashboarding, Alerting)
Experience with AIOps platforms, event correlation, anomaly detection, and automation
Strong cloud experience with AWS, Azure, or Google Cloud Platform
Expertise in Kubernetes, Docker, OpenShift, or other containerized environments
Experience with CI/CD pipelines and Infrastructure Automation
Scripting/Programming experience in Python, Bash, or Go
Knowledge of Incident Management, RCA, Capacity Planning, and Reliability Engineering
Experience supporting enterprise-scale production environments
Nice to Have:
Experience with ServiceNow, Splunk, Grafana, Prometheus, ELK, or Moogsoft
Exposure to ML-driven observability or predictive analytics
DevSecOps and cloud-native architecture experience