Overview
On Site
$60 - $70
Contract - Independent
Contract - W2
Skills
SRE Program Architect
Job Details
Role: SRE Program Architect
Location: Dallas, TX (Hybrid)
Duration: Long Term
About the job
Job Summary:
The SRE Program Architect plays a critical role in the observability and reliability engineering program.
This role ensures enterprise-grade execution of observability initiatives, with responsibilities covering
governance, design, implementation, and optimization of relevant tools and practices. The SRE Program
Architect operates across hybrid/multi-cloud environments, integrates with ITSM, and ensures alignment
to SLO-driven service outcomes. This position requires both strong technical experience and enterprise
delivery expertise.
Responsibilities:
- Define and execute standards, frameworks, and playbooks aligned to observability objectives.
- Collaborate with cross-functional teams (DevOps, SRE, application owners, infra teams) to ensure adoption.
- Ensure telemetry data (metrics, logs, traces, and events) converges into actionable insights.
- Integrate tooling (Dynatrace, LogicMonitor, ELK, ServiceNow) into CI/CD and operational workflows.
- Build and maintain dashboards, KPIs, and reporting packs to support stakeholders at all levels.
- Support regulatory compliance, risk management, and audit readiness through observability practices.
- Mentor team members and contribute to knowledge sharing and process maturity.
Required Skills
- 8 to 12 years of relevant experience in enterprise-scale IT, monitoring, or observability programs.
- Proven expertise in key observability and ITSM platforms (Dynatrace, LogicMonitor, ELK, ServiceNow).
- Strong experience in hybrid/multi-cloud environments (AWS, Azure, Google Cloud Platform, VMware).
- Hands-on experience with automation/IaC (Terraform, Ansible, GitOps, YAML).
- Excellent understanding of ITIL and SRE practices (SLIs, SLOs, error budgets).
- Ability to work with globally distributed teams and manage stakeholder expectations.
- Strong problem-solving, communication, and leadership skills.
Preferred Skills
- Exposure to OpenTelemetry, Prometheus, Grafana, and modern observability stacks.
- Familiarity with Dynatrace Grail/DQL and AI-based anomaly detection.
- Knowledge of cost optimization and FinOps practices in observability platforms.
- Industry certifications in observability, SRE, ITIL, or cloud (AWS/Azure/Google Cloud Platform).
- Experience in regulated industries (finance, healthcare, public sector).
Tool Priorities
- Monitoring/APM: Dynatrace (APM, RUM, Synthetics, Monaco/YAML), LogicMonitor.
- Logging: ELK stack (Elasticsearch, Logstash, Kibana, Beats).
- Automation/IaC: Terraform, Ansible, GitOps pipelines, YAML configs.
- ITSM: ServiceNow (Event Management, CMDB, Incident/Problem flows).
- Analytics/Reporting: Power BI, Grafana, QBR dashboards.