SRE Architect

  • Marlborough, MA
  • Posted 5 days ago | Updated 5 days ago

Overview

On Site
Up to $120,000
Full Time

Skills

AppDynamics
Cloud Computing
Architectural Design
DevOps
Design Patterns
Continuous Improvement
Google Cloud Platform
Microsoft Azure
Software Architecture
Grafana
Datadog
Splunk
Prometheus

Job Details

Responsibilities:
1. Technical Expertise
  • Deep understanding of SRE principles, the SRE operating model, and DevOps methodologies.
  • Experience designing highly available, scalable, and resilient distributed systems.
  • Proficient in architectural design (microservices, cloud-native, and event-driven architectures).
  • Skilled in cloud platforms: Microsoft Azure and Google Cloud Platform.
  • Strong knowledge of observability tools: UIM, Prometheus, Grafana, Datadog, New Relic, Splunk, AppDynamics.
2. Framework Design & Governance
  • Define and validate SLOs, SLIs, SLAs, error budgets, and availability targets.
  • Design runbooks, escalation policies, and chaos testing frameworks.
  • Create reusable templates for observability, alerting, and logging.
  • Ensure compliance and audit readiness.
3. Communication & Cross-Functional Leadership
  • Collaborate with architects, designers, and platform and infrastructure teams.
  • Document frameworks and lead adoption across teams.
  • Review designs and validate reliability criteria.
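
For candidates less familiar with the SLO and error-budget terms above, the arithmetic behind them can be sketched in a few lines of Python. This is only an illustration: the function name `allowed_downtime_minutes` and the 30-day window are assumptions made for the example, not part of the employer's framework.

```python
def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the error budget, in minutes of downtime, for an availability SLO.

    slo_target is a fraction such as 0.999 (a hypothetical example value).
    window_days is the evaluation window (30 days is an assumption).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    window_minutes = window_days * 24 * 60
    # The error budget is the share of the window the SLO allows to be unavailable.
    return (1.0 - slo_target) * window_minutes


if __name__ == "__main__":
    # Compare a few illustrative availability targets.
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.2%} availability allows {minutes:.1f} min per 30 days")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43.2 minutes of downtime as the error budget, which is the kind of figure an availability target would be validated against.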
Roles & Responsibilities:
1. Framework & Standardization
  • Define and maintain the SRE operating model, framework, and onboarding guide.
  • Create templates and reference architectures for observability, alerting, and runbooks.
  • Standardize definitions of availability, reliability, latency, and performance.
2. Architectural Integration
  • Participate in application architecture reviews to validate SRE compliance.
  • Recommend design patterns for fault tolerance, failover, auto-scaling, and DR.
  • Define observability-by-design principles.
3. Governance, Audit & Optimization
  • Establish and lead SRE councils or review boards.
  • Define SRE maturity models, scorecards, and compliance checks.
  • Perform SRE audits across product portfolios.
  • Guide teams on capacity modeling, load distribution, and cost-efficiency strategies.
  • Collaborate with platform teams on resource reservations and right-sizing.
4. Tool Rationalization & Strategy
  • Evaluate and recommend standard SRE toolchains for monitoring, logging, and tracing.
  • Own the integration strategy across observability platforms.
5. Training, Leadership & Evangelism
  • Conduct SRE bootcamps for application and infrastructure teams.
  • Champion a blameless culture and a continuous-improvement mindset.
  • Drive error budget policies and reliability trade-off discussions.
  • Mentor product teams on SRE integration strategies.
  • Influence architectural decisions with SRE perspectives.
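
As a rough illustration of what an error budget policy might look like in practice, the Python sketch below maps budget consumption to a release decision. Everything here is hypothetical: the `evaluate_error_budget` function, the 75% and 100% thresholds, and the action labels are assumptions chosen for the example, not the policy used for this role.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    consumed: float  # fraction of the error budget already spent (1.0 = fully spent)
    action: str      # hypothetical policy outcome


def evaluate_error_budget(total_requests: int, failed_requests: int,
                          slo_target: float,
                          warn_threshold: float = 0.75,
                          freeze_threshold: float = 1.0) -> BudgetStatus:
    """Compare observed failures with the failures a request-based SLO allows."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # Failures the SLO tolerates over this request volume.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures
    # Assumed policy tiers: freeze when the budget is spent, warn when close.
    if consumed >= freeze_threshold:
        action = "freeze feature releases"
    elif consumed >= warn_threshold:
        action = "review risky changes"
    else:
        action = "ship normally"
    return BudgetStatus(consumed=consumed, action=action)


if __name__ == "__main__":
    status = evaluate_error_budget(1_000_000, 800, slo_target=0.999)
    print(f"{status.consumed:.0%} of budget used: {status.action}")
```

Making the thresholds explicit parameters is a deliberate choice in this sketch: it keeps the trade-off between release velocity and reliability visible, so teams can negotiate the numbers rather than the mechanism.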

Santosh Bathula
