SRE Architect

Overview

On Site
Part Time
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 12 Month(s)

Skills

SRE Architect

Job Details

Job Title: SRE Architect

Location: Atlanta, GA
Job Type: Contract
On-Site

Job Description:
Key Responsibilities:
Architecture & Reliability Design
  • Define and implement the SRE architecture, reliability framework, and operational strategy.

  • Design scalable, fault-tolerant systems for high availability and disaster recovery.

  • Establish SLOs, SLIs, and SLAs across services and ensure compliance.

  • Architect systems for observability: logging, tracing, metrics, and alerting.

Automation & Engineering
  • Drive automation for infrastructure, deployment, and monitoring using tools like:

    • Terraform, Ansible, Helm, Kubernetes Operators

    • CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, ArgoCD)

  • Automate manual processes to improve efficiency and reduce MTTR.

  • Develop self-healing mechanisms and automated remediation workflows.

Cloud & Infrastructure
  • Lead cloud architecture design on AWS, Azure, or Google Cloud Platform.

  • Architect and optimize Kubernetes clusters and containerized applications.

  • Implement and manage scaling strategies, load balancing, and failover designs.

  • Oversee network reliability, security, and configuration management.

Monitoring, Performance & Incident Management
  • Implement observability tools like:

    • Prometheus, Grafana, ELK, Datadog, New Relic, Splunk

  • Lead incident management processes by identifying root causes and improving system resilience.

  • Conduct performance testing, capacity planning, and SLA compliance reporting.

Collaboration & Leadership
  • Partner with software engineering, DevOps, security, and product teams.

  • Mentor the SRE team and promote reliability engineering best practices.

  • Establish playbooks, runbooks, and operational documentation.

Required Skills & Qualifications:
  • 8-12+ years of experience in SRE, DevOps, or infrastructure engineering.

  • Strong hands-on experience with:

    • Kubernetes & Docker

    • Cloud platforms (AWS/Azure/Google Cloud Platform)

    • IaC tools (Terraform/CloudFormation)

    • CI/CD systems

  • Deep understanding of:

    • Distributed systems

    • Reliability and performance engineering

    • Observability tools

    • Incident & problem management

  • Experience in scripting/programming (Python, Go, Shell, etc.).

  • Strong troubleshooting, analytical, and architectural skills.

