Principal Site Reliability Engineer

Overview

On Site
Contract - W2
Contract - Independent
Contract - To 2026-10-30
100% Travel

Skills

Azure
Kubernetes
CI/CD
AWS
GCP
Docker
Terraform
Prometheus
Grafana
Ansible

Job Details

Aspire IT Solutions is hiring for the following role:
Position: Principal Site Reliability Engineer
Location: Washington, D.C. area (onsite; local candidates only)
Duration: 12-month contract

About the Role

We are seeking a Principal Site Reliability Engineer (SRE) to lead the operational excellence, resilience, and security of our client's core systems. This role combines deep technical expertise in infrastructure automation, CI/CD architecture, and cloud security with strong Site Reliability Engineering principles. You'll define SLOs, manage incident response, optimize cloud costs, and mentor teams to deliver secure, scalable, and highly available systems.

Key Responsibilities

Reliability Engineering & Operations

  • Define and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Lead incident response, root cause analysis, and postmortem reviews to drive continuous improvement.
  • Implement and manage error budgets to balance reliability and innovation.

Infrastructure Automation

  • Design and manage secure, scalable, and automated environments using Terraform, Ansible, or CloudFormation.
  • Champion Infrastructure-as-Code (IaC) best practices for consistency and repeatability.

CI/CD Optimization & Security

  • Architect and enhance CI/CD pipelines (GitHub Actions, Jenkins) with advanced deployment strategies such as canary releases, blue/green deployments, and automated rollback.
  • Integrate security gates (SAST, DAST, SBOM, secrets scanning) into the build and deployment lifecycle.

Observability & Telemetry

  • Build and maintain observability frameworks - dashboards, alerts, metrics, and tracing pipelines.
  • Use tools like Prometheus, Grafana, ELK, Datadog, and CloudWatch to ensure full visibility and proactive monitoring.

Cost & Capacity Management

  • Implement cost monitoring and right-sizing strategies to optimize cloud resources.
  • Plan capacity and availability in alignment with business goals.

Platform Enablement & Mentorship

  • Develop internal tools, playbooks, and self-service platforms to enhance developer efficiency.
  • Mentor cross-functional teams on SRE best practices, operational readiness, and secure delivery.

Qualifications

Education & Experience

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 5+ years in SRE, DevOps, or Platform Engineering, including technical leadership roles.
  • 3+ years managing production-grade cloud environments with advanced security and observability practices.

Technical Skills

  • Expertise in AWS, Azure, or Google Cloud Platform, with strong knowledge of compute, networking, IAM, and monitoring.
  • Proficient with Terraform, CloudFormation, Kubernetes, and Docker.
  • Strong Linux administration and scripting (Bash, Python, or Go).
  • Hands-on experience with CI/CD, GitOps, and observability stacks.

Core Competencies

  • Deep understanding of SRE principles - SLOs, SLAs, incident management, chaos engineering, and capacity planning.
  • Strong communicator and collaborator with a passion for building reliable, secure, and efficient systems.
  • Proven ability to create and share operational tooling, documentation, and best practices across teams.

Thanks & Regards
Bhargav Kalyandurg (Find me on LinkedIn)
ASPIRE IT SOLUTIONS INC
