Principal Site Reliability Engineer

Overview

On Site

Contract - W2

Contract - Independent

Contract - To 2026-10-30

100% Travel

Skills

Azure

Kubernetes

CI/CD

AWS

GCP

Docker

Terraform

Prometheus

Grafana

Ansible

Job Details

Aspire IT Solutions is seeking the following:

Position: Site Reliability Engineer
Location: Washington, D.C. area (onsite; local candidates only)
Duration: 12-month contract

About the Role

We are seeking a Principal Site Reliability Engineer (SRE) to lead the operational excellence, resilience, and security of our client's core systems. This role combines deep technical expertise in infrastructure automation, CI/CD architecture, and cloud security with strong Site Reliability Engineering principles. You'll define SLOs, manage incident response, optimize cloud costs, and mentor teams to deliver secure, scalable, and highly available systems.

Key Responsibilities

Reliability Engineering & Operations

Define and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Lead incident response, root cause analysis, and postmortem reviews to drive continuous improvement.
Implement and manage error budgets to balance reliability and innovation.

Infrastructure Automation

Design and manage secure, scalable, and automated environments using Terraform, Ansible, or CloudFormation.
Champion Infrastructure-as-Code (IaC) best practices for consistency and repeatability.

CI/CD Optimization & Security

Architect and enhance CI/CD pipelines (GitHub Actions, Jenkins) with advanced deployment methods such as canary releases, blue/green deployments, and automated rollback.
Integrate security gates (SAST, DAST, SBOM, secrets scanning) into the build and deployment lifecycle.

Observability & Telemetry

Build and maintain observability frameworks, including dashboards, alerts, metrics, and tracing pipelines.
Use tools like Prometheus, Grafana, ELK, Datadog, and CloudWatch to ensure full visibility and proactive monitoring.

Cost & Capacity Management

Implement cost monitoring and right-sizing strategies to optimize cloud resources.
Plan capacity and availability in alignment with business goals.

Platform Enablement & Mentorship

Develop internal tools, playbooks, and self-service platforms to enhance developer efficiency.
Mentor cross-functional teams on SRE best practices, operational readiness, and secure delivery.

Qualifications

Education & Experience

Bachelor's degree in Computer Science, Engineering, or a related field.
5+ years in SRE, DevOps, or Platform Engineering, including technical leadership roles.
3+ years managing production-grade cloud environments with advanced security and observability practices.

Technical Skills

Expertise in AWS, Azure, or Google Cloud Platform, with strong knowledge of compute, networking, IAM, and monitoring services.
Proficient with Terraform, CloudFormation, Kubernetes, and Docker.
Strong Linux administration and scripting (Bash, Python, or Go).
Hands-on experience with CI/CD, GitOps, and observability stacks.

Core Competencies

Deep understanding of SRE principles: SLOs, SLAs, incident management, chaos engineering, and capacity planning.
Strong communicator and collaborator with a passion for building reliable, secure, and efficient systems.
Proven ability to create and share operational tooling, documentation, and best practices across teams.

Thanks & Regards
Bhargav Kalyandurg
ASPIRE IT SOLUTIONS INC.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.