Overview
On Site
Accepts corp to corp applications
Contract - W2
Contract - 1 day(s)
Skills
Sr. DevOps/SRE Lead Engineer
Job Details
Role: Sr. DevOps/SRE Lead Engineer
Location: Washington DC
Duration: Long term
Experience: 12+ Years
Rate: $60/hr. C2C
Client: Candidates with railroad transportation experience are highly preferred
Mandatory Skills:
- Reliability Engineering: Define and maintain service-level objectives (SLOs), implement error budgeting, and lead incident response and postmortem analysis.
- Infrastructure Automation: Use Terraform, Ansible, and other IaC tools to create secure, scalable, and repeatable environments.
- CI/CD Optimization: Architect secure and efficient pipelines (e.g., GitHub Actions, Jenkins), incorporating automated rollback, canary/blue-green deploys, and artifact validation.
- Observability: Build dashboards, alerts, synthetic checks, and telemetry pipelines that ensure visibility into system performance, availability, and cost.
- Security & Compliance: Integrate security tooling (SAST, DAST, SBOM, secrets scanning) and enforce policy-as-code in deployment workflows.
- Cost & Capacity Planning: Implement tooling and practices to monitor cloud cost trends, right-size infrastructure, and ensure high availability at optimal spend.
- Internal Enablement: Develop reusable internal tools, shared playbooks, and self-service platforms that boost developer productivity and ensure consistent delivery.
- Mentorship & Leadership: Serve as a technical mentor across platform, security, and engineering teams. Establish best practices in operational readiness, fault tolerance, and secure delivery.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or related technical discipline.
- At least 5 years of experience in DevOps, SRE, or Platform Engineering roles with leadership experience in automation and infrastructure reliability.
- 3+ years hands-on experience in high-availability production environments with cloud-native security and observability tooling.
- Deep expertise in AWS (or equivalent cloud platform), especially in compute, networking, IAM, and monitoring.
- Proficiency in Terraform, CloudFormation, Kubernetes, Docker, and Linux systems.
- Strong knowledge of observability stacks (Prometheus, Grafana, ELK, Datadog, CloudWatch).
- Experience implementing and managing CI/CD systems with security tollgates and rollback logic.
- Strong scripting skills in Python, Go, or Bash for automation and tooling.
- In-depth understanding of SRE practices including incident response, SLO/SLA management, chaos engineering, and capacity modeling.
- Familiarity with Git and GitOps patterns.
- Proven track record of creating shared tooling and documentation that promotes operational excellence.