Overview
Hybrid
Depends on Experience
Contract - W2
Contract - 12 Month(s)
50% Travel
Skills
Amazon Web Services
Analytical Skill
Ansible
Bridging
C
C++
Cloud Computing
Collaboration
Communication
Computer Networking
Conflict Resolution
Continuous Delivery
Continuous Integration
DevOps
Development Testing
Docker
Finance
GitLab
Google Cloud
Google Cloud Platform
Grafana
High Availability
IaaS
Incident Management
Java
Jenkins
KPI
Kubernetes
Linux
Load Balancing
Management
Microsoft Azure
New Relic
Operational Excellence
Orchestration
Problem Solving
Provisioning
Python
Reliability Engineering
Root Cause Analysis
Ruby
Scalability
Scripting
Software Engineering
Supervision
System Administration
Terraform
Unix
Workflow
Job Details
Job Title: Senior DevOps / Site Reliability Engineer (SRE)
Location: Dallas, TX (Hybrid) | Ellicott City, MD (Hybrid)
Experience: 10+ Years
Employment Type: Contract
Job Summary
We are seeking a Senior DevOps / Site Reliability Engineer (SRE) with 10+ years of experience to join our team. The ideal candidate will have a strong background in automation, cloud infrastructure, CI/CD pipelines, and observability. This role bridges software engineering and systems reliability to ensure performance, scalability, and operational excellence across enterprise platforms.
Key Responsibilities
- Develop and maintain scripts, triggers, and workflow automations to streamline deployment, monitoring, and incident response processes.
- Design and implement observability frameworks; define KPIs, metrics, and alerts to proactively identify and resolve performance issues.
- Lead the resolution of critical incidents escalated beyond L1/L2 support, drive root cause analysis (RCA), and implement preventive measures.
- Use tools such as Terraform, Ansible, or CloudFormation to automate infrastructure provisioning and management.
- Ensure high availability and resilience across AWS, Azure, or Google Cloud Platform environments; optimize cost, performance, and scalability.
- Partner with Architecture, Development, QA, and Operations teams to deliver robust, scalable, and reliable solutions.
- Implement shift-left practices, promote an automation-first culture, and enhance CI/CD pipelines for faster, safer deployments.
Technical Skills
- Programming: Proficiency in Python, Java, Go, C/C++, or Ruby; experience with infrastructure-as-code (IaC) and configuration management tools such as Terraform and Ansible.
- Cloud Platforms: Hands-on expertise in AWS, Azure, or Google Cloud Platform (GCP).
- Containerization & Orchestration: Strong experience with Docker, Kubernetes, and related ecosystem tools (Helm, Istio, etc.).
- CI/CD: Experience with tools such as Jenkins, GitLab CI/CD, Harness, or Spinnaker.
- System Administration & Networking: Deep understanding of Linux/Unix systems, networking, load balancing, and security best practices.
- Monitoring Tools: Experience with Prometheus, Grafana, ELK Stack, Datadog, or New Relic.
Soft Skills
- Excellent problem-solving and analytical thinking.
- Strong communication and collaboration skills with cross-functional teams.
- Proactive, self-motivated, and able to handle complex technical challenges with minimal supervision.
Preferred Qualifications
- Certifications in AWS/Azure/Google Cloud Platform or Kubernetes (CKA/CKAD) are a plus.
- Prior experience in large-scale enterprise or financial environments preferred.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.