Job Details
Job Title: Site Reliability Engineer
Location: Westlake, TX / Merrimack, NH (Hybrid)
Duration: Long Term Contract
Shift Details:
On-call: 10 am to 8 pm EST (twice per week, one may fall on a weekend)
Non on-call: Monday to Friday, 9:00 am to 5:00 pm EST
Required Skills
* Datadog
* Kubernetes
* AWS (EKS preferred), Azure (AKS)
* On-call experience running incidents
* Development background with Ansible, Python, Node.js, JavaScript, Jenkins (Groovy scripting)
Role Overview
As a Site Reliability Engineer (SRE), you will be responsible for building, supporting, and scaling reliable and resilient distributed systems. This role combines software engineering and systems engineering to ensure high availability, automation, observability, and performance.
Key Responsibilities
* Design, build, and support highly distributed, multi-tiered systems at scale
* Drive automation using scripting and Infrastructure as Code (IaC) tools
* Implement CI/CD pipelines and DevOps best practices
* Manage Kubernetes clusters and containerized applications
* Apply observability practices including monitoring, alerting, and logging
* Troubleshoot incidents, perform root cause analysis, and ensure system resiliency
* Collaborate with cross-functional engineering teams
Qualifications
* Bachelor's degree in Computer Science, Engineering, or related field (Master's preferred)
* 8+ years of experience deploying and supporting distributed systems
* 2+ years of hands-on Cloud (AWS preferred) development and migration experience
* 2 to 4 years of experience in software development with Python, Node.js, or Java
* Strong Kubernetes administration and operations experience
* Expertise with monitoring/observability tools (Datadog, Prometheus, Grafana, Splunk, ELK, OpenTelemetry, etc.)
* Experience with Infrastructure as Code (Terraform, IAM, ARM, Chef, etc.)
* Strong troubleshooting, incident response, and communication skills
Nice to Have
* Chaos testing and resiliency engineering experience
* Experience supporting large-scale enterprise platforms
* Exposure to multiple cloud platforms (AWS & Azure)