Role: SRE / DevOps Engineers - Helix
Location: Remote
Contract: C2C
Exp: 10+ Years
Role Summary
We are seeking three SRE / DevOps Engineers to improve the reliability, observability, and operational readiness of business-critical platforms and services supporting the Helix program. While titled as SRE / DevOps, these roles are heavily operations-oriented and require strong production support, incident response, and Splunk-based monitoring experience.
Key Responsibilities
Lead complex initiatives to improve the reliability, availability, and operational readiness of business-critical platforms and services.
Own and support production operations, including implementation support, system health monitoring, and proactive issue identification.
Play a key role in Incident Management, including triage, coordination, root cause analysis, and driving post-incident remediation.
Support and participate in Business Continuity Planning (BCP) activities, including failover readiness, disaster recovery testing, and recovery validation.
Design, implement, and maintain monitoring, alerting, and observability solutions, with a strong emphasis on Splunk-based logging and dashboards.
Automate operational workflows to cut manual effort and shorten mean time to detect (MTTD) and mean time to recover (MTTR).
Partner with application, platform, and security teams to ensure services are built and deployed with operational excellence and reliability in mind.
Define and enforce SRE and DevOps standards, including SLIs/SLOs, alert hygiene, runbooks, and on-call best practices.
Lead and participate in post-incident reviews, ensuring root causes are addressed and preventive actions are implemented.
Mentor engineers on reliability engineering, incident response, and operational best practices.
Continuously evaluate and improve system performance, resiliency, and operational tooling across the platform lifecycle.
Additional Role Context
These roles are more operations-heavy than a traditional engineering-focused SRE title may suggest.
Strong Splunk experience is required, including dashboard creation, query development, log investigation, and trace-based troubleshooting across connected systems.
The team needs people who can navigate issues across integrated systems involved in the Helix VM process, including front-end and infrastructure-connected services.
The role supports a high-volume change environment, including multiple change request (CR) implementations in a single evening and operational coordination across workstreams.
Candidates should be informed up front that the role may require after-hours deployments, night support, and possible weekend work tied to CRs and future BCP events. CR activity typically begins around 9 PM ET.
Dallas is the preferred location for these operations resources to support onboarding and collaboration with the existing local team, though strong candidates outside Dallas may still be considered.
Engagement is expected through the end of the year, based on project demand.
Project Context: Helix
Helix is an internal platform initiative focused on enabling self-service infrastructure provisioning for application development teams. The program works with platform teams to expose APIs and automation for provisioning services such as VMs, storage, and related infrastructure resources. The goal is to reduce manual operational processes, improve governance, and support migration away from heritage environments through a strategic internal developer platform.