Senior Site Reliability Engineer - AI/Automation Focus

Hybrid in Dallas, TX, US • Posted 3 hours ago • Updated 3 hours ago
Contract W2
No Travel Required
Hybrid
$95 - $105/hr

Job Details

Skills

  • Artificial Intelligence
  • Ansible
  • Cloud Computing
  • Continuous Delivery
  • Java
  • GitHub
  • Health Care
  • FOCUS
  • Machine Learning (ML)
  • Python
  • Production Support
  • Budget
  • Communication
  • Dynatrace
  • Dashboard
  • Bash
  • Incident Management
  • IaaS
  • Management
  • Kubernetes
  • Microsoft Azure
  • Reliability Engineering
  • Root Cause Analysis
  • SAFE
  • Scripting
  • Workflow

Summary

Job Overview:

We are looking for a Senior Site Reliability Engineer (SRE) with strong experience in production support, cloud infrastructure, and automation. This role focuses on managing and improving highly available systems while gradually introducing AI-driven automation to streamline operations and incident response.

This is a senior-level role, ideal for candidates who can handle production environments independently and improve system reliability through automation.


Key Responsibilities:

1. Production Support & Reliability
  • Manage and support production systems in a cloud environment (Azure preferred)
  • Participate in the on-call rotation and handle high-priority incidents
  • Perform root cause analysis and lead post-incident reviews
  • Monitor systems using dashboards, alerts, SLIs, and SLOs
  • Troubleshoot issues across Java applications, Kubernetes, and cloud infrastructure
  • Work with cross-functional teams to improve system stability

2. Automation & AI-Driven Operations
  • Build automation solutions to reduce manual operational work
  • Develop AI-assisted workflows for incident detection, triage, and resolution
  • Create tools to analyze logs, metrics, and system alerts
  • Implement safe automation for tasks like restarts, scaling, and rollbacks
  • Generate automated incident reports and communication summaries

Required Skills:

  • 12+ years of experience in SRE / Production Support / DevOps
  • Strong experience with:
    • Azure Cloud
    • Kubernetes & Docker
    • Java-based applications
    • CI/CD (GitHub Actions or similar)
    • Monitoring tools (Dynatrace preferred)
  • Experience with scripting/automation (Python, Bash, Ansible)
  • Solid understanding of SRE concepts (SLI, SLO, error budgets)

Preferred Skills:

  • Experience automating production workflows
  • Exposure to AI/ML or AI-based automation tools
  • Experience with multi-system or distributed environments
  • Background in regulated industries (e.g., healthcare) is a plus

What We’re Looking For:

  • Strong production support experience
  • Ability to handle incidents independently
  • Experience working in fast-paced environments
  • Willingness to join the on-call rotation
  • Comfortable supporting multiple time zones

  • Dice Id: 90999382
  • Position Id: 8955811
