Site Reliability Engineer

Hybrid in Irving, TX, US • Posted 1 day ago • Updated 1 day ago
Contract W2
No Travel Required
On-site
$55 - $65/yr

Job Details

Skills

  • API
  • Ansible
  • Backup
  • AppDynamics
  • Business Continuity Planning
  • Cloud Computing
  • Continuous Delivery
  • Computer Networking
  • Budget
  • Bash
  • DevOps
  • Continuous Improvement
  • Dashboard
  • Documentation
  • DevSecOps
  • GitHub
  • Continuous Integration
  • Incident Management
  • Google Cloud Platform
  • Jenkins
  • Good Clinical Practice
  • Kubernetes
  • Leadership
  • Grafana
  • MEAN Stack
  • Migration
  • Microsoft Azure
  • Linux
  • Onboarding
  • Orchestration
  • Provisioning
  • Python
  • Reporting
  • Root Cause Analysis
  • Scripting
  • Splunk
  • Terraform
  • Workflow

Summary

Required Qualifications: MUST HAVE

Platform Ownership & Reliability (SRE):

  • Support end-to-end reliability, availability, and performance of the Harness CD platform across non-prod, prod, and BCP environments
  • Maintain and report on SLIs, SLOs, error budgets, deployment success rates, and platform health metrics
  • Lead incident response, troubleshooting, and RCA for deployment failures, delegate outages, or platform performance issues
  • Identify and remediate scaling, performance, and capacity constraints across delegates, pipelines, Kubernetes clusters, and cloud integrations

Automation & Engineering Excellence:

  • Develop automation for provisioning, configuration, scaling, upgrades, and maintenance of Harness components
  • Build Infrastructure as Code (IaC) using Terraform, Ansible, Helm, or equivalent tools
  • Automate common operational tasks including delegate lifecycle, cluster onboarding, secret rotation, and pipeline validation
  • Reduce manual work by implementing resilient, repeatable, and self-service automation workflows

DevOps & CI/CD Integration:

  • Maintain and enhance Harness integrations with GitHub, Jenkins, Azure DevOps, Kubernetes/OpenShift clusters, and cloud providers
  • Ensure an efficient developer experience through well-optimized pipelines and reliable deployment mechanisms
  • Partner with DevOps teams to optimize orchestration strategies (blue/green, canary, rolling)
  • Work with Security teams to embed DevSecOps controls such as policy enforcement, governance pipelines, and security checks 

Observability & Monitoring:

  • Implement and maintain monitoring, logging, dashboards, and alerting for all Harness components
  • Use Splunk, Prometheus, Grafana, AppDynamics, or similar tools to build actionable alerts
  • Detect and escalate issues such as delegate saturation, pipeline slowdowns, API failures, and Kubernetes resource constraints
  • Support proactive monitoring to reduce mean time to detection and resolution 

Modernization & Continuous Improvement:

  • Assist with Harness upgrades, hotfixes, patching, and vendor-recommended lifecycle activities
  • Contribute to modernization efforts including containerization, cloud-native deployments, and multi-cloud expansion
  • Support resiliency improvements such as BCP validation, backup verification, and BCP readiness
  • Evaluate new Harness features, modules, and platform capabilities for enterprise usage

Technical Leadership:

  • Act as a technical SME for Harness platform operations and enhancements
  • Provide platform guidance, documentation, architecture details, and runbook development
  • Partner with senior engineers to improve standards, automation patterns, and operational excellence

Required Qualifications

Core Technical Skills:

  • 5–7+ years of experience in DevOps, SRE, Platform Engineering, or Cloud Engineering roles
  • Hands-on experience with Harness CD
  • Strong experience with Kubernetes/OpenShift, Linux, cloud services and deployment best practices
  • Solid understanding of CI/CD workflows and software release automation

SRE & Automation:

  • Experience applying SRE concepts such as SLIs/SLOs, error budgets, and operational maturity improvements
  • Strong automation/scripting skills using Python, Bash, or PowerShell
  • Infrastructure as Code experience with Terraform, Ansible, Helm, or equivalent tooling

Observability & Troubleshooting:

  • Experience with observability tools (Prometheus, Grafana, Splunk, ELK, AppDynamics, etc.)
  • Strong troubleshooting skills across container, OS, networking, platform, and cloud technology layers

Preferred Qualifications:

  • Experience supporting CD platforms at enterprise scale (hundreds of teams, multi-region deployments)
  • Experience in cloud-native and hybrid cloud environments (Azure, Google Cloud Platform)
  • Familiarity with DevSecOps practices, policy automation frameworks, and governance models
  • Experience supporting complex upgrades, platform migrations, or modernization projects

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10114908
  • Position Id: 8926643