Site Reliability Engineer/SRE

Hybrid in Austin, TX, US • Posted 17 hours ago • Updated 16 hours ago
Contract Corp To Corp
Contract W2
Contract Independent
No Travel Required
On-site
Depends on Experience

Job Details

Skills

  • Site Reliability Engineer
  • DevOps
  • Linux
  • Unix
  • Python
  • Go
  • Java
  • Bash
  • AWS
  • GCP
  • Docker
  • Kubernetes
  • SLIs
  • SLOs
  • RCA
  • Prometheus
  • Grafana
  • Datadog
  • Splunk
  • Reliability Engineering
  • Root Cause Analysis
  • Scheduling
  • Scripting
  • Incident Management
  • Management
  • Orchestration
  • Regulatory Compliance
  • Scalability
  • Dashboard
  • Documentation
  • Forms
  • Google Cloud Platform
  • Amazon Web Services
  • Budget
  • Chaos Engineering
  • Cloud Computing
  • Workflow
  • Cost-benefit Analysis
  • Service Level
  • Software Engineering
  • Systems Engineering
  • Testing

Summary

Job ID: TX-529601671

Hybrid/Local TX Govt Site Reliability Engineer/SRE (15+) with DevOps/Systems Engineering, Linux/Unix, Python/Go/Java/Bash, AWS/Google Cloud Platform, Docker/Kubernetes, SLIs/SLOs, and Prometheus/Grafana/Datadog/Splunk/Application Insights experience

Location: Austin, TX (HHSC)
Duration: 3 Months
The position is 3 days remote, with 2 days (Mondays and Thursdays) required onsite at the location listed above. The program will accept local candidates only.

Skills:
Years   Required/Preferred   Skill
8       Required             Experience in systems engineering, DevOps, or site reliability engineering roles
8       Required             Strong experience with Linux/Unix systems and system internals
8       Required             Proficiency in one or more programming/scripting languages (Python, Go, Java, Bash)
8       Required             Experience designing and operating highly available, distributed systems
8       Required             Strong knowledge of cloud platforms (AWS or Google Cloud Platform) and cloud-native services
8       Required             Experience with containerization and orchestration (Docker, Kubernetes)
8       Required             Strong understanding of monitoring, alerting, and logging concepts
8       Required             Experience defining and managing SLIs, SLOs, and error budgets
8       Required             Familiarity with incident management, root cause analysis (RCA), and postmortems
8       Required             Experience integrating security and compliance into operational workflows
4       Preferred            Familiarity with observability tools (Prometheus, Grafana, Application Insights, Datadog, Splunk)
4       Preferred            Experience operating 24×7 production environments with on-call rotations
4       Preferred            Experience with chaos engineering and resiliency testing
4       Preferred            Experience with feature flags, canary deployments, and progressive delivery
4       Preferred            Strong documentation skills for runbooks, dashboards, and operational standards

Description:
Requires 8 or more years of experience. Relies on experience and judgment to plan and accomplish goals, and independently performs a variety of complicated tasks. A wide degree of creativity and latitude is expected.

Understands business objectives and problems, identifies alternative solutions, and performs studies and cost/benefit analyses of alternatives. Analyzes user requirements, procedures, and problems to automate processing or to improve existing computer systems. Confers with personnel of the organizational units involved to analyze current operational procedures, identify problems, and learn specific input and output requirements, such as forms of data input, how data is to be summarized, and formats for reports. Writes detailed descriptions of user needs, program functions, and the steps required to develop or modify computer programs. Reviews computer system capabilities, specifications, and scheduling limitations to determine whether a requested program or program change is possible within the existing system.

The Site Reliability Engineer is responsible for ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering practices to infrastructure and operations. The role partners with development teams to build resilient, observable, and automated platforms that meet defined service level objectives (SLOs).

  • Dice Id: 10456060
  • Position Id: TX-529601671