Site Reliability Engineer

Remote • Posted 14 hours ago • Updated 14 hours ago
Contract W2
12 Months
No Travel Required
Depends on Experience

Job Details

Skills

  • Site Reliability Engineer
  • SRE
  • Golang
  • Go
  • Kubernetes
  • Google Cloud
  • DevOps
  • Software Engineering
  • Google Cloud Platform
  • Incident Management
  • Operational Efficiency

Summary

We are seeking a Site Reliability Engineer (SRE) with strong Golang development experience to improve the reliability, scalability, and performance of our production systems. This role combines software engineering, incident response, observability, and data analysis to build resilient platforms and automate operational work. You will develop tools and services that turn production incident data into actionable insights, and you will drive reliability initiatives across cloud-native environments.

Key Responsibilities:

Develop and maintain reliability tooling and automation using Golang.

Participate in production incident response, troubleshooting, root cause analysis (RCA), and postmortem reviews.

Analyze incident and system performance data to identify trends and recommend reliability improvements.

Design and enhance observability solutions, including metrics, structured logging, distributed tracing, and alerting.

Build scalable automation to improve operational efficiency and reduce manual intervention.

Collaborate with software engineering teams to improve application reliability, performance, and resilience.

Manage and optimize Kubernetes-based production environments running on Google Cloud Platform (GCP).

Apply statistical techniques such as anomaly detection, regression analysis, and trend analysis to improve system health.

Communicate technical findings and business impact clearly to engineering and non-technical stakeholders.

Required Qualifications:

4+ years of experience in Site Reliability Engineering (SRE), DevOps, Platform Engineering, or Systems Engineering supporting large-scale production environments.

Strong hands-on experience with Golang (Go) is mandatory.

Strong SQL skills with experience performing data analysis on production and operational datasets.

Hands-on experience with Kubernetes and Google Cloud Platform (GCP).

Deep understanding of observability technologies, including monitoring, alerting, distributed tracing, structured logging, and metrics collection.

Experience with incident response, production support, root cause analysis, and reliability engineering best practices.

Knowledge of distributed systems, cloud-native architectures, and production operations.

Strong scripting and automation skills.

Excellent communication and documentation skills.


📩 Please share your updated resume at

  • Dice Id: 10110984
  • Position Id: 9013413
Contact the job poster
Bob McLauchlan

Recruiter @ DPP Tech, Inc.