Site Reliability Engineer

Remote • Posted 14 hours ago • Updated 14 hours ago

Contract W2

12 Months

No Travel Required

Remote

Depends on Experience

Job Details

Skills

Site Reliability Engineer
SRE
Golang
Go
Kubernetes
Google Cloud
DevOps
Software Engineering
Google Cloud Platform
Incident Management
Operational Efficiency

Summary

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong Golang development experience to improve the reliability, scalability, and performance of our production systems. This role combines software engineering, incident response, observability, and data analysis to build resilient platforms and automate operational work. You will develop tools and services that turn production incident data into actionable insights while driving reliability initiatives across cloud-native environments.

Key Responsibilities:

Develop and maintain reliability tooling and automation using Golang.

Participate in production incident response, troubleshooting, root cause analysis (RCA), and postmortem reviews.

Analyze incident and system performance data to identify trends and recommend reliability improvements.

Design and enhance observability solutions, including metrics, structured logging, distributed tracing, and alerting.

Build scalable automation to improve operational efficiency and reduce manual intervention.

Collaborate with software engineering teams to improve application reliability, performance, and resilience.

Manage and optimize Kubernetes-based production environments running on Google Cloud Platform (GCP).

Apply statistical techniques such as anomaly detection, regression analysis, and trend analysis to improve system health.

Communicate technical findings and business impact clearly to engineering and non-technical stakeholders.

Required Qualifications:

4+ years of experience in Site Reliability Engineering (SRE), DevOps, Platform Engineering, or Systems Engineering supporting large-scale production environments.

Strong hands-on experience with Golang (Go) is mandatory.

Strong SQL skills with experience performing data analysis on production and operational datasets.

Hands-on experience with Kubernetes and Google Cloud Platform (GCP).

Deep understanding of observability technologies, including monitoring, alerting, distributed tracing, structured logging, and metrics collection.

Experience with incident response, production support, root cause analysis, and reliability engineering best practices.

Knowledge of distributed systems, cloud-native architectures, and production operations.

Strong scripting and automation skills.

Excellent communication and documentation skills.

📩 Please share your updated resume at

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10110984
Position Id: 9013413
Contact the job poster

Bob McLauchlan

Recruiter @ DPP Tech, Inc.
