SRE with Observability Engineer

Remote • Posted 4 hours ago • Updated 4 hours ago
Contract W2
12 Months
No Travel Required
$65 - $68/hr

Job Details

Skills

  • Agile
  • Dashboard
  • Continuous Integration
  • Log Analysis
  • Microsoft Azure
  • SLA
  • Incident Management
  • ITIL
  • Collaboration
  • Cloud Computing
  • Python
  • Shell
  • Microservices
  • DevOps
  • Dynatrace
  • Grafana
  • Kubernetes

Summary

Job Title: SRE with Observability Engineer
Location: Remote
Experience: 12+ Years

Job Summary

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in observability, monitoring, and production support. The ideal candidate has hands-on experience monitoring enterprise applications and infrastructure, improving system reliability, troubleshooting production issues, and implementing observability solutions across cloud and on-premises environments.

Required Skills

  • Strong experience as an SRE/Observability Engineer

  • Hands-on experience with observability and monitoring tools such as Dynatrace, Splunk, Grafana, Prometheus, or AppDynamics

  • Experience in application monitoring, log analysis, and alerting

  • Knowledge of Linux/Unix administration

  • Experience with cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud

  • Understanding of SLI, SLO, SLA, and reliability engineering concepts

  • Experience with CI/CD pipelines and DevOps tools

  • Strong troubleshooting and incident management skills

  • Scripting knowledge in Python, Shell, or PowerShell

  • Experience with Kubernetes, Docker, and automation tools is preferred

Roles & Responsibilities

  • Design and implement observability and monitoring solutions for applications and infrastructure

  • Monitor system health, application performance, and reliability metrics

  • Configure dashboards, alerts, logs, and tracing solutions

  • Investigate production incidents and perform root cause analysis (RCA)

  • Collaborate with DevOps, Infrastructure, and Development teams to improve platform reliability

  • Automate operational tasks and monitoring processes

  • Maintain system uptime, availability, and performance standards

  • Participate in on-call rotations and support critical incident management

  • Build and maintain observability best practices across environments

Preferred Qualifications

  • Experience in cloud-native and microservices environments

  • Knowledge of OpenTelemetry and distributed tracing

  • Certification in cloud or observability platforms is a plus

  • Familiarity with ITIL processes and Agile methodology

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 91008924
  • Position Id: 8969822