SRE Engineer Observability

Overview

On Site

$70 - $80

Contract - W2

Contract - 6 Month(s)

No Travel Required

Skills

Kubernetes

Grafana

Amazon Web Services

Amazon EC2

Python

Terraform

Job Details

Role: SRE Engineer Observability
Location: San Jose, CA
Duration: 6+ months (possible extension)

Key Responsibilities:

Design, implement, and maintain end-to-end observability platforms using the Kubernetes + Prometheus stack (Prometheus, Loki, Grafana, Alertmanager).
Develop and optimize monitoring and alerting solutions for large-scale distributed systems on AWS.
Automate observability workflows using Python and Go (e.g., custom exporters, Grafana dashboards).
Integrate with PagerDuty for incident management and on-call rotations.
Collaborate with DevOps teams to instrument CI/CD pipelines for deployment visibility.
Implement Infrastructure-as-Code (IaC) for observability components (Terraform/CloudFormation).
Troubleshoot performance bottlenecks and reliability issues across the stack.

Mandatory Skills:

Experience: 5+ years in observability and monitoring systems.
Observability Tools: Prometheus, Loki, Grafana, Alertmanager.
Cloud & Kubernetes: AWS (EKS/EC2), Kubernetes monitoring (kube-state-metrics, cAdvisor).
Programming: Proficiency in Python and Go (coding test required).
Incident Management: PagerDuty integration and on-call workflows.
IaC: Terraform or equivalent for provisioning monitoring infrastructure.

Nice-to-Have:

Zoom Developer Platform experience for collaboration tool integrations.
Certifications: AWS Certified DevOps Engineer, Grafana Certified Associate.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
