SRE Engineer Observability

Overview

On Site
$70 - $80
Contract - W2
Contract - 6 Month(s)
No Travel Required

Skills

Kubernetes
Grafana
Amazon Web Services
Amazon EC2
Python
Terraform

Job Details

Role: SRE Engineer Observability
Location: San Jose, CA
Duration: 6+ months (possible extension)
Key Responsibilities:

  • Design, implement, and maintain end-to-end observability platforms on Kubernetes using the Prometheus stack (Prometheus, Loki, Grafana, Alertmanager).
  • Develop and optimize monitoring and alerting solutions for large-scale distributed systems on AWS.
  • Automate observability workflows using Python and Go (e.g., custom exporters, Grafana dashboards).
  • Integrate with PagerDuty for incident management and on-call rotations.
  • Collaborate with DevOps teams to instrument CI/CD pipelines for deployment visibility.
  • Implement Infrastructure-as-Code (IaC) for observability components (Terraform/CloudFormation).
  • Troubleshoot performance bottlenecks and reliability issues across the stack.

Mandatory Skills:

  • Experience: 5+ years in observability and monitoring systems.
  • Observability Tools: Prometheus, Loki, Grafana, Alertmanager.
  • Cloud & Kubernetes: AWS (EKS/EC2), Kubernetes monitoring (kube-state-metrics, cAdvisor).
  • Programming: Proficiency in Python and Go (coding test required).
  • Incident Management: PagerDuty integration and on-call workflows.
  • IaC: Terraform or equivalent for provisioning monitoring infrastructure.

Nice-to-Have:

  • Zoom Developer Platform experience for collaboration tool integrations.
  • Certifications: AWS Certified DevOps Engineer, Grafana Certified Associate.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.