Site Reliability Engineer - W2 Role

Palo Alto, CA, US • Posted 1 day ago • Updated 2 minutes ago
Contract W2
On-site

Job Details

Skills

  • AWS
  • Python
  • Terraform
  • Ansible
  • Grafana
  • Prometheus
  • Site Reliability
  • Linux / Unix

Summary

Role: Site Reliability Engineer (SRE)
Location: Palo Alto, CA (Onsite from Day 1)
Job Type: Contract (W2)
Skill Matrix:
  Name                   Required
  Programming            Yes
  SRE                    Yes
  Grafana                Yes
  Prometheus             Yes
  AWS                    Yes
  Cloud Infrastructure   Yes
  Linux                  Yes
  UNIX                   Yes
Top skills required for this role:
Programming: Proficiency in languages like Python, Java, or Go.
System Administration: Strong understanding of Linux/Unix systems.
Cloud Infrastructure: Experience with AWS.
Infrastructure as Code (IaC): Knowledge of tools like Terraform or Ansible.
Monitoring Tools: Proficiency with tools such as Prometheus, Grafana, or Datadog.
Job Description / Responsibilities:
Automation and Tooling: Writing code to automate operational tasks such as provisioning, configuration changes, and system updates, reducing manual work and human error.
System Monitoring and Alerting: Developing and maintaining observability stacks (logs, metrics, tracing) to detect issues before they impact users.
Incident Response and On-Call: Managing a 24/7 on-call rotation to respond to, troubleshoot, and resolve production incidents.
Post-Incident Reviews (Postmortems): Conducting blameless, in-depth reviews of incidents to identify root causes and implement preventive measures.
Capacity Planning: Analyzing system resource utilization to ensure infrastructure can scale to handle future load requirements.
Performance Optimization: Identifying and fixing bottlenecks in software and infrastructure to improve system efficiency and responsiveness.
Error Budget Management: Setting and managing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to determine whether a service is reliable enough to allow new feature deployments.
Chaos Engineering: Testing system resilience by intentionally introducing failures to verify that systems are fault-tolerant.
Years of Experience: 8+ years
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 91091604
  • Position Id: 2026-3893