Site Reliability Engineers (W2 only) | 4-8 yrs of experience

  • Berkeley Heights, NJ
  • Posted 1 day ago | Updated 1 day ago

Overview

Hybrid
$100,000 - $140,000
Contract - W2
Contract - 2 Year(s)
No Travel Required

Skills

SRE
Telemetry
Observability
Datadog
Splunk
Prometheus
Grafana
ELK
Python
Shell

Job Details

  • Location: Alpharetta, GA or Berkeley Heights, NJ
  • Work Mode: Hybrid (onsite 2-3 days per week)
  • Employment Type: Contract (W2 only, no C2C)
  • Duration: Multi-year engagement, extended annually
  • H1-B Transfer: Available for the right candidate

About the Role

We are seeking four Site Reliability Engineers (SREs), two seniors with 8 years of experience and two juniors with at least 4 years of experience, to join our team in Alpharetta, GA or Berkeley Heights, NJ. Candidates should have strong expertise in observability, telemetry, and monitoring platforms. This is a senior-level contract role in a hybrid onsite model. Candidates must demonstrate hands-on experience in enterprise-scale systems reliability, incident response, and monitoring automation.

 

Key Responsibilities

  • Design, implement, and maintain telemetry and observability solutions across enterprise systems.
  • Build and scale real-time monitoring dashboards and alerts using Datadog, Splunk, Prometheus, Grafana, and similar tools.
  • Collaborate with engineering teams to ensure systems are resilient, reliable, and performant.
  • Automate monitoring and reliability tasks using Python, Shell, or Go scripting.
  • Manage incident response: identify, troubleshoot, and resolve production issues quickly.
  • Implement SLOs/SLIs, error budgets, and reliability KPIs for mission-critical services.
  • Develop self-healing and auto-remediation capabilities for production environments.
  • Partner with DevOps, Cloud, and Security teams to optimize CI/CD pipelines and infrastructure reliability.
  • Contribute to post-incident reviews and drive improvements for system reliability.

 

Required Qualifications

  • 4-8 years of professional experience in Site Reliability Engineering or DevOps.
  • Proven expertise in telemetry, observability, and monitoring (Datadog, Splunk, Prometheus, Grafana, or ELK).
  • Strong experience with incident management and on-call support in enterprise environments.
  • Proficiency in Linux system administration, networking, and performance tuning.
  • Hands-on experience with cloud platforms (AWS, Azure, or Google Cloud Platform).
  • Solid programming/scripting skills in Python, Bash, Go, or equivalent.
  • Familiarity with containers and container orchestration (Docker, Kubernetes).
  • Experience designing and maintaining CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, or similar).
  • Strong analytical skills to improve system performance, uptime, and scalability.

Nice-to-Have Skills

  • Knowledge of AIOps, anomaly detection, and predictive monitoring.
  • Experience with infrastructure-as-code (Terraform, Ansible, Pulumi).
  • Exposure to security monitoring and compliance integration with observability stacks.

Engagement Rules

  • Contract position (W2 only): no C2C, no agencies.
  • Number of positions: 4 (2 seniors with 8 years of experience and 2 juniors with at least 4 years of experience).
  • Senior-level role (6+ years of experience); entry-level applicants will not be considered.
  • Multi-year contract with annual extensions.
  • H1-B transfer available for the right candidate.
  • Hybrid onsite role (2-3 days per week, Alpharetta, GA or Berkeley Heights, NJ).