SRE (Site Reliability Engineer)

Hybrid in Alpharetta, GA, US • Posted 5 hours ago • Updated 5 hours ago

Contract W2

50% Travel Required

On-site

$60+

Clarkstech

Job Details

Skills

Telemetry

Summary

We are seeking Site Reliability Engineers (SREs) with mandatory, hands-on expertise in telemetry, observability, and site monitoring platforms. This is a hybrid contract role based in Alpharetta, GA or Berkeley Heights, NJ.

This role requires proven, production-level experience with enterprise observability stacks.

Key Responsibilities

Design, implement, and maintain comprehensive telemetry and observability solutions across distributed enterprise systems with complex architectures.
Build, optimize, and scale real-time monitoring dashboards, metrics pipelines, and intelligent alerting systems using industry-standard tools including Datadog, Splunk, Prometheus, Grafana, ELK Stack, and similar platforms.
Implement end-to-end observability strategies encompassing metrics, logs, traces, and events to ensure complete system visibility.
Develop and maintain custom instrumentation for applications and infrastructure to capture critical telemetry data.
Collaborate with engineering teams to embed reliability practices and ensure systems are resilient, observable, and performant.
Automate monitoring workflows, alert management, and reliability tasks using Python, Shell, or Go scripting.
Lead incident response efforts: rapidly identify, troubleshoot, and resolve production issues using observability data and telemetry analysis.
Design and implement SLOs/SLIs, error budgets, and reliability KPIs with corresponding monitoring and alerting for mission-critical services.
Develop self-healing and auto-remediation capabilities leveraging observability insights.
Partner with DevOps, Cloud, and Security teams to integrate observability into CI/CD pipelines and optimize infrastructure reliability.
Conduct post-incident reviews with detailed telemetry analysis and drive systemic improvements.

Mandatory Skills & Qualifications

Telemetry & Observability (MANDATORY)

Candidates MUST demonstrate hands-on, production experience with the following:

Observability Platforms (REQUIRED): Deep expertise in at least TWO of the following:
- Datadog (metrics, APM, logs, traces)
- Splunk (log aggregation, search, alerting, dashboards)
- Prometheus (time-series metrics, PromQL, alerting rules)
- Grafana (visualization, dashboard creation, data source integration)
- ELK Stack (Elasticsearch, Logstash, Kibana)
Telemetry & Monitoring Fundamentals (REQUIRED):
- Building and maintaining metrics collection pipelines
- Log aggregation, parsing, and analysis at scale
- Distributed tracing and application performance monitoring (APM)
- Creating actionable alerts with proper signal-to-noise ratios
- Dashboard design for real-time system health visualization
- Metrics instrumentation and custom telemetry implementation
Observability Best Practices (REQUIRED):
- Implementing the three pillars of observability: metrics, logs, and traces
- Correlation of telemetry data across multiple sources
- Establishing observability for microservices and distributed systems
- Capacity planning using historical telemetry data
- Performance baselining and anomaly detection

Core SRE Requirements (MANDATORY)

4-8 years of professional experience in Site Reliability Engineering or DevOps roles with significant focus on observability
Proven track record in incident management and on-call support in enterprise production environments, using observability tools for rapid diagnosis
Proficiency in Linux system administration, networking, and performance tuning
Hands-on experience with cloud platforms (AWS, Azure, or Google Cloud Platform), including cloud-native monitoring solutions (CloudWatch, Azure Monitor, Google Cloud Observability)
Solid programming/scripting skills in Python, Bash, Go, or equivalent for automation and tooling
Familiarity with containers and container orchestration (Docker, Kubernetes) and with monitoring containerized environments
Experience designing and maintaining CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI) with integrated monitoring and observability

Nice-to-Have Skills

AIOps and intelligent monitoring: Experience with ML-based anomaly detection, predictive monitoring, and automated incident correlation
OpenTelemetry: Implementation experience with OpenTelemetry for standardized observability instrumentation
Infrastructure-as-code: Terraform, Ansible, Pulumi with monitoring-as-code practices
Security observability: Integration of security monitoring, SIEM tools, and compliance frameworks with observability stacks
Advanced telemetry tools: Experience with Jaeger, Zipkin, New Relic, AppDynamics, Dynatrace, or other specialized APM/observability platforms
Custom metrics exporters: Development of Prometheus exporters or custom telemetry agents
Cost optimization: Experience optimizing telemetry data retention and observability platform costs

Engagement Rules

Contract position: W2 only. No C2C, no agencies.
Number of positions: 4 (2 seniors with 8 years of experience and 2 juniors with at least 4 years of experience)
Experience requirement: 4-8 years with mandatory telemetry/observability expertise
Multi-year contract with annual extensions

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 91165214
Position Id: 8859859

Company Info

About Clarkstech

At ClarksTech, we are a renowned global IT consulting firm committed to helping business and societal leaders overcome their most critical challenges and seize their greatest opportunities. Our achievements are rooted in deep collaboration and in a global community of diverse, dedicated individuals.

We have highly skilled engineers with excellent technical knowledge and experience with the latest software standards. We have built a large pool of knowledge that we apply to deliver solutions that meet our clients’ needs, expectations, and budgets.
