Senior Datadog Cloud Engineer (Observability)
Work location: Hybrid, 1x per week in Washington, D.C. 20002
Type: Contract-to-hire
Clearance: Must be able to obtain/maintain Public Trust
Compensation: $63/HR
What you’ll do (day-to-day)
You’ll be the go-to senior engineer for building and improving an enterprise observability program, using Datadog (or a comparable platform), to help teams detect issues faster, reduce alert noise, and improve reliability in a 24x7 environment.
Key responsibilities include:
- Build and run observability tooling across metrics, logs, traces/APM, RUM, synthetics, and network monitoring
- Create and maintain dashboards, monitors, alerts, and SLOs/SLIs that teams actually use (high signal, low noise)
- Instrument applications and services using agents, OpenTelemetry, and language-specific APM tooling (Java, .NET, Python, Node.js, Go)
- Improve production performance and reliability by using telemetry to troubleshoot latency, saturation, capacity, errors, and dependency issues
- Partner with cloud/platform/app teams to embed observability into AWS, Azure, and Kubernetes/OpenShift environments
- Integrate monitoring workflows with ServiceNow, CI/CD pipelines, and on-call/paging processes
- Establish and enforce telemetry standards (tagging strategy, governance, cost control)
What we’re looking for (must-haves)
- 8+ years in infrastructure/platform engineering (or similar), including 5+ years focused on observability/performance/SRE-type work
- Hands-on experience operating an observability platform (Datadog preferred; Dynatrace, New Relic, Splunk Observability, and Grafana + Prometheus are also relevant)
- Strong experience with APM + distributed tracing in production (including instrumentation and service mapping)
- Production monitoring experience with Kubernetes/OpenShift
- Cloud experience supporting monitoring/telemetry in AWS and/or Azure
- Bachelor’s degree (or equivalent experience)
- Ability to support an on-call rotation in a 24x7 environment
Nice-to-haves (stand out if you have them)
- Experience in federal/government environments (FISMA, FedRAMP, NIST-aligned)
- Datadog certifications (or comparable)
- eBPF observability, service mesh telemetry (Istio/Linkerd)
- Terraform/Bicep/ARM for deploying monitoring as code
- AWS/Azure/OpenShift/Terraform certifications
System One and its subsidiaries, including Joulé and Mountain Ltd., are leaders in delivering outsourced services and workforce solutions across North America. We help clients get work done more efficiently and economically, without compromising quality. System One serves as a valued partner for our clients and offers eligible employees health and welfare benefits coverage options, including medical, dental, vision, spending accounts, life insurance, and voluntary plans, as well as participation in a 401(k) plan.
System One is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, age, national origin, disability, family care or medical leave status, genetic information, veteran status, marital status, or any other characteristic protected by applicable federal, state, or local law.
Ref: #851-Rockville-S1