Datadog Cloud Engineer Senior

Washington, DC, US • Posted 8 hours ago • Updated 8 hours ago

Contract W2

6 Months

On-site

$60 - $70/hr

Job Details

Skills

Datadog
Cloud
AWS
SQL
MySQL

Summary

This is a hands-on role. The engineer instruments services with distributed tracing, code-level profiling, and custom metrics; builds and tunes Datadog (or comparable) dashboards, alerts, APM, log pipelines, RUM, and synthetic monitors; and uses that telemetry to solve production performance, reliability, and capacity problems. The engineer partners with cloud, platform, and application teams to embed observability into Azure, AWS, and container platforms (OpenShift/Kubernetes), and drives down alert noise, mean time to detect (MTTD), and mean time to resolve (MTTR). The position provides senior technical leadership for APM/distributed tracing strategy, SLO/SLI engineering, and data-driven operational decision-making in a 24x7x365 operating environment.

PRIMARY RESPONSIBILITIES

Observability Platform Engineering

- Engineer and operate the enterprise observability stack (Datadog or comparable), including metrics, logs, traces, APM, RUM, synthetic monitoring, and network performance monitoring.

- Build, tune, and maintain dashboards, monitors, SLOs/SLIs, and alerting policies that produce actionable signal and minimize noise.

- Instrument services, infrastructure, and containerized workloads using agents, OpenTelemetry, and language-specific APM tracers (Java, .NET, Python, Node.js, Go) with consistent span tagging, W3C Trace Context propagation, and unified service tagging across the estate.

- Develop and maintain integrations between observability platforms, ITSM (ServiceNow), CI/CD pipelines, and on-call/paging workflows.

- Define and enforce a unified tagging standard (environment, service, version, team/ownership, data classification, cost center) across metrics, logs, and traces; manage tag cardinality, governance, and custom business tags to keep telemetry queryable, attributable, and cost-controlled.

Cloud and Container Monitoring Engineering

- Design and deliver monitoring coverage for Microsoft Azure and AWS workloads, including PaaS services, serverless, networking, identity, managed databases, and cloud-native data services.

- Engineer managed database observability across AWS RDS/Aurora (MySQL, PostgreSQL, SQL Server, Oracle), Azure SQL/PostgreSQL/MySQL, and NoSQL/cache services (DynamoDB, Cosmos DB, ElastiCache/Redis), including query-level performance analytics, slow-query and execution-plan capture, lock/deadlock/wait analysis, connection pool and session monitoring, replication lag, storage/IOPS saturation, and backup/HA health -- correlating database spans with upstream APM traces.

- Engineer container-platform observability for OpenShift/Kubernetes, covering cluster health, control plane, nodes, pods, namespaces, ingress, service mesh, and workload APM.

- Build standardized, reusable monitoring modules deployable via infrastructure-as-code (Terraform, Bicep, ARM) and CI/CD.

- Support hybrid visibility across on-premises, cloud, and containerized workloads with correlated telemetry.

Performance Engineering and Problem Solving

- Lead data-driven investigation and resolution of complex performance, latency, saturation, and reliability issues across the estate.

- Use APM distributed traces, service/dependency maps, continuous code profiling (CPU, memory, lock contention), database query analytics, exception/error tracking, and RUM-to-backend trace correlation to isolate bottlenecks in applications, platforms, middleware, and downstream dependencies.

- Partner with engineering teams to define and implement remediation, tuning, and architectural improvements based on telemetry evidence.

- Define and implement trace-based SLOs, deployment tracking, and change-correlation workflows so performance regressions are detected and attributed to specific releases, versions, or configuration changes.

- Provide senior technical leadership during major incidents, delivering impact analysis, contributing to root-cause analysis, and owning post-incident observability gaps.

Capacity, Reliability, and Continuous Improvement

- Analyze operational telemetry and trend data to identify capacity risks, recurring constraints, and opportunities for efficiency.

- Build and maintain capacity and performance dashboards and reports that communicate posture, risk, and recommendations to technical and leadership stakeholders.

- Define capacity thresholds, alert baselines, and trigger points for scaling, technology refresh, and resource reallocation.

- Drive continuous improvement of observability coverage, alert quality, runbook linkage, and operational maturity aligned to SEC SLA/KPI expectations.

REQUIRED QUALIFICATIONS

Education: Bachelor's degree in a relevant field (e.g., Information Technology, Computer Science, Engineering).

Experience:

- Minimum 8 years of experience in IT infrastructure or platform engineering roles, including 5+ years focused on observability, performance engineering, or site reliability engineering.

- Demonstrated experience engineering and operating an enterprise observability platform (Datadog strongly preferred; equivalent experience with Dynatrace, New Relic, Splunk Observability, or Grafana/Prometheus stacks considered).

- Proven experience building APM and distributed tracing coverage for production multi-tier applications -- including language-specific tracer deployment, custom instrumentation of business transactions, service/dependency mapping, continuous profiling, and RUM-to-backend trace correlation -- across cloud and containerized workloads.

- Proven experience leading complex production performance and reliability problem-solving from telemetry to remediation.

Dice Id: RTL61687
Position Id: 9001999

Contact the job poster

Syed Adnan Jaffer

Senior Talent Acquisition Specialist @ Unisys