Overview
Remote
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 12 months
Skills
AWS
Azure
CI/CD
DataDog
Job Details
Datadog Architect
Remote role.
Length: 1 year.
Job Description:
Architecture & Design
- Design end-to-end observability architecture using Datadog across Azure cloud, containers, Kubernetes, and on-prem workloads.
- Define monitoring standards, SLIs/SLOs, dashboards, alerting strategy, and tagging governance.
- Design and architect end-to-end solutions to integrate mainframe platforms.
- Architect log ingestion pipelines, retention policies, and cost-optimized indexing strategies.
- Build scalable APM instrumentation patterns for microservices, serverless, and distributed environments.
Implementation & Optimization
- Deploy Datadog agents, integrations, and custom checks across large-scale infrastructure.
- Configure APM, RUM, Logs, SIEM, Synthetics, Network Performance Monitoring, and CI/CD Observability.
- Work closely with DevOps, SRE, Cloud, and Application teams to instrument services and ensure visibility.
- Analyze and optimize Datadog costs: usage, retention settings, indexing, and billing insights.
Governance & Best Practices
- Establish organization-wide tagging standards, dashboards, alerting guardrails, and onboarding processes.
- Create reusable templates, Terraform modules, and automation scripts for Datadog deployment.
- Ensure compliance with security and observability best practices.
- Mentor teams on Datadog usage and train engineers on dashboards, logs, traces, and alerts.
Troubleshooting & Insights
- Lead root cause analysis (RCA) investigations using Datadog metrics, traces, logs, and correlated events.
- Collaborate with engineering teams to improve system reliability, resilience, and performance.
- Identify gaps in observability and propose improvements across the stack.
Required Skills & Experience
- 12 years in Observability, Monitoring, SRE, DevOps, or Cloud Engineering.
- 6+ years of hands-on experience with Datadog.
- Strong understanding of distributed systems, microservices, and cloud-native architectures.
- Expertise with Kubernetes, Docker, and AWS, Azure, and Google Cloud Platform services.
- Experience with Infrastructure as Code (Terraform preferred).
- Strong knowledge of APM, Metrics, Logs, RUM, Synthetics, and Security Monitoring.
- Deep experience with Datadog dashboards, alerting, monitors, service maps, event correlation, and notebooks.
- Proficiency with Python, Bash, or similar scripting languages.
- Strong analytical, communication, and problem-solving skills.
Preferred Qualifications
- Datadog Certifications (Datadog Fundamentals, APM, Log Management, or Observability).
- Experience with observability tools in the retail industry.
- CI/CD observability experience (GitHub Actions, Jenkins, GitLab CI, etc.).
- Background in Performance Engineering, Reliability Engineering, or Platform Engineering.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.