We are growing our enterprise observability practice and are seeking an experienced Datadog Engineer to help lead and scale our monitoring, analytics, and reliability capabilities.
What You'll Do
- Design, architect, and scale enterprise observability solutions using Datadog across applications, infrastructure, cloud, and security platforms
- Architect dashboards, monitors, and alerting frameworks aligned to business, operational, and reliability requirements
- Define and implement best practices for metrics, logs, traces, and anomaly detection
- Lead deployment and configuration of Datadog agents, APIs, integrations, and automation across complex, multi-cloud environments
- Integrate Datadog with CI/CD pipelines, logging platforms, and collaboration tools (e.g., GitLab, ServiceNow, Jira, Slack)
- Identify observability gaps and drive improvements in signal quality, reliability, and incident response
- Optimize Datadog usage and licensing costs while maintaining strong coverage and actionable insights
- Partner closely with DevOps, SRE, Cloud, Application, and Security teams to embed observability into daily operations
- Produce clear documentation and contribute to knowledge sharing across teams
What You'll Bring
- Recent enterprise-level ownership of observability solutions is required
- Hands-on experience designing and owning Datadog observability solutions (not execution-only roles)
- Proven experience as a technical decision-maker in a modern software development environment
Strong Experience With:
- Datadog Monitoring, APM, Distributed Tracing, and alerting
- Cloud platforms and DevOps / SRE practices
- CI/CD integrations and automation
- Scripting and configuration (Python, Bash, PowerShell, YAML, etc.)
- Communicating and collaborating across engineering, product, and business stakeholders
- Articulating measurable impact and outcomes (e.g., MTTR reduction, reliability improvements, cost optimization)