Job Title: Senior Observability Engineer (SRE)
Location: Austin, TX (Hybrid, 3 days onsite)
Must-Have Skills:
- OpenTelemetry (OTel)
- Datadog APM, metrics & tracing
- Observability for distributed systems
- SLO/SLI-based alerting
- Dashboards & Golden Signals
Role Overview:
Our E-commerce SRE team is looking for a specialized Observability Engineer to strengthen the visibility and reliability of our global platforms. In this role, you will be responsible for building the foundation of how we monitor, trace, and alert across both modern cloud offerings and legacy systems. You will play a critical part in standardizing our telemetry stack and ensuring that "observability-as-code" is a reality for our engineering teams.
Key Responsibilities:
- OpenTelemetry Engineering: Configure and implement OTel processors, filters, and metadata enrichment. Standardize and deploy approved exporters across the pipeline.
- Library Development: Build and maintain OTel-based tracing wrappers, metrics helpers, and context propagation utilities.
- Metadata Implementation: Execute metadata enrichment standards across the SDK, Collector, and Gateway levels.
- Platform Integration: Engineer the integration of OpenTelemetry libraries with both Grafana and Datadog, ensuring zero telemetry loss during data transfer.
- Alert Re-instrumentation: Refactor and migrate legacy Java-based instrumentation into standard OpenTelemetry formats to ensure accurate alerting.
- Documentation & Support: Create technical runbooks and document implementation details to facilitate knowledge sharing and team adoption.
Qualifications:
- Java Proficiency: Strong hands-on experience with Java, specifically in refactoring legacy codebases.
- OpenTelemetry Expertise: Deep technical understanding of the OTel ecosystem, including configuring Collectors, SDKs, and Exporters.
- Observability Platforms: Proven experience integrating telemetry data with Grafana and Datadog.