Job Details
Job Title: Observability Lead / Architect
Location: Dallas, TX (2-3 days per week in the office)
Observability Architect: New Relic | Splunk | CloudWatch | Kibana | APM | Monitoring Solutions
Experience Required: 8+ Years
Key Responsibilities:
Design and implement end-to-end observability strategies covering metrics, logs, traces, and user experience monitoring
Architect custom monitoring frameworks tailored to specific business applications and infrastructure landscapes
Implement and manage observability platforms including New Relic, Splunk, AWS CloudWatch, and Kibana
Develop and maintain APM scripts, synthetic monitors, custom dashboards, and alerting mechanisms
Integrate observability tools with CI/CD pipelines for proactive issue detection and lower MTTR
Collaborate with application, infrastructure, DevOps, and security teams to ensure observability coverage across systems
Conduct root cause analysis using correlation across metrics, logs, and traces
Provide technical leadership in observability best practices, architecture reviews, and roadmap planning
Define and enforce standards for SLAs, SLOs, and SLIs across environments
Mentor and guide engineering teams in the effective use of observability tools
Key Skills & Technologies:
Monitoring & APM Tools:
Deep experience with New Relic (including APM, infrastructure, synthetics, and custom instrumentation)
Strong proficiency in Splunk (querying, dashboards, alerts, ingestion pipeline design)
Hands-on experience with AWS CloudWatch (metrics, logs, alarms, insights)
Working knowledge of Kibana and the Elastic Stack (ELK)
Scripting & Customization:
Experience in APM scripting and custom instrumentation (using Java, Python, or Node.js agents)
Ability to create synthetic monitors, custom event generators, and automated dashboards
Familiarity with Terraform, CloudFormation, or scripting languages (Shell, Python) for observability automation
Architecture & Integration:
Expertise in designing observability frameworks for cloud-native (AWS, Google Cloud Platform, Azure) and hybrid environments
Understanding of distributed systems, microservices, and event-driven architectures
Ability to integrate observability platforms with DevOps pipelines, incident response, and ITSM tools