Senior AI/ML Observability Engineer
Location: Dallas, TX or Tampa, FL
Engagement Type: Contract to Hire
Interview Process: Two rounds
Role Overview
We are seeking a Senior AI/ML Observability Engineer to join a strategic observability team focused on building reusable, enterprise-wide anomaly detection solutions. This role blends hands-on AI/ML engineering, observability expertise, and automation to proactively detect system issues and improve production reliability.
The ideal candidate has strong Python-based ML experience, a solid grasp of observability principles (logs, metrics, traces), and has worked closely with Infrastructure, SRE, and Engineering teams to implement scalable observability solutions across complex systems.
This is a senior individual contributor role requiring independence, initiative, and subject matter expertise.
Key Responsibilities
AI/ML & Observability Engineering
- Design, build, and deploy AI/ML models for anomaly detection across telemetry data (logs, metrics, traces, KPIs)
- Translate early-stage use cases into generalized, reusable observability solutions
- Modify and extend models to support multiple applications and teams
- Apply ML techniques to predict system anomalies before production impact
Telemetry & System Monitoring
- Analyze and correlate logs, metrics, traces, and system KPIs
- Identify early warning signals of instability or degradation
- Build dashboards and alerts using observability platforms
Collaboration & Strategy
- Work closely with Infrastructure, SRE, Developers, and Architects
- Contribute to enterprise observability strategy
- Act as a subject matter expert for AI-driven observability
- Operate independently within a small, high-impact team
Automation & Cloud
- Develop automation to support end-to-end observability workflows
- Deploy solutions in cloud environments
- Leverage OpenTelemetry standards for instrumentation and data collection
Required Qualifications
- 6+ years of experience in AI/ML engineering, SRE, or observability-focused roles
- Strong expertise in Python for data processing and ML development
- Hands-on experience building ML models for anomaly detection
- Solid understanding of observability principles (logs, metrics, traces)
- Experience with observability tools such as:
  - Grafana (preferred)
  - Splunk
  - Dynatrace
- Familiarity with OpenTelemetry
- Strong automation skills (pipelines, workflows, reusable components)
- Experience working in cloud environments
- Excellent problem-solving and communication skills
Preferred Qualifications
- Experience designing predictive models for system reliability
- Background supporting production systems in large-scale environments
- Experience building reusable ML platforms or shared services
- Exposure to enterprise-wide monitoring or observability programs
Ideal Candidate Profile
- Senior-level, hands-on engineer
- Strong ownership mindset; able to drive work end-to-end
- Comfortable operating with limited supervision
- Strategic thinker with pragmatic execution skills
- Passionate about reliability, automation, and proactive problem detection
Dexian stands at the forefront of Talent + Technology solutions with a presence spanning more than 70 locations worldwide and a team exceeding 10,000 professionals. As one of the largest technology and professional staffing companies and one of the largest minority-owned staffing companies in the United States, Dexian combines over 30 years of industry expertise with cutting-edge technologies to deliver comprehensive global services and support.
Dexian connects the right talent and the right technology with the right organizations to deliver trajectory-changing results that help everyone achieve their ambitions and goals. To learn more, please visit .
Dexian is an Equal Opportunity Employer that recruits and hires qualified candidates without regard to race, religion, sex, sexual orientation, gender identity, age, national origin, ancestry, citizenship, disability, or veteran status.