Role: Integration Observability Architect
Location: Remote (Dallas, TX)
Duration: 12 Months
Experience: 15+ Years
Skill Set
Enterprise observability architecture, OpenTelemetry framework design, APM & cloud monitoring platform expertise, ITSM integration & event correlation, AIOps & anomaly detection, Kubernetes & microservices monitoring, alert optimization & noise reduction, SLI/SLO framework definition, integration architecture & governance standards
Job Summary
Data / Batch / Integration Observability Architect
Experience & Purpose:
15+ years of experience in the data domain, with strong expertise in defining and implementing monitoring and observability frameworks for enterprise-scale data ecosystems. Responsible for establishing a scalable data observability strategy across pipelines, batch workloads, databases, and integration layers to ensure end-to-end visibility, reliability, operational resilience, and business-impact awareness.
Key Responsibilities
Assess observability across:
Batch jobs, schedulers, ETL/ELT pipelines, and data platforms
Database monitoring, performance, and query behavior
Integration and middleware workflows across systems
Evaluate:
Pipeline visibility (latency, failures, throughput, dependencies, data SLAs)
Effectiveness of schedulers/orchestration platforms (e.g., ActiveBatch, Airflow, Control-M)
Database observability and performance monitoring practices
Identify:
Blind spots in data flow, lineage, and cross-system dependencies
Failure detection gaps beyond the job level (data quality, freshness, volume anomalies)
Inefficiencies in retry mechanisms, alerting, and operational workflows
Define:
Standard observability patterns and frameworks for data workloads
Dependency-aware monitoring models across upstream and downstream systems
Actionable dashboards, alerts, and SLAs aligned to business impact
Repeatable onboarding patterns for new pipelines and data services
Enable intelligent observability:
Reduce alert noise and improve signal quality and actionability
Correlate events across pipelines, databases, and integrations
Link technical failures to business outcomes and downstream impact
Incorporate AI capabilities:
Anomaly detection in pipeline behavior, data patterns, and performance trends
Failure prediction and early warning signals for batch/data workflows
Intelligent alerting and correlation across data ecosystems, leveraging AIOps platforms such as ServiceNow ITOM, Moogsoft, or BigPanda
Contribute to:
Target-state data observability architecture and engineering blueprint
Retrofit and modernization guidance for existing pipelines and platforms
Integration with ITSM, incident management, and operational workflows
Technical Skills
Experience with (any of the following):
Schedulers / Orchestration: ActiveBatch, Airflow, Control-M, AutoSys
Data Platforms: Azure Data Factory, Databricks, Snowflake, Hadoop ecosystem
Observability Tools: Azure Monitor, Log Analytics (KQL), Splunk, ELK, Dynatrace, Prometheus
Hands-on experience with:
ActiveBatch (job scheduling and monitoring)
SQL Sentry or similar tools (database observability)
Azure Log Analytics (KQL for data monitoring)
Azure Monitor (data-related metrics/logs)
Understanding of data pipelines and integration patterns
Working knowledge of:
Data pipelines (ETL/ELT), batch processing, and integration patterns
Database systems and performance monitoring tools (e.g., SQL Sentry or equivalent)
Logs, metrics, and event correlation across distributed systems
Expectations / Success Criteria
Identify and eliminate critical data pipeline blind spots and failure gaps
Establish standard, reusable observability patterns for data workloads
Enable end-to-end visibility across upstream and downstream dependencies
Improve alert quality, reduce noise, and accelerate issue detection and resolution (lower MTTR)
Deliver a practical, implementable data observability blueprint
Drive adoption of proactive and AI-assisted monitoring practices across data ecosystems