Role: Enterprise Observability & AIOps Architect (App + Infra) - 19531
Location: Dallas, Texas, USA (Hybrid / Remote flexibility – Dallas preferred)
Designation: Principal Architect
Experience: 15+ Years (open to highly experienced profiles up to 25 years)
Duration: 1 Year (with possible extension)
Role Overview
We are looking for an experienced Enterprise Observability & AIOps Architect to design, modernize, and lead enterprise-scale observability ecosystems spanning applications, infrastructure, cloud platforms, databases, and operational workflows.
The ideal candidate will combine strategic architectural leadership with strong hands-on expertise in modern observability and AIOps platforms, driving operational excellence and AI-driven transformation across large enterprise environments.
Key Responsibilities
Enterprise Observability Architecture
• Lead enterprise-wide observability assessments across applications, infrastructure, cloud, and databases
• Define current-state and target-state architectures
• Drive monitoring rationalization and tool consolidation strategies
• Establish standards for telemetry, tagging, service identity, alerting, and dashboards
• Define scalable operating models aligned with SRE, ITSM, and platform engineering
Application Observability
• Architect solutions for APM, distributed tracing, logs & metrics, real user monitoring (RUM), and synthetic monitoring
• Define SLI/SLO-driven monitoring strategies
• Improve service visibility, dependency mapping, and telemetry quality
• Build observability for microservices, APIs, Kubernetes, Azure-native services, and legacy systems
Infrastructure & Platform Observability
• Design observability across cloud, middleware, databases, and batch systems
• Analyze alert duplication, routing inefficiencies, and monitoring overlaps
• Define event correlation, severity models, enrichment, and ownership frameworks
AIOps & Intelligent Operations
• Design and implement:
o Event correlation & noise reduction
o Intelligent alert prioritization
o Anomaly detection & predictive insights
o Root cause analysis & contextualization
• Enable AI-driven workflows for:
o Incident reduction
o MTTR optimization
o Automated remediation
ITSM & Operational Integration
• Integrate observability tools with ServiceNow, CMDB, and incident workflows
• Define monitoring-to-incident processes and governance frameworks
• Establish KPI-driven operational maturity models
Governance & Blueprinting
• Develop enterprise standards, onboarding blueprints, and playbooks
• Define reusable observability patterns and reference architectures
• Establish Day-1 observability models for new services
Required Experience
• 15+ years in observability, SRE, platform engineering, AIOps, or production operations
• Proven experience in enterprise observability transformation and monitoring rationalization
• Strong background in hybrid cloud and distributed systems
• Experience working with executives, enterprise architects, and platform teams
• Deep understanding of incident management and reliability engineering
Technical Expertise
Observability Tools (Must-Have)
• Dynatrace
• Azure Monitor
• Azure Application Insights
• Azure Log Analytics
• LogicMonitor
• ManageEngine
Preferred Tools
• Splunk, ELK / OpenSearch
• Prometheus / Grafana
• Datadog, New Relic
• BigPanda, PagerDuty
Core Skills
• Event correlation & alert engineering
• Distributed tracing & topology mapping
• AIOps & intelligent operations
• Cloud telemetry & monitoring
• Kubernetes & microservices observability
• ITSM (ServiceNow) integration
• SRE principles & operational governance
Cloud & Platform
• Azure, AWS
• Kubernetes & container platforms
• APIs & integrations
• Middleware & distributed systems
Mandatory Skills
• Enterprise Observability Architecture
• OpenTelemetry framework design
• APM & cloud monitoring expertise
• ITSM integration & event correlation
• AIOps & anomaly detection
• Kubernetes & microservices monitoring
• Alert optimization & noise reduction
• SLI/SLO framework design
• Integration architecture & governance