Enterprise Observability & AIOps Architect
(Application + Infrastructure + Platform + ITSM/AIOps)
Role Overview
We are seeking an Enterprise Observability & AIOps Architect with 15+ years of experience designing and modernizing enterprise-scale observability ecosystems across applications, infrastructure, cloud platforms, databases, integrations, and operational workflows.
The ideal candidate has strong expertise in:
· AIOps & Event Correlation
· ITSM Integration
· Telemetry Governance
· SRE & Operational Excellence
· Enterprise Monitoring Rationalization
· AI-driven Operational Transformation
This role requires both strategic architecture leadership and strong hands-on expertise across modern observability and AIOps platforms in large enterprise environments.
Key Responsibilities
Enterprise Observability Architecture
· Lead enterprise-wide observability assessments across applications, infrastructure, cloud, databases, and operational workflows.
· Define current-state and target-state observability architecture.
· Develop monitoring rationalization and consolidation strategies across enterprise toolsets.
· Establish standards for telemetry, tagging, service identity, alerting, dashboards, and governance.
· Define scalable operating models aligned to SRE, ITSM, and platform engineering practices.
Application Observability
· Architect observability solutions across:
APM | Distributed tracing | Logs & metrics | RUM & synthetics
· Define SLI/SLO-driven monitoring and alerting strategies.
· Improve service dependency visibility, transaction tracing, and telemetry quality.
· Design monitoring patterns for microservices, APIs, Kubernetes, Azure-native, and legacy applications.
Infrastructure & Platform Observability
· Design observability solutions for cloud infrastructure, middleware, databases, platform services, and batch ecosystems.
· Assess alert quality, duplication, routing inefficiencies, and monitoring overlaps.
· Define event correlation, severity models, enrichment standards, and operational ownership structures.
AIOps & Intelligent Operations
· Design AIOps capabilities including:
o Event correlation
o Noise reduction
o Intelligent alert prioritization
o Anomaly detection
o Predictive insights
o Root-cause contextualization
· Define AI-assisted operational workflows for incident reduction, MTTR optimization, and automated remediation.
ITSM & Operational Integration
· Integrate observability platforms with ServiceNow, incident workflows, CMDB, and collaboration tools.
· Define monitoring-to-incident operational workflows and governance standards.
· Establish KPI-driven operational maturity frameworks.
Governance & Blueprinting
· Develop enterprise standards, onboarding blueprints, engineering playbooks, and reusable observability patterns.
· Create reference architectures, dashboard standards, and operational governance frameworks.
· Define “Day-1 Observability” onboarding models for new services.
Required Experience
· 15+ years of experience in observability, infrastructure, SRE, production operations, platform engineering, or AIOps architecture.
· Strong experience in enterprise-scale hybrid cloud and distributed environments.
· Proven experience leading observability transformation and monitoring rationalization initiatives.
· Experience working with executive leadership, enterprise architects, platform teams, and operations organizations.
· Strong understanding of enterprise operational workflows, incident management, and reliability engineering.
Required Technical Expertise
Observability Platforms
Strong hands-on expertise in:
Dynatrace | Azure Monitor | Azure Application Insights | Azure Log Analytics | LogicMonitor | ManageEngine
Preferred:
Splunk | ELK/OpenSearch | Prometheus/Grafana | Datadog | New Relic | BigPanda | PagerDuty
Core Skills
· Event correlation & alert engineering
· Distributed tracing & topology mapping
· AIOps & intelligent operations
· Cloud monitoring & telemetry
· Kubernetes & microservices observability
· ITIL / ITSM integration
· SRE principles & operational governance
Cloud & Platform Experience
Azure | AWS | Kubernetes | APIs & integrations | Middleware & distributed systems
Preferred Qualifications
· Experience defining enterprise observability standards and governance models.
· Experience with operational transformation initiatives involving AI/AIOps.
· Strong workshop facilitation, stakeholder management, and executive presentation skills.
· Certifications in cloud, observability, ITIL, SRE, or AIOps.
Success Criteria
· Establish a unified enterprise observability architecture.
· Reduce alert noise and operational inefficiencies.
· Improve telemetry quality, service visibility, and incident response.
· Enable scalable AIOps-driven operational workflows.
· Deliver standardized onboarding, governance, and engineering blueprints.
· Improve operational maturity, reliability, and service resilience.