Enterprise Observability & AIOps Architect

Dallas, TX, US • Posted 1 day ago • Updated 1 day ago
Contract Independent
Contract W2
2 Years
75% Travel Required
Able to Sponsor
On-site
Depends on Experience

Job Details

Skills

  • Observability
  • AIOps
  • Infrastructure
  • Platform
  • microservices
  • API
  • Kubernetes
  • Azure-native
  • cloud
  • middleware
  • databases
  • Incident reduction
  • MTTR
  • optimization
  • ITSM
  • ServiceNow
  • CMDB
  • Dynatrace
  • Azure Monitor
  • Azure Application Insights
  • Azure Log Analytics
  • LogicMonitor
  • ManageEngine
  • OpenTelemetry
  • AI
  • AI Operations

Summary

Job Title: Enterprise Observability & AIOps Architect (App + Infra)
Location: Dallas, Texas, USA (Hybrid - Dallas preferred)
Experience: 15+ Years (open to highly experienced profiles up to 25 years)
Duration: 1 Year (with possible extension)

Role Overview
We are looking for an experienced Enterprise Observability & AIOps Architect to design, modernize, and lead enterprise-scale observability ecosystems spanning applications, infrastructure, cloud platforms, databases, and operational workflows.
The ideal candidate will combine strategic architectural leadership with strong hands-on expertise in modern observability and AIOps platforms, driving operational excellence and AI-driven transformation across large enterprise environments.

Key Responsibilities
Enterprise Observability Architecture
Lead enterprise-wide observability assessments across applications, infrastructure, cloud, and databases
Define current-state and target-state architectures
Drive monitoring rationalization and tool consolidation strategies
Establish standards for telemetry, tagging, service identity, alerting, and dashboards
Define scalable operating models aligned with SRE, ITSM, and platform engineering

Application Observability
Architect solutions for APM, distributed tracing, logs and metrics, real user monitoring (RUM), and synthetic monitoring
Define SLI/SLO-driven monitoring strategies
Improve service visibility, dependency mapping, and telemetry quality
Build observability for microservices, APIs, Kubernetes, Azure-native & legacy systems

Infrastructure & Platform Observability
Design observability across cloud, middleware, databases, and batch systems
Analyze alert duplication, routing inefficiencies, and monitoring overlaps
Define event correlation, severity models, enrichment, and ownership frameworks

AIOps & Intelligent Operations
Design and implement:
  • Event correlation & noise reduction
  • Intelligent alert prioritization
  • Anomaly detection & predictive insights
  • Root cause analysis & contextualization
Enable AI-driven workflows for:
  • Incident reduction
  • MTTR optimization
  • Automated remediation

ITSM & Operational Integration
Integrate observability tools with ServiceNow, CMDB, and incident workflows
Define monitoring-to-incident processes and governance frameworks
Establish KPI-driven operational maturity models

Governance & Blueprinting
Develop enterprise standards, onboarding blueprints, and playbooks
Define reusable observability patterns and reference architectures
Establish Day-1 observability models for new services

Required Experience
15+ years in observability, SRE, platform engineering, AIOps, or production operations
Proven experience in enterprise observability transformation and monitoring rationalization
Strong background in hybrid cloud and distributed systems
Experience working with executives, enterprise architects, and platform teams
Deep understanding of incident management and reliability engineering

Technical Expertise
Observability Tools (Must-Have)
Dynatrace
Azure Monitor
Azure Application Insights
Azure Log Analytics
LogicMonitor
ManageEngine

Preferred Tools
Splunk, ELK / OpenSearch
Prometheus / Grafana
Datadog, New Relic
BigPanda, PagerDuty

Core Skills
Event correlation & alert engineering
Distributed tracing & topology mapping
AIOps & intelligent operations
Cloud telemetry & monitoring
Kubernetes & microservices observability
ITSM (ServiceNow) integration
SRE principles & operational governance

Cloud & Platform
Azure, AWS
Kubernetes & container platforms
APIs & integrations
Middleware & distributed systems

Mandatory Skills
Enterprise Observability Architecture
OpenTelemetry framework design
APM & cloud monitoring expertise
ITSM integration & event correlation
AIOps & anomaly detection
Kubernetes & microservices monitoring
Alert optimization & noise reduction
SLI/SLO framework design
Integration architecture & governance

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 91170837
  • Position Id: 8975170

Company Info

About TechVirtue LLC

TechVirtue develops a wide range of solutions for finding candidates who have strong knowledge of their field and suit the company's work culture. We also provide one-stop solutions ranging from software development and maintenance to expert support and advisory. Our team consists of experts with several years of experience in staffing, recruitment, and web development. Our dedicated and motivated team makes sure to fulfill all our customers' requirements.
