Moogsoft Enterprise Observability & AIOps-Tools Architect

Overview

On Site
60 - 65
Full Time
No Travel Required
Unable to Provide Sponsorship

Skills

Moogsoft

Job Details

Architect and implement, hands-on, a unified observability platform covering on-prem, multi-cloud (Azure + AWS), containers, and SaaS applications.
Own the full Moogsoft AIOps deployment: ingestion pipelines, situation clustering, noise reduction, automated remediation workflows, and integration with incident-response tools.
Design, build, and maintain enterprise “single-pane-of-glass” dashboards (executive, NOC/SOC, service-owner, and engineering views) in tools such as Grafana, Datadog, Dynatrace, New Relic, or Lightstep.
Lead deep bi-directional integration between observability tools and ServiceNow (Event Management, CMDB, Incident, Change, ITSM workflows, Service Mapping, Discovery).
Drive event correlation, alerting rationalization, and elimination of alert fatigue using Moogsoft and supporting tools.
Build and maintain, hands-on, data ingestion pipelines (metrics, events, logs, traces) using Prometheus, OpenTelemetry, Fluent Bit/Fluentd, Elastic, Splunk, Datadog agents, and similar tools.
Create and present observability maturity roadmaps, AIOps business cases, SLA/SLO reporting, and tool rationalization plans to C-level executives and the board, in person in San Jose.
Own licensing strategy, cost governance (FinOps for observability), and vendor relationships across the entire stack.
Mentor observability engineers and act as the final escalation owner for major incidents and platform issues.
 
Required Experience & Skills
10+ years in enterprise IT operations, including 6+ years owning large-scale observability and AIOps platforms (5,000+ servers, 50,000+ containers, multi-region).
Deep, hands-on expertise with Moogsoft AIOps (recent versions) – you have built or rebuilt Moogsoft environments from scratch, tuned clustering algorithms, and delivered >80% noise reduction.
Proven track record building and operating enterprise dashboards that are used daily by executives, NOC/SOC, and engineering teams.
Expert-level ServiceNow integration experience:
    Event Management (event rules, alert grouping, MID servers)
    Bi-directional incident sync with Moogsoft or other tools
    CMDB population via Discovery and Service Mapping
    Custom ServiceNow dashboards and Performance Analytics
Broad modern observability stack experience (at least four of the following required):
    Metrics & Dashboards: Grafana (advanced), Datadog, Dynatrace, New Relic, Lightstep
    Logs & Tracing: Elastic Stack, Splunk, Loki, OpenTelemetry
    Cloud-native: Azure Monitor, CloudWatch, Google Operations
    AIOps: Moogsoft (mandatory), BigPanda, PagerDuty SignalFlow
Strong scripting/automation: Python (mandatory), Go, PowerShell, Ansible.
Comfortable and effective presenting in person to senior leadership and war-room teams at the San Jose HQ on a daily basis.
 
Certifications (at least two required)
Moogsoft Certified Engineer or Architect
ServiceNow Certified Implementation Specialist – Event Management
Datadog Certified Architect, Dynatrace Associate/Pro, Grafana TCO, etc.
ITIL v4 Foundation or higher

About CogniSoft Technologies