AIOps Engineer (Senior Manager)

Overview

Hybrid
$130,000 - $140,000
Full Time

Skills

AIOps

Job Details

Job Title: AIOps Engineer (Senior Manager)

Location: NYC, NY

Job Type: Full Time
Experience Level: 15+ years overall, including 8-12 years in IT operations or SRE and 5+ years in AIOps or observability engineering

Role Overview:

We are looking for a proactive and analytical AIOps Engineer to design, implement, and optimize AI-driven operational intelligence solutions. This role will focus on leveraging machine learning, event correlation, and automation to enhance observability, reduce incident noise, and improve system reliability across cloud and hybrid environments.

Key Responsibilities:

  • Implement and manage AIOps platforms to enable intelligent alerting, anomaly detection, and root cause analysis.
  • Integrate AIOps capabilities with observability tools (e.g., New Relic) and incident management systems (e.g., PagerDuty).
  • Develop event correlation rules, noise reduction strategies, and predictive analytics to support proactive operations.
  • Collaborate with SRE, Cloud, and Application teams to embed AIOps into CI/CD and production workflows.
  • Automate operational tasks and remediation workflows using scripting and orchestration tools.
  • Monitor and fine-tune AIOps models to ensure accuracy, relevance, and performance.
  • Contribute to the observability and RunOps strategy, driving continuous improvement through data-driven insights.

Required Skills & Experience:

  • Experience with AIOps platforms (e.g., Moogsoft, BigPanda, Dynatrace, Splunk ITSI, or similar).
  • Strong understanding of observability concepts (metrics, logs, traces) and tools like New Relic, Datadog, or Prometheus.
  • Proficiency in scripting and automation (Python, Bash, PowerShell).
  • Experience with cloud platforms such as AWS (preferred), Azure, or Google Cloud Platform, and with DevOps/SRE practices.
  • Experience with event correlation, anomaly detection, and ML-based alerting.
  • Working knowledge of internal developer platforms (IDPs) and developer enablement tools.
  • Practical experience using GitHub Copilot for automation and code generation.
  • Strong analytical and problem-solving skills.

Preferred Qualifications:

  • Exposure to machine learning models for operational analytics.
  • Experience with ITSM tools (e.g., ServiceNow) and incident response workflows.
  • Certifications in cloud platforms or AIOps tools.