Overview
On Site
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 6 month(s)
No Travel Required
Skills
Dynatrace
Cloud
Infrastructure
CI/CD
New Relic
Datadog
Moogsoft
BigPanda
SRE
Job Details
Position: CloudOps AIOps Engineer
Location: Jersey City, NJ
Duration: 6 Months
Job Title: AIOps Engineer
We are seeking a proactive and analytical AIOps Engineer to lead the design, implementation, and optimization of AI-driven operational intelligence solutions. This role will focus on leveraging machine learning, intelligent event correlation, and automation to enhance observability, reduce alert fatigue, and increase system reliability across cloud and hybrid environments.
Key Responsibilities
- Design, implement, and manage AIOps platforms to enable intelligent alerting, anomaly detection, root cause analysis, and automated remediation.
- Integrate AIOps capabilities with observability tools (e.g., New Relic, Datadog) and incident management platforms (e.g., PagerDuty, ServiceNow).
- Develop event correlation rules, noise reduction strategies, and predictive analytics models to drive proactive operations.
- Collaborate closely with Site Reliability Engineering (SRE), Cloud Infrastructure, and Application Development teams to embed AIOps into CI/CD and production workflows.
- Build and maintain automated remediation scripts and workflows using orchestration tools and scripting languages.
- Continuously monitor and tune AIOps models to improve accuracy, reduce false positives, and deliver actionable insights.
- Contribute to the overall observability strategy, supporting RunOps and operational excellence through data-driven insights.
Required Skills & Experience
- Proven experience with AIOps platforms (e.g., Moogsoft, BigPanda, Dynatrace, Splunk ITSI, or similar).
- Strong understanding of observability concepts (metrics, logs, traces) and hands-on experience with tools like New Relic, Datadog, or Prometheus.
- Proficient in scripting and automation using Python, Bash, or PowerShell.
- Solid background in cloud platforms, preferably AWS, with working knowledge of Azure or Google Cloud Platform.
- Familiarity with DevOps and SRE practices, including CI/CD and infrastructure-as-code.
- Hands-on experience with event correlation, anomaly detection, and ML-based alerting systems.
- Experience working with internal developer platforms (IDPs) and developer enablement tools.
- Practical knowledge of using GitHub Copilot or similar AI tools to support automation and code generation.
- Strong analytical, problem-solving, and collaborative skills.
Preferred Qualifications
- Exposure to machine learning models for operational analytics and time-series anomaly detection.
- Experience integrating with ITSM tools (e.g., ServiceNow) and incident response workflows.
- Relevant cloud certifications (AWS, Azure, Google Cloud Platform) or AIOps tool certifications.
- Familiarity with runbook automation, workflow orchestration, and self-healing systems.
Best Regards,
Vishal