Job Details
Title: CloudOps AIOps Engineer
Location: New York (Hybrid, 2-3 days onsite)
Duration: 12-month contract
Skill List
AIOps, artificial intelligence, computer science, science & research
Role Overview:
We are looking for a proactive and analytical AIOps Engineer to design, implement, and optimize AI-driven operational intelligence solutions. This role will focus on leveraging machine learning, event correlation, and automation to enhance observability, reduce incident noise, and improve system reliability across cloud and hybrid environments.
______________________________
Key Responsibilities:
Implement and manage AIOps platforms to enable intelligent alerting, anomaly detection, and root cause analysis.
Integrate AIOps capabilities with observability tools (e.g., New Relic) and incident management systems (e.g., PagerDuty).
Develop event correlation rules, noise reduction strategies, and predictive analytics to support proactive operations.
Collaborate with SRE, Cloud, and Application teams to embed AIOps into CI/CD and production workflows.
Automate operational tasks and remediation workflows using scripting and orchestration tools.
Monitor and fine-tune AIOps models to ensure accuracy, relevance, and performance.
Contribute to the observability and RunOps strategy, driving continuous improvement through data-driven insights.
______________________________
Required Skills & Experience:
Experience with AIOps platforms (e.g., Moogsoft, BigPanda, Dynatrace, Splunk ITSI, or similar).
Strong understanding of observability concepts (metrics, logs, traces) and tools like New Relic, Datadog, or Prometheus.
Proficiency in scripting and automation (Python, Bash, PowerShell).
Experience with cloud platforms such as AWS (preferred), Azure, or Google Cloud Platform, and with DevOps/SRE practices.
Experience with event correlation, anomaly detection, and ML-based alerting.
Working knowledge of IDP platforms and developer enablement tools.
Practical experience using GitHub Copilot for automation and code generation.
Strong analytical and problem-solving skills.
______________________________
Preferred Qualifications:
Exposure to machine learning models for operational analytics.
Experience with ITSM tools (e.g., ServiceNow) and incident response workflows.
Certifications in cloud platforms or AIOps tools.