Ideal Candidate Profile
The ideal candidate is a Lead AIOps Engineer with deep Azure expertise and a strong track record of implementing AI-powered operational solutions.
They should have hands-on experience building intelligent monitoring, alerting, and automation solutions, chatbots, and LLM-based platforms within Azure environments, along with enough SRE and operational knowledge to improve reliability through AI-driven approaches.
This role is best suited for someone who can bridge cloud operations, AI/ML technologies, and enterprise-scale automation initiatives.
Position Summary
We are seeking a highly skilled Lead AIOps Engineer to drive the design, implementation, and optimization of AI-powered operational solutions within a fully Azure-based cloud environment. This role is ideal for a professional who combines strong Azure Cloud expertise with hands-on experience in AIOps, AI/ML technologies, intelligent automation, and modern observability platforms.
The successful candidate will lead initiatives focused on leveraging AI to enhance monitoring, alert management, incident response, operational efficiency, and predictive analytics. While some Site Reliability Engineering (SRE) experience is valuable, this position is primarily focused on AI-driven operations and automation rather than traditional SRE responsibilities.
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.
- 8+ years of experience in Cloud Operations, Platform Engineering, AIOps, DevOps, or related disciplines.
- Strong hands-on experience with Microsoft Azure Cloud (mandatory).
- Proven experience implementing and managing AIOps solutions in enterprise Azure environments.
- Experience with monitoring, observability, alert management, event correlation, and operational analytics.
- Hands-on experience with:
  - Large Language Models (LLMs)
  - ChatGPT/OpenAI or Azure OpenAI Service
  - Retrieval-Augmented Generation (RAG)
  - LangChain or similar AI frameworks
  - AI chatbot development, configuration, and deployment
  - AI-driven automation and operational workflows
- Experience with MLOps practices, including model deployment and lifecycle management.
- Strong scripting and automation skills using Python, PowerShell, or similar languages.
- Familiarity with CI/CD pipelines and Infrastructure as Code (IaC).