Lead SRE & AI Ops Engineer

East Hartford, CT, US • Posted 2 hours ago • Updated 2 hours ago

Contract Corp To Corp

Contract Independent

75% Travel Required

On-site

Job Details

Skills

API
Artificial Intelligence
Big Data
Bridging
Call Center
Cloud Computing
Communication
Continuous Delivery
Continuous Integration
Crisis Management
Customer Experience
DevOps
Dynatrace
GCS
Generative Artificial Intelligence (AI)
GitHub
Good Clinical Practice
Google Cloud Platform
Health Insurance
IT Operations
Incident Management
LangChain
Leadership
Machine Learning (ML)
Management
Media
Microservices
Production Support
Public Relations
Python
RESTful
Real-time
Redis
Root Cause Analysis
SAFE
Splunk
Technical Support
Telephony
Vertex

Summary

Role Overview

We are seeking a Lead SRE & AI Ops Engineer with 12+ years of total experience, including at least 5 years in a leadership capacity, to oversee the reliability and performance of our AI-powered medical insurance call center platform. This role represents a strategic shift from traditional Big Data to AIOps, leveraging AI to process large volumes of telemetry data for proactive system support.

You will lead the production support strategy for a complex ecosystem involving Google Contact Center AI (CCAI), Generative AI, and several Cloud Run microservices. As the "middle man" between Client stakeholders, DevOps, and Development teams, you will be responsible for maintaining the stability of a real-time speech-to-text and AI-driven "Advocate Assist" application.

Key Responsibilities

1. High-Volume Incident Leadership

Incident Commander: Act as the lead orchestrator for high-volume P1 and P2 incidents, managing the full lifecycle from detection to resolution.
Crisis Management: Direct cross-functional teams during large-scale outages, ensuring clear communication with stakeholders and driving technical teams toward rapid service restoration.
RCA & Problem Records: Own the Root Cause Analysis (RCA) process, creating and driving Problem Records (PR) to closure to ensure permanent remediation of recurring issues.

2. AI Ops & GenAI Ecosystem Support

CCAI & Telephony Flow: Manage the reliability of the end-to-end flow from Media Hub through Google Telephony, Speech-to-Text conversion, and Dialogflow CX.
GenAI Pipeline Maintenance: Optimize and troubleshoot LLM-powered services built on Vertex AI, Gemini, and LangChain, ensuring low-latency answers for call center advocates.
AIOps Implementation: Shift from reactive monitoring to AI-driven operations, using machine learning to correlate signals across large datasets and predict failures before they impact users.

3. Monitoring, Observability & Traceability

Multi-Stack Visibility: Oversee a sophisticated monitoring suite including Datadog, Dynatrace, Splunk (for logging), and Google Cloud Platform Observability.
Traceability Engineering: Implement and maintain end-to-end tracing to pinpoint latency and failure points across asynchronous Pub/Sub messages and several Cloud Run microservices.
Proactive Health Checks: Use BigQuery and Splunk logs to establish performance baselines and automate anomaly detection.

4. Integration & CI/CD Leadership

Operational Liaison: Serve as the technical point of contact for the client, bridging the gap between business needs and technical DevOps/Developer execution.
Automated Lifecycles: Oversee CI/CD pipelines via GitHub Actions, ensuring that releases for AI prompts, knowledge bases, and Python-based FastAPI services are stable and safe.

Required Technical Experience

Experience Level: 10+ years in IT/Operations with 5+ years in SRE leadership or Incident Management.
Google Cloud Platform Infrastructure: Deep expertise in Cloud Run, Pub/Sub, GCS, BigQuery, and Redis.
AI/ML Stack: Strong knowledge of Google CCAI, Vertex AI, Gemini, and LangChain.
Backend & API: Proficient in Python (FastAPI) and RESTful API troubleshooting.
Observability Tools: Expert-level knowledge of Splunk, Datadog, and Dynatrace.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 91099306
Position Id: 8957396