Lead AIOps Engineer

Overview

Remote
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 12 Month(s)

Skills

AIOps
MLOps
Python
API
LLM
Cloud
DevOps
TensorFlow
PyTorch
scikit-learn
Pandas

Job Details

About GSPANN
Headquartered in Milpitas, California (U.S.A.), GSPANN provides consulting and IT services to global clients, ranging from mid-size to Fortune 500 companies. With our experience in retail, high technology, and manufacturing, we help our clients transform and deliver business value by optimizing their IT capabilities, practices, and operations. With ten offices, including four global delivery centers, and approximately 1400 employees globally, we offer the intimacy of a boutique consultancy with the capabilities of a large IT services firm.

AI Operations (AI Ops) Engineer
Location: Fremont, CA / Remote
Job Type: Long Term

The role calls for an AI Ops Engineer with 8-10 years of experience working on advanced AI/ML systems, cloud infrastructure, and API integrations, and a strong background in Python, API development, Large Language Model (LLM) concepts, ML Ops, Azure Cloud, and AI operations. The focus is on operationalizing AI models and maintaining robust systems for AI-driven applications, which requires a combination of technical expertise in cloud computing, machine learning, and software engineering. The engineer will collaborate with IT operations and business teams to support business user issues, requests, production support, and deployments; advocate best practices; and recommend technical solutions that improve application usability and system performance.

Key Responsibilities:

Technical Operations: Review, implement, and support enterprise-level AI platforms and services to drive IT operational excellence. Ensure that new use cases are onboarded smoothly and operationalized.
Optimization: Analyze business processes to identify areas for automation, and work with business stakeholders and IT teams to determine requirements and design software bots that reduce operational toil.
AI Ops & Model Deployment: Lead the operationalization and deployment of AI/ML models into production environments, ensuring they are highly available, scalable, and performant. Implement and monitor Continuous Integration (CI) and Continuous Deployment (CD) pipelines.
Python Development: Design and develop Python-based solutions for automating and managing the lifecycle of AI/ML models, including data ingestion, model training, and real-time prediction workflows.
API Integration: Build and maintain robust APIs for model serving and integration with other systems. Ensure seamless communication between models, data pipelines, and consumer applications.
LLM Concepts and Implementation: Apply knowledge of Large Language Models (LLMs) to develop AI-driven applications and services, ensuring models are optimized and performing efficiently in production.
ML Ops: Implement and maintain Machine Learning Operations (ML Ops) practices for version control, monitoring, logging, and debugging of AI/ML models in production. Support model retraining, versioning, and A/B testing.
Cloud Infrastructure: Leverage Azure Cloud services for hosting and scaling AI applications, ensuring security, compliance, and performance. Implement infrastructure as code (IaC) using tools like Azure DevOps.
Collaboration: Work closely with backend engineers, data engineers/developers, infrastructure engineers, operational SMEs, and business stakeholders to tackle evolving challenges in AI/ML and ensure AI solutions meet business requirements and performance benchmarks.
Monitoring & Optimization: Continuously monitor the performance of deployed AI models and optimize them for efficiency, cost-effectiveness, and accuracy. Implement alerting and logging mechanisms through scripts or an observability solution.
Documentation & Best Practices: Document AI Ops processes, use cases, tools, and workflows. Establish and enforce best practices for managing AI models in production environments.

Required Skills & Qualifications:

Experience: 8-10 years of experience in software development, with a focus on AI/ML operations, cloud infrastructure, and DevOps practices.
Python: Advanced proficiency in Python, including experience with AI/ML libraries such as TensorFlow, PyTorch, scikit-learn, and Pandas.
APIs: Strong experience in designing, developing, and maintaining RESTful APIs for AI/ML model deployment and integration.
ML Ops: In-depth understanding of Machine Learning Operations, including model versioning, monitoring, deployment, and automation of ML workflows.
LLM Concepts: Familiarity with Large Language Models (LLMs), including experience working with transformer-based models such as GPT, BERT, or T5.
Azure Cloud: Hands-on experience with Azure Cloud services (Azure ML, Azure DevOps, Azure Functions, etc.) and cloud infrastructure management.
DevOps & CI/CD: Proficient in setting up CI/CD pipelines for AI/ML models and using tools like Jenkins, GitLab, or Azure DevOps for automation.
Data Management & Tools: Experience working with data storage and processing tools like Azure Blob Storage, Azure SQL Database, Kafka, or similar.
Version Control: Expertise with Git and version control best practices for collaborative development of AI systems.
Problem Solving: Strong analytical and troubleshooting skills, with the ability to identify root causes and optimize AI/ML models and systems.
Communication & Collaboration: Excellent communication skills and the ability to work effectively in a cross-functional team environment.

Preferred Skills:

Cloud Certifications: Azure certifications such as Azure Solutions Architect, Azure AI Engineer, or Azure DevOps Engineer.
Security & Compliance: Understanding of security best practices in AI model deployment and experience with secure handling of sensitive data in the cloud.
Big Data Tools: Familiarity with big data processing frameworks (e.g., Apache Spark, Hadoop) and integration with AI/ML pipelines.
Agile Methodologies: Experience working in Agile teams, with knowledge of Scrum, Kanban, or similar frameworks.

Working at GSPANN
GSPANN is a diverse, prosperous, and rewarding place to work. We provide competitive benefits, educational assistance, and career growth opportunities to our employees. Every employee is valued for their talent and contribution. Working with us will give you an opportunity to work globally with some of the best brands in the industry. The company does and will take affirmative action to employ and advance in employment individuals with disabilities and protected veterans, and to treat qualified individuals without discrimination based on their physical or mental disability status. GSPANN is an equal opportunity employer for minorities/females/veterans/disability.
