AI/ML Engineer, Cloud Operations & Predictive Modeling

  • San Jose, CA
  • Posted 3 hours ago | Updated 3 hours ago

Overview

On Site
Depends on Experience
Full Time
100% Travel

Skills

Agile
Amazon Web Services
Ansible
AppDynamics
Artificial Intelligence
Cisco
Cisco UCS
Cloud Computing
Collaboration
Communication
Computer Science
Continuous Delivery
Continuous Integration
Data Science
DevOps
DirectShow
Documentation
GitHub
Good Clinical Practice
Google Cloud Platform
Operational Efficiency
Machine Learning (ML)
Kubernetes
Model Context Protocol (MCP)
PyTorch
Splunk
Resource Allocation
Python
Workflow

Job Details

Title: AI/ML Engineer, Cloud Operations & Predictive Modeling

Location: San Jose, CA
Duration: Full-time
Visa: All (Except CPT & OPT)
Start Date: Immediate to 1 week

Experience: 10-12 Years

Role Overview

We are seeking an AI/ML Engineer with strong expertise in building AI Agents, predictive modeling, and cloud-based automation. The role involves developing scalable ML solutions, automating cloud operations, and collaborating with DevOps/SRE teams to drive performance optimization and reliability.

Responsibilities:

  • Design & implement AI Agents for cloud resource allocation, auto-scaling, and performance tuning.
  • Develop predictive ML models for system health monitoring, incident management, and failure detection.
  • Automate operational workflows with intelligent scripting.
  • Integrate AI-driven insights into existing cloud monitoring & DevOps tools.
  • Conduct anomaly detection for security, cost, and performance optimization.
  • Evaluate emerging AI technologies to enhance operational efficiency.
  • Collaborate with cross-functional teams and maintain best practices documentation.

Minimum Qualifications:

  • Bachelor's or Master's in AI, Data Science, or Computer Science (with AI specialization).
  • 2+ years of experience building and deploying ML models (preferably in infrastructure/cloud ops).
  • Strong knowledge of AWS, Google Cloud Platform, OpenStack, Kubernetes.
  • Expertise in Python, Jupyter, PyTorch, TensorFlow, scikit-learn.
  • Familiarity with Terraform, Ansible, Prometheus, Splunk, and AppDynamics.
  • Hands-on experience with streaming data, APIs, and telemetry systems.
  • Agile/DevOps exposure with Jira, Git, CI/CD (GitLab, Jenkins, GitHub Actions).
  • Strong communication and collaboration skills.

Preferred Qualifications:

  • Knowledge of Cisco technologies (UCS, Nexus, ThousandEyes).
  • Strong OS-level knowledge and familiarity with cloud-native architectures.
  • Proven track record in leading technical initiatives.

Mandatory Skills:

  • Python (with ML frameworks: PyTorch, TensorFlow, scikit-learn).
  • Predictive ML Modeling (preferably in cloud/infrastructure).
  • Agentic AI, MCP Integration.
  • Hybrid Cloud Expertise.
  • Experience with streaming data & cloud-native monitoring/logging tools.
  • Agile/CI/CD practices (Jira, Jenkins, GitHub/GitLab).


About Tek9 Inc