Overview
On Site
Depends on Experience
Contract - Independent
Contract - W2
Contract - 12 Month(s)
Skills
python
pytorch
tensorflow
machine learning
Agentic AI
API
automation
Job Details
Remote is fine. Candidates in the CA region are preferred, but the work can be done remotely.
Job Description
- Design and implement AI agents to optimize cloud resource allocation, auto-scaling, and performance tuning.
- Develop predictive models for failure detection, incident management, and system health monitoring.
- Automate operational workflows using machine learning and intelligent scripting.
- Integrate AI-driven insights with existing cloud monitoring tools.
- Collaborate with DevOps and SRE teams to deploy, monitor, and improve ML models in production environments.
- Conduct anomaly detection for security, cost optimization, and performance analytics.
- Continuously evaluate emerging AI technologies and tools for operational improvements.
- Maintain documentation and best practices for AI/ML integration in cloud systems.
Our Minimum Requirements include:
- Bachelor's degree (or equivalent experience) or master's degree in Computer Science, Data Science, or a related technical field.
- Proven ability to build and deploy ML models, with at least 2 years focused on infrastructure or cloud operations.
- Solid knowledge of hybrid cloud technologies (AWS, Google Cloud Platform, OpenStack, Kubernetes).
- Experience with Python, Jupyter, and ML libraries such as PyTorch, TensorFlow, or scikit-learn.
- Familiarity with cloud-native monitoring, logging, and automation tools (e.g., Terraform, Ansible, Prometheus, Splunk, AppDynamics).
- Comfortable working with streaming data, APIs, and telemetry systems.
- Strong communication and cross-functional collaboration skills.
- Experience with Agile and DevOps operating models, including project tracking tools (e.g., Jira), Git (or any other version control system), and CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins).
- Proficiency in general-purpose programming languages (Python, Go, Bash, and/or C/C++) and related development platforms and technologies.
Preferred Qualifications
- Deep understanding of operating systems and experience with Cisco technologies (UCS, Nexus, ThousandEyes).
- Established record of leading technical initiatives, delivering results, and a commitment to fostering a supportive work environment.
- Hard-working and dedicated to providing quality support for customers.
Mandatory Skills for this role
- Python (with ML libraries such as PyTorch, TensorFlow, or scikit-learn)
- Predictive ML modelling (preferably in the infrastructure or cloud operations domain)
- Agentic AI, MCP integration
- Knowledge of hybrid cloud
- Experience working with streaming data, APIs, and telemetry systems for data from cloud-native monitoring, logging, and automation tools (e.g., Terraform, Ansible, Prometheus, Splunk)
- Agile, Jira, CI/CD