AI/ML Engineer

Hybrid in Burlingame, CA, US • Posted 6 hours ago • Updated 6 hours ago

Full Time

No Travel Required

On-site

Depends on Experience

Job Details

Skills

Python
Machine Learning (ML)
Large Language Models (LLMs)
LangChain
AI

Summary

We are seeking an experienced AI/ML Engineer to build, scale, and maintain the critical infrastructure that powers our AI models and autonomous agents. In this role, you will act as the bridge between our AI research/development teams and our production environments. You will not just be deploying models; you will be designing the high-performance, distributed systems required to serve Large Language Models (LLMs), orchestrate multi-agent workflows, and optimize GPU compute at scale.

If you are passionate about turning complex AI capabilities into highly reliable, scalable, and cost-efficient production systems, this is the role for you.

Key Responsibilities

1. Machine Learning Infrastructure & Serving

Design, build, and manage scalable infrastructure for training, fine-tuning, and serving LLMs and multimodal models.
Optimize inference latency, throughput, and cost using modern serving frameworks (e.g., vLLM, Triton Inference Server, Ray Serve) [2].
Manage and orchestrate GPU/TPU clusters, ensuring high utilization and efficient resource allocation.

2. Building and Scaling Agentic Operations (AgentOps)

Architect and deploy infrastructure to support autonomous AI agents and multi-agent systems.
Integrate and maintain agent orchestration frameworks (e.g., LangGraph, CrewAI) within production environments [3].
Build robust state management and memory systems (vector databases, graph databases) required for agentic workflows.

3. Observability, Evaluation, and Reliability

Implement comprehensive observability stacks tailored for LLMs and agents (tracing, prompt logging, cost tracking) using tools like Langfuse, Arize, or Datadog [4].
Design automated evaluation pipelines to monitor agent performance, safety, and reliability in real time (LLMOps/AgentOps).
Act as the first line of defense for production AI systems, diagnosing and resolving issues related to memory limits, inference queues, and cluster failures.

4. Developer Platform & CI/CD for AI

Build internal developer platforms and tooling that allow AI engineers and data scientists to easily deploy models and agents to production.
Adapt traditional CI/CD pipelines to accommodate model versioning, prompt management, and continuous evaluation.

Qualifications

Required Skills:

Systems Engineering: Strong background in distributed systems, backend engineering, or DevOps/SRE.
Programming: Proficiency in Python (essential for the AI ecosystem) and systems languages like Go or Rust.
Containerization & Orchestration: Deep expertise in Kubernetes (K8s), Docker, and infrastructure-as-code (Terraform, Pulumi).
AI/ML Tooling: Hands-on experience with LLM serving engines (vLLM, TGI, Triton) and distributed computing frameworks (Ray) [2].
Agent Frameworks: Familiarity with modern agentic development frameworks like LangChain, LangGraph, or CrewAI [3].
Cloud & Hardware: Experience managing high-performance compute (GPUs/TPUs) on major cloud providers (AWS, Google Cloud Platform, Azure).

Preferred Skills:

Experience with vector databases (Pinecone, Milvus, Qdrant) and retrieval-augmented generation (RAG) pipelines.
Understanding of model optimization techniques (quantization, LoRA, KV caching).
Previous experience building platforms from the ground up in a high-growth

Dice Id: 91133974
Position Id: 8967102