AI Platform Engineer, Training and Inference

Milpitas, CA, US • Posted 30+ days ago • Updated 9 hours ago

Full Time

On-site

USD 274,000.00 - 304,000.00 per year

Job Details

Skills

Business Process
Operational Efficiency
Regulatory Compliance
Scheduling
Streaming
Amazon S3
CheckPoint
Recovery
GPU
Routing
Cloud Computing
Workflow
Evaluation
Management
GCS
Promotions
Assembly
Machine Learning Operations (ML Ops)
Caching
PPO
Algorithms
Vector Databases
Neural Network
Python
PyTorch
Machine Learning (ML)
Computer Science
Military
Training
Artificial Intelligence
Recruiting

Summary

AI Platform Engineer - Training & Inference

Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies, and government institutions.

The AI Platform team is building the compute layer that trains, evaluates, and serves every AI model at Saviynt. We need an ML Platform Engineer to own distributed training on Ray + H100s, the multi-engine LLM inference mesh (vLLM, SGLang, NVIDIA Triton), and the full model promotion lifecycle - from shadow mode through canary rollout to GA.

The AI Platform team's mission is to build a secure, scalable, product-agnostic AI foundation that enables Saviynt's identity products to deliver measurable AI-powered outcomes. Training & Inference is the engine - it turns data into deployed models that make Saviynt's products smarter.

What You Will Be Doing

Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
Optimize inference performance: configure fractional GPU allocation, enable continuous batching, implement per-engine autoscaling based on request queue depth, and tune KV-cache block sizes
Design and operate the model routing layer: capability-based, version-based, and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs
Build RL training infrastructure: define Flyte workflows for RL pipelines (rollout, reward shaping, policy update, evaluation), integrate Ray RLlib or custom PPO/GRPO loops with Ray Train, and manage replay buffer persistence on GCS
Operate the full model promotion lifecycle: quality gate → integration tests → load tests (k6) → shadow mode → A/B gate → canary (10% to 100%) with golden-signal auto-rollback
Operate the retrain pipeline: drift detection triggers, warm-start retraining, relative quality gates (V2 >= V1 - 2%), and automated Flyte DAG through to canary
Integrate RAG retrieval into the inference mesh: vector similarity search, context assembly, and prompt construction before LLM inference
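
As an illustration of two of the promotion checks listed above (the relative quality gate V2 >= V1 - 2% and the 10% to 100% canary with golden-signal auto-rollback), here is a minimal Python sketch. It is not Saviynt's implementation: the stage fractions, the thresholds, the use of only two golden signals (p99 latency and error rate), and the `observe` hook are all hypothetical, and "2%" is read here as two percentage points on a score in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def passes_relative_gate(v1_score: float, v2_score: float,
                         tolerance: float = 0.02, eps: float = 1e-9) -> bool:
    """Accept the candidate (V2) if V2 >= V1 - tolerance.

    Assumption: scores lie in [0, 1] and "2%" means 2 percentage points.
    A relative reading would compare against v1_score * (1 - tolerance).
    eps absorbs floating-point rounding exactly at the boundary.
    """
    return v2_score + eps >= v1_score - tolerance


@dataclass
class GoldenSignals:
    p99_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0-1.0


@dataclass
class SignalLimits:
    max_p99_latency_ms: float
    max_error_rate: float


def run_canary(observe: Callable[[float], GoldenSignals],
               limits: SignalLimits,
               stages: Sequence[float] = (0.10, 0.25, 0.50, 1.00)) -> Tuple[str, float]:
    """Ramp canary traffic stage by stage; roll back on the first breach.

    observe(fraction) is a hypothetical hook that routes `fraction` of traffic
    to the canary, waits out a bake period, and returns the measured signals.
    Returns ("promoted", final_fraction) or ("rolled_back", failing_fraction).
    """
    for fraction in stages:
        signals = observe(fraction)
        breached = (signals.p99_latency_ms > limits.max_p99_latency_ms
                    or signals.error_rate > limits.max_error_rate)
        if breached:
            return "rolled_back", fraction
    return "promoted", stages[-1]


if __name__ == "__main__":
    print(passes_relative_gate(0.90, 0.885))  # True: within 2 points of V1
    print(passes_relative_gate(0.90, 0.87))   # False: more than 2 points worse

    limits = SignalLimits(max_p99_latency_ms=500.0, max_error_rate=0.01)
    healthy = lambda f: GoldenSignals(p99_latency_ms=320.0, error_rate=0.002)
    print(run_canary(healthy, limits))        # ('promoted', 1.0)

    # Error rate jumps once half of the traffic reaches the canary.
    degrading = lambda f: GoldenSignals(320.0, 0.05 if f >= 0.5 else 0.002)
    print(run_canary(degrading, limits))      # ('rolled_back', 0.5)
```

Keeping `observe` as an injected callable separates the promotion policy from the traffic-shifting mechanism (for example, a Ray Serve or service-mesh weight change), so the rollback rule can be unit-tested without a cluster.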

What You Bring

Experience in ML engineering with time in an ML platform or MLOps role
Production Ray depth: Ray Train, Serve, Core, and Data, with experience debugging real production failures such as NCCL timeouts, Plasma OOM, and Serve autoscaling lag
LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton - PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
RL working knowledge: PPO, policy gradient, or RLHF - able to translate an algorithm into distributed compute primitives
Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden-signal degradation
Vector databases: pgvector or Qdrant - ANN index strategies, embedding upsert, and query latency tuning under inference load
Strong Python and PyTorch; Flyte or equivalent ML orchestrator
Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience

We offer you a competitive total rewards package, learning, and tremendous opportunities to grow and advance in your career. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and final compensation decisions are dependent on many factors, including, but not limited to, location; skill sets; experience and training; licensure and certifications; and other relevant business and organizational needs.

You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.

A reasonable estimate of the current range is $240,000 - $260,000 annually.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10406473
Position Id: 233b12ae00fe8b1671b5ea70cf6edc37

Similar Jobs

AI/ML Platform Engineer

Santa Clara, California • Today

WHAT YOU DO AT AMD CHANGES EVERYTHING At AMD, our mission is to build great products that accelerate next-generation computing experiences-from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the wo

Full-time

USD 178,500.00 per year

Senior Machine Learning Engineer, Services/MLOps

San Jose, California • Today

The Opportunity Firefly Foundry is Adobe's enterprise managed-service offering for custom multimedia generative AI - deep-tuned image, video, and 3D models built on each customer's IP, paired with creative production workflows and a media-intelligence layer, and deployed across new and existing Adobe surfaces. The business has gained significant traction in Media & Entertainment, marketing, and consumer retail, and is expanding rapidly into adjacent verticals. We are hiring a Senior Machine Lear

Full-time

USD 151,800.00 - 265,350.00 per year

AI Research Scientist - Infrastructure Engineer, Reinforcement Learning

Santa Clara, California • Today

Full-time

USD 178,500.00 per year

Senior MLOps & AI Infrastructure Engineer

San Jose, California • Today

Job Details: Job Description: About Altera At Altera, our independence as the world's largest pure-play FPGA solutions provider gives us the focus, speed, and agility to innovate without compromise. With more than four decades of industry-leading FPGA expertise, our singular mission is to deliver the programmable technologies that help customers differentiate, innovate, and scale across rapidly evolving markets like AI, cloud, networking, and edge. As an independent company, we move faster,

Full-time

USD 149,100.00 - 215,925.00 per year