Senior MLOps & AI Infrastructure Engineer

San Jose, CA, US • Posted 30+ days ago • Updated 1 hour ago

Full Time

On-site

USD 149,100.00 - 215,925.00 per year

Job Details

Skills

Altera
FOCUS
Computer Networking
Innovation
FPGA
Software Engineering
DevOps
Lifecycle Management
Evaluation
A/B Testing
SAFE
Data Engineering
Management
AML
Performance Tuning
Provisioning
Data Quality
Collaboration
Leadership
Mentorship
Workflow
scikit-learn
XGBoost
Amazon SageMaker
Vertex
Terraform
Bash
SQL
Grafana
Data Science
PyTorch
TensorFlow
JAX
Data Modeling
Python
Cloud Computing
Amazon Web Services
Google Cloud
Google Cloud Platform
Microsoft Azure
Docker
Kubernetes
Continuous Integration
Continuous Delivery
Computer Science
Statistics
Semiconductors
Integrated Circuit
Place And Route
LSF
GPU
Computer Cluster Management
Training
LangChain
Autogen
Deep Learning
Optimization
DevSecOps
Regulatory Compliance
HPC
Research
Open Source
Machine Learning (ML)
Machine Learning Operations (ML Ops)
Artificial Intelligence
Synopsys
Cadence
Siemens
EDA
Data Analysis
Military
Law

Summary

Job Details:

Job Description:

About Altera

At Altera, our independence as the world's largest pure-play FPGA solutions provider gives us the focus, speed, and agility to innovate without compromise. With more than four decades of industry-leading FPGA expertise, our singular mission is to deliver the programmable technologies that help customers differentiate, innovate, and scale across rapidly evolving markets like AI, cloud, networking, and edge. As an independent company, we move faster, invest deeper, and partner more closely, empowering our teams to drive breakthrough innovation and shape the future of the FPGA industry.

About the Role

We are looking for a Senior MLOps & AI Infrastructure Engineer to architect, build, and operationalize machine learning systems at scale. This role sits at the intersection of data science, software engineering, and infrastructure, combining deep ML expertise with the DevOps/MLOps discipline required to ship models reliably into production.

You will partner closely with software, data, and infrastructure teams to design end-to-end ML pipelines, automate model lifecycle management, and deliver AI-powered capabilities across our EDA, HPC, and cloud environments.

Key Responsibilities:

ML Platform & Pipeline Engineering

Design, build, and maintain scalable ML pipelines for training, evaluation, and deployment across cloud and on-prem HPC environments

Build MLOps infrastructure including experiment tracking, model registry, feature stores, and automated retraining workflows

Implement CI/CD/CT (Continuous Training) pipelines for ML models using tools such as Kubeflow, MLflow, Airflow, or similar

Containerize ML workloads with Docker and orchestrate at scale using Kubernetes and GPU node pools

Model Development & Optimization

Develop, fine-tune, and deploy large-scale models including LLMs, GNNs, and reinforcement learning agents for EDA and chip design applications

Apply advanced techniques: transfer learning, quantization, pruning, distillation, and RLHF for production-grade model efficiency

Implement A/B testing frameworks and shadow deployments for safe model rollout

Benchmark and optimize model inference performance on GPU/TPU clusters

Data Engineering & Feature Management

Build and maintain data pipelines for large-scale structured and unstructured datasets (terabyte-scale)

Collaborate with data teams to design feature engineering systems and maintain data quality for ML training

Implement data versioning and lineage tracking (DVC, Delta Lake, or similar)

Infrastructure & Operations

Manage cloud ML infrastructure on AWS (SageMaker), Azure (AML), or Google Cloud Platform (Vertex AI) with cost and performance optimization

Automate infrastructure provisioning using Terraform or CloudFormation for GPU-backed ML environments

Build monitoring, alerting, and observability systems for model performance drift, data quality, and system health

Support HPC schedulers (LSF, Slurm) for large-scale distributed training jobs

Collaboration & Leadership

Partner with research scientists to productionize experimental models with engineering rigor

Mentor junior engineers and define ML engineering best practices across the organization

Drive adoption of AI/ML solutions within semiconductor, EDA, and simulation workflows

Technology Stack

ML Frameworks:

PyTorch, TensorFlow, JAX, Hugging Face, scikit-learn, XGBoost

MLOps & Pipelines:

MLflow, Kubeflow, Airflow, Weights & Biases, DVC, Feast

Infrastructure & Cloud:

AWS SageMaker / Google Cloud Platform Vertex AI / Azure ML, Terraform, Docker, Kubernetes, Slurm / LSF

Languages:

Python, Bash, Go, SQL

Monitoring & Observability:

Prometheus, Grafana, ELK Stack, Evidently AI, Arize

Key Competencies

Strong ownership mindset: you drive ML initiatives from prototype to production without being asked

Bias toward automation: if you do it twice, you automate it

Ability to bridge research and engineering, translating papers into production-grade systems

Thrives in fast-paced, ambiguous environments typical of deep-tech and semiconductor companies

Clear communicator who can explain complex ML concepts to non-technical stakeholders

Salary Range

The pay range below applies to the Bay Area, California only. Actual salary may vary based on a number of factors, including job location, job-related knowledge, skills, experience, and training. We also offer incentive opportunities that reward employees based on individual and company performance.

$149,100 - $215,925 USD

We use artificial intelligence to screen, assess, or select applicants for the position. Applicants must be eligible for any required U.S. export authorizations.

Qualifications:

Required Qualifications

Bachelor's or Master's degree in Computer Science, Machine Learning, Statistics, or related field and 10+ years of industry experience
10+ years of experience across ML engineering, data science, and MLOps, including frameworks (PyTorch, TensorFlow, JAX, Hugging Face) and production model deployment at scale
8+ years of experience with parallelism strategies (FSDP, DeepSpeed, data/model parallelism)
10+ years of experience with, and proficiency in, Python programming
8+ years of experience with cloud ML platforms (AWS, Google Cloud Platform, Azure), Docker/Kubernetes, and CI/CD pipelines
5+ years of hands-on experience with MLflow, W&B, or Neptune for tracking and reproducibility

Preferred Qualifications

PhD in Computer Science, Machine Learning, Statistics, or related field
Experience applying ML/AI to semiconductor, EDA, or chip design domains (e.g., timing prediction, place & route optimization, DRC closure)
Familiarity with HPC schedulers such as LSF or Slurm and GPU cluster management for training workloads
Knowledge of LLM fine-tuning, Retrieval-Augmented Generation (RAG) architectures, and AI agent frameworks such as LangChain or AutoGen
Experience with graph neural networks (GNNs) or geometric deep learning for circuit and netlist analysis
Background in reinforcement learning for optimization problems
Exposure to zero-trust security, DevSecOps, and compliance automation for ML systems
Experience working with large-scale simulation pipelines and synthetic data generation
Experience at organizations such as NVIDIA, AMD, Intel, Google DeepMind, or similar AI/HPC-focused companies
Published research or open-source contributions in ML, MLOps, or AI for EDA
Experience building AI-powered developer tools or copilot-style products
Familiarity with Synopsys, Cadence, or Siemens EDA toolchains and associated data formats

Job Type:
Regular

Shift:
Shift 1 (United States of America)

Primary Location:
San Jose, California, United States

Additional Locations:

Posting Statement:
All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: RTX172d37
Position Id: 2780eeb25d3970309165dff09ee02523

Similar Jobs

Sunnyvale, California • Today

Job Description At General Motors, our product teams are redefining mobility. Through a human-centered design process, we create vehicles and experiences that are designed not just to be seen, but to be felt. We're turning today's impossible into tomorrow's standard -from breakthrough hardware and battery systems to intuitive design, intelligent software, and next-generation safety and entertainment features. Every day, our products move millions of people as we aim to make driving safer, smart

Full-time

USD 153,200.00 - 234,100.00 per year

Senior Machine Learning Engineer, Services/MLOps

San Jose, California • Today

The Opportunity Firefly Foundry is Adobe's enterprise managed-service offering for custom multimedia generative AI - deep-tuned image, video, and 3D models built on each customer's IP, paired with creative production workflows and a media-intelligence layer, and deployed across new and existing Adobe surfaces. The business has gained significant traction in Media & Entertainment, marketing, and consumer retail, and is expanding rapidly into adjacent verticals. We are hiring a Senior Machine Lear

Full-time

USD 151,800.00 - 265,350.00 per year

Staff ML Infrastructure Engineer - Embodied AI

Sunnyvale, California • Today

Full-time

USD 189,300.00 - 290,700.00 per year

Sr. / Staff ML Engineer, FM Training Integration - ML Compute

Santa Clara, California • Today

We are looking for a ML Engineer to join our ML Compute team to help improve the efficiency, scalability, and reliability of model training and inference workloads in the cloud. In this role, you will lead the integration of large-scale ML workloads with cloud infrastructure, working cross-functionally with ML engineers, infrastructure engineers, and researchers to optimize performance, improve system efficiency, and drive high utilization of accelerator resources. Description We are a group o

Full-time
