Job Title: Senior Machine Learning Infrastructure Engineer
Role Overview
We are seeking a Senior Machine Learning Infrastructure Engineer to build the foundational engines that power our AI/ML initiatives. This is not a "model-building" role; it is a "system-building" role. You will architect and scale the pipelines, training platforms, and automation frameworks that allow our researchers and engineers to experiment faster, train more efficiently, and deploy models with absolute reliability.
You will have the autonomy to solve "blank-page" engineering problems: How do we reduce the time-to-market for a new model? How do we cut training costs without sacrificing quality? How do we build a self-service feature generation platform that scales to petabytes of data? If you are a pragmatic builder who values scalability and infrastructure stability as much as the models themselves, we want to hear from you.
Key Responsibilities
ML Infrastructure & Automation
Workflow Architecture: Design and build end-to-end ML workflows and automated pipelines that minimize manual intervention and accelerate the path from experimentation to production.
Training & Serving Platforms: Architect and scale our distributed training and model-serving infrastructure. Build the platforms that handle foundational model training, knowledge distillation, and high-performance inference.
Data Engineering at Scale: Develop robust data sampling and feature generation platforms that provide high-quality input for our ML systems.
Automation & Reliability: Build foundational tools that standardize how we train, track, and deploy models, ensuring high platform reliability and minimal deployment drift.
Performance & Optimization
Cost & Efficiency: Drive architectural decisions that optimize our infrastructure footprint. Implement smart resource management and cost-optimization strategies for large-scale training clusters.
Developer Productivity: Build "developer-first" internal tools that reduce the cognitive load on researchers, allowing them to focus on model logic rather than infrastructure configuration.
Qualifications & Requirements
Minimum Qualifications
Experience Baseline: Five (5) to ten (10) years of experience designing, building, and maintaining large-scale ML infrastructure or distributed systems.
Infrastructure Mastery: Deep expertise in container orchestration (e.g., Kubernetes), distributed training, and cloud-native infrastructure.
Pipeline Expertise: Proven track record of managing massive-scale data pipelines and feature stores.
Collaboration: Strong "bridge-builder" personality: you are comfortable working in a high-velocity environment alongside both pure research scientists and core software engineers.
Preferred Attributes
Proven background at large-scale AI/ML-driven technology companies.
Experience with foundational model training infrastructure and techniques (e.g., model distillation, fine-tuning at scale).
A "Scalability Mindset": you prioritize modular, reusable, and testable code over "quick and dirty" scripts.
Equal Opportunity Employer / Disabled / Protected Veterans
The Know Your Rights poster is available here:
_EEOC_KnowYourRights6.12.pdf
The pay transparency policy is available here:
_%20English_formattedESQA508c.pdf
For temporary assignments lasting 13 weeks or longer, AllSTEM Connections is pleased to offer major medical, dental, vision, 401k and any statutory sick pay where required.
We are committed to working with and providing reasonable accommodations to individuals with disabilities. If you need a reasonable accommodation for any part of the employment process, please contact your staffing representative who will reach out to our HR team.
AllSTEM Connections participates in the E-Verify program in certain locations as required by law. Learn more about the E-Verify program.
_Participation_Poster_ES.pdf
We also consider for employment qualified applicants regardless of criminal histories, consistent with legal requirements, including, if applicable, the City of Los Angeles' Fair Chance Initiative for Hiring Ordinance. Pursuant to applicable state and municipal Fair Chance Laws and Ordinances, we will consider for employment qualified applicants with arrest and conviction records, including, if applicable, the San Francisco Fair Chance Ordinance. For Los Angeles, CA applicants: Qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.
AllSTEM Representative Contact Info
Account Executive:
Nichols
Branch Phone:
Location:
Ontario, CA