Principal Machine Learning Engineer, Work From Home - G

Remote • Posted 2 hours ago • Updated 2 hours ago

Full Time

Remote

$170,000 - $200,000/yr

Job Details

Skills

San Francisco CA Jobs
Principal Machine Learning Engineer
Apache Arrow
DeepSpeed
DPO
FasterTransformer
FSDP
GPU Kernels
JAX
LLM
Machine Learning
Megatron
ML
ORPO
PPO
PyTorch
Ray
RLHF Pipelines
Spark
TensorRT-LLM
Virtual Large Language Model
vLLM
Work From Home
ZeRO
California Recruiters
IT Jobs
California Recruiting

Summary

Principal Machine Learning Engineer, Work From Home

As a Principal Machine Learning Engineer, you are a deep technical authority responsible for designing and evolving the company's most critical ML systems. You will operate across training, inference, evaluation, and infrastructure, solving the hardest architectural and performance problems. While Technical Leads may own execution at the team level, you set the technical standard and shape how ML systems are built across the organization. This is a hands-on, high-impact role focused on technical depth. This position is 100% remote.

Principal Machine Learning Engineer Responsibilities:

- Architect and build large-scale ML systems spanning data, training, evaluation, inference, and deployment.

- Design reproducible, high-performance training pipelines across GPU infrastructure.

- Architect inference systems that balance latency, throughput, cost, and reliability at scale.

- Design and maintain data systems for high-quality synthetic and real-world training data.

- Implement evaluation pipelines covering performance, robustness, safety, and bias, in partnership with research leadership.

- Own production deployment, including GPU optimization, memory efficiency, latency reduction, and scaling policies.

- Collaborate closely with application engineering to integrate ML systems cleanly into backend, mobile, and desktop products.

- Make pragmatic trade-offs and ship improvements quickly, learning from real usage.

- Work under real production constraints: latency, cost, reliability, and safety.

Principal Machine Learning Engineer Outcomes:

- ML systems (training, inference, evaluation) are reliable, scalable, and meet defined performance targets.

- Models deployed to production achieve measurable quality improvements and meet user-impact goals.

- Production issues are proactively monitored, debugged, and resolved with clear root-cause analysis.

- Team and cross-functional collaborators benefit from clear guidance, best practices, and scalable ML solutions.

- Research-to-production cycles are efficient, safe, and continuously improve the product experience.

Principal Machine Learning Engineer Qualifications:

- Strong background in deep learning and transformer-based architectures.

- Hands-on experience training, fine-tuning, or deploying large-scale ML models in production.

- Proficiency with at least one modern ML framework (e.g. PyTorch, JAX), and ability to learn others quickly.

- Experience with distributed training and inference frameworks (e.g. DeepSpeed, FSDP, Megatron, ZeRO, Ray).

- Strong software engineering fundamentals; you write robust, maintainable, production-grade systems.

- Experience with GPU optimization, including memory efficiency, quantization, and mixed precision.

- Comfort owning ambiguous, zero-to-one ML systems end-to-end.

- A bias toward shipping, learning fast, and improving systems through iteration.

- Experience with LLM inference frameworks such as vLLM, TensorRT-LLM, or FasterTransformer.

- Contributions to open-source ML or systems libraries.

- Background in scientific computing, compilers, or GPU kernels.

- Experience with RLHF pipelines (PPO, DPO, ORPO).

- Experience training or deploying multimodal or diffusion models.

- Experience with large-scale data processing (Apache Arrow, Spark, Ray).

Benefits include medical, dental, and vision insurance, savings plan options, PTO, and more.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: NEXIL001
Position Id: 9016100

Contact the job poster

Raul Garcia

Senior Recruiter @ Next Step Systems
