Machine Learning Engineer - LLM Evaluation & Automation

Remote • Posted 8 hours ago • Updated 8 hours ago

Full Time

Job Details

Skills

Large Language Models (LLMs)
Management
KPI
User Experience
Statistics
Data Science
Science
Psychology
Prompt Engineering
Machine Learning (ML)
Natural Language Processing
Workflow
Quality Assurance
Analytical Skill
Problem Solving
Conflict Resolution
Project Management
Collaboration
Communication
Attention To Detail
Business Rules
Evaluation
Python
SQL
Big Data
PySpark
Health Insurance
Professional Development
IT Consulting
Product Engineering
Advanced Analytics
Business Acumen
Business Transformation
Leadership
Artificial Intelligence
Data Analysis
Cloud Computing
DevOps
Software Modernization
Customer Experience

Summary

We are seeking a highly skilled Machine Learning Engineer who specializes in using Large Language Models (LLMs) for automated evaluation and quality assessment. In this role, you will design and build systems that automatically measure and improve the accuracy, relevance, and consistency of model outputs. You will lead initiatives to create evaluation pipelines, develop metrics, and deliver actionable insights for continuous improvement. This position requires strong technical expertise, analytical problem-solving abilities, and the capacity to manage projects across multiple cross-functional teams.

Essential functions

Responsibilities:

Design and implement automated systems and pipelines for evaluating LLM outputs.
Develop metrics and KPIs to measure output quality, accuracy, and consistency using LLM-based evaluations.
Collaborate with Engineering teams to create automated logic checks and validation tools.
Partner with Data Scientists to analyze evaluation results and optimize prompt and task structures.
Provide feedback loops to ensure evaluation guidelines align with LLM-based assessments.
Investigate how LLM-derived evaluations can enhance product reliability and user experience.
Recommend refinements to prompt engineering, evaluation strategies, and automation tools.
Stay informed on emerging trends in LLM evaluation, automated quality assessment, and AI toolchains.
Continuously improve and expand automated evaluation processes based on industry best practices.

Qualifications

5+ years of experience in ML engineering, NLP, or AI/ML automation.
Advanced degree (MS/PhD) in Statistics, Data Science, Computational Social Science, Quantitative Psychology, or a related field.
Hands-on experience with prompt engineering and with designing LLM-based evaluation systems is preferred.
Strong understanding of machine learning principles with a focus on NLP and advanced LLM capabilities (e.g., Chain-of-Thought, agentic workflows).
Expertise in building automated evaluation or QA pipelines.
Excellent analytical and problem-solving skills with experience in root cause and error pattern analysis.
Proven project management and cross-functional collaboration experience.
Excellent communication skills to convey complex insights to technical and non-technical audiences.
Detail-oriented mindset with a focus on evaluation metrics, prompt design, and automation.
Ability to quickly adapt to new business rules and evaluation guidelines across diverse product domains.
Strong programming skills in Python and SQL.
Experience with big data technologies such as PySpark for data aggregation and sampling is a strong plus.

We offer

Opportunity to work on cutting-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, vision, dental, etc.
Corporate social events
Professional development opportunities
Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: RTX145791
Position Id: 7616e0c3b845f22ba7dce7db9627278c
