Member of Technical Staff (Data Scientist, Evals)

San Francisco, CA, US • Posted 20 hours ago • Updated 7 hours ago

Full Time

On-site

Job Details

Skills

Search Engines
Use Cases
IT Management
Data Science
Python
SQL
Cloud Computing
Amazon Web Services
Databricks
Workflow
Artificial Intelligence
Customer Facing
Research
Machine Learning (ML)
Evaluation

Summary

Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.

Responsibilities

Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve answer quality

Qualifications

PhD or MS in a technical field, or equivalent experience
4+ years of experience in data science or machine learning
Strong proficiency in Python and SQL (you will be expected to write production-grade code)
Experience building within a modern cloud data stack, specifically AWS and Databricks
Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

Preferred Qualifications

1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
A strong research background, with experience applying research methods to real-world ML problems
Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 91140125
Position Id: 51c71d10f2e2f53cb1f4a2f502527d0f

