Senior Software Engineer LLM Evaluation
Remote • Posted 4 hours ago • Updated 4 hours ago

Zeal Solutions Inc
Job Details
Skills
- LLM
- Java
- JavaScript
- Python
- AI/ML models
- LLMs
Summary
Mandatory: experience at a top-tier product company, such as Google (Alphabet), Apple, Amazon, Meta (Facebook), Netflix, Microsoft, Tesla, NVIDIA, Adobe, Salesforce, GitHub, Atlassian, HashiCorp, Databricks, Snowflake, Cloudflare, DigitalOcean, MongoDB, Elastic, Confluent, Airbnb, Dropbox, Stripe, Palantir, Uber, Lyft, Square (Block), Twilio, Snap Inc., Pinterest, Figma, Oracle, Cisco, PayPal, DoorDash, Rivian, Reddit, Coinbase, Splunk, Spotify, Goldman Sachs, Morgan Stanley, JP Morgan Chase, Capital One, Plaid, Shopify, Intuit, Workday, ServiceNow, Hugging Face, VMware, Brex, Wise, Epic Games, Unity Technologies, Activision Blizzard, Riot Games, Valve, Huawei, Bloomberg, ByteDance, Alibaba, Baidu, Notion, Klarna, Instacart, or Zillow.
Experience: 5+ Years
Location: US, Singapore & Western Europe (France, Germany, Switzerland, Denmark, Finland, Netherlands, Sweden, Iceland, Italy, Austria, Ireland, Norway) - Remote
We are hiring a Senior Software Engineer for LLM Evaluation to help build and evaluate next-generation large language models. This role focuses on creating high-quality engineering datasets, evaluating AI-generated code, and improving the reliability, scalability, and performance of AI-driven coding systems.
Role Overview
As a Software Engineering Evaluator, you will collaborate closely with AI researchers and engineering teams to design datasets, write production-grade code examples, validate model outputs, and build automated verification systems. You will play a critical role in improving how AI models understand, generate, and evaluate software engineering solutions across multiple programming languages.
Key Responsibilities
- Build and curate high-quality code examples for AI model training and evaluation
- Develop, correct, and optimize code in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go
- Evaluate and refine AI-generated code for correctness, efficiency, scalability, and reliability
- Collaborate with cross-functional teams to benchmark AI-driven coding solutions against industry standards
- Build intelligent agents to detect errors, verify outputs, and identify recurring code quality issues
- Analyze and evaluate AI performance across the full software development lifecycle, including architecture, API design, prototyping, production, and monitoring
- Design automated verification frameworks to validate engineering task solutions
- Provide structured technical feedback to improve model reasoning and software engineering accuracy
Required Skills & Qualifications
- 5+ years of professional software engineering experience
- 2+ years of full-time experience at a top-tier product company (e.g., Google, Amazon, Meta, Apple, Microsoft, Netflix, Stripe, Datadog, Shopify, PayPal, IBM Research)
- Strong expertise in Python, JavaScript (ReactJS), C/C++, Java, Rust, and Go
- Proven experience building scalable, production-grade full-stack applications
- Deep understanding of software architecture, system design, debugging, testing, and code quality assessment
- Strong knowledge of performance optimization, reliability engineering, and maintainable coding practices
- Excellent written and verbal communication skills to articulate evaluation insights clearly
Preferred Experience
- Experience working with AI/ML models, LLMs, or evaluation pipelines
- Background in code review automation, developer tooling, or large-scale systems
- Experience in benchmarking, testing frameworks, and quality assurance engineering
- Passion for improving developer productivity and AI-assisted coding tools
- Dice Id: 91172983
- Position Id: 348932