Overview
Join Microsoft's CoreAI group as a Principal Research Engineer on the AI Data Platform team, the foundation for secure, scalable, reusable datasets that power AI model development across the company. This central platform manages the full lifecycle of Microsoft's AI training data, accelerating model development with high-quality, compliant, and reusable datasets and services.
Responsibilities
- Design and build a data quality evaluation framework for AI training datasets, including scalable metrics, testing methodologies, and automated reporting.
- Define and operationalize quality signals aligned to model outcomes (e.g., coverage, diversity, noise/duplication, labeling consistency, safety/toxicity, privacy/compliance risk indicators).
- Collaborate with cross-functional stakeholders to run experiments, establish best practices, and deliver reusable tools that scale across multiple model and product teams.
- Develop task- and model-aware evaluation approaches that connect dataset properties to training performance, reliability, and safety.
- Create automated dataset validation gates and monitoring to support continuous dataset iteration (e.g., regression detection across dataset versions).
- Design and implement synthetic data generation pipelines (LLM-driven and programmatic approaches) to improve long-tail representation, fill coverage gaps, and accelerate iteration cycles.
- Build guardrails for synthetic data: filtering, scoring, calibration, provenance tracking, and bias/safety checks to ensure quality and compliance.
- Partner with engineering to integrate evaluation and generation into the platform's end-to-end data lifecycle.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience (e.g., statistics, predictive analytics, research)
- OR Master's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
- OR equivalent experience.
Other Requirements:
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Doctorate in Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
- 5+ years of coding experience in Python and experience with ML frameworks such as PyTorch and Triton
- 3+ years of experience with large-scale model training for LLMs, SLMs, and agentic models
- 3+ years of proven ability to design and scale training infrastructure and pipelines in production environments
- Experience with agent training frameworks
- Demonstrated experience developing synthetic data generation pipelines to enable SFT and RL training of agentic models
- Hands-on experience with large-scale distributed training and/or serving with demonstrated ability to dive deep into complex systems, troubleshoot unconventional issues, and craft innovative solutions under real-world constraints
- Extensive experience with large-scale training, model inference, reinforcement learning, and reasoning models
- Demonstrated ability to work in cross-functional teams and collaborate effectively with researchers, product managers, and other engineers to deliver complex ML solutions
- Startup-style mindset: agile, solution-oriented, and self-driven
#CoreAI #LLMs #Agents #Data
Applied Sciences IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.