Overview
Remote
Depends on Experience
Contract - W2
Skills
ML evaluation
NLTK
Hugging Face evaluate
sacrebleu
Cohen's Kappa
DVC
Git-LFS
MLflow
HIPAA or GxP (21 CFR Part 11)
Power BI
Grafana
Job Details
Job Title: GenAI Evaluation Engineer
Location: Remote
Duration: Long Term
Required Qualifications:
- BS/MS in Computer Science, Statistics, Engineering, or a related field.
- 2-5 years of experience in ML evaluation, QA engineering, or analytics, ideally in regulated domains.
- Proficiency in Python, pandas, and NumPy.
- Hands-on experience with evaluation libraries such as NLTK, Hugging Face evaluate, and sacrebleu.
- Strong statistical rigor, with a deep understanding of metrics like Cohen's Kappa.
- Experience with BI/dashboarding tools (Power BI, Grafana) and data versioning tools (DVC, Git-LFS).
Preferred Qualifications (Nice-to-Haves):
- Experience with MLflow for tracking experiments and metrics.
- Background in qualitative research methods like open/axial coding, especially in safety-critical settings.
- Expertise in regulatory compliance and audits for standards like HIPAA or GxP (21 CFR Part 11).