MLOps/Data Scientist - LLM - REMOTE WORK - 66966
Pay Range - $65 - $70/hr
One of our clients is looking for an MLOps/Data Scientist - LLM to join their team remotely.
TECH STACK
Python, LangChain, LlamaIndex, MLflow, Svelte/SvelteKit/TypeScript, MongoDB, Qdrant, FastAPI, Kubernetes, Terraform, AWS (EKS, Lambda, S3, Bedrock, etc.), Azure Cognitive Services, REST, GraphQL, OpenAI and HuggingFace APIs, Anthropic Claude API, scikit-learn, pandas, numpy, prompt engineering frameworks, evaluation libraries, A/B testing tools, statistical analysis tools.
RESPONSIBILITIES
Design evaluation strategies and roadmaps aligned with development priorities
Create rigorous experiments to test prompt variations, hyperparameters, and agentic tooling configurations
Define success criteria and quality gates for AI features before development begins
Interpret evaluation results and identify systematic patterns in failures and successes
Make data-driven go/no-go decisions on feature readiness
Drive prompt engineering improvements based on systematic testing and iteration
Recommend specific changes to cognitive functions: prompt adjustments, parameter tuning, tool selection
Provide statistical rigor to experiment design (sample sizes, significance testing, holdout sets)
Transform metrics into actionable insights with clear next steps for developers
Lead weekly evaluation standups and present findings to stakeholders
Mentor team members on evaluation best practices and ML principles
Document evaluation frameworks and build institutional knowledge
QUALIFICATIONS
Advanced degree in Computer Science, Machine Learning, Statistics, or related field (MS/PhD preferred)
3+ years of hands-on experience with large language models and prompt engineering
Strong background in applied machine learning, particularly NLP or generative AI
Deep understanding of evaluation methodologies: metrics selection, dataset design, statistical testing
Experience with experiment design: A/B testing, hyperparameter optimization, systematic variation testing
Proficiency with LLM APIs and prompt engineering frameworks
Strong programming skills in Python and ML libraries
Practical experience optimizing AI systems for production use cases
Understanding of agentic AI architectures: tool use, function calling, multi-step reasoning
Proven ability to translate technical findings into clear, actionable recommendations
Strong decision-making skills: comfortable making go/no-go calls based on data
Excellent communication and cross-functional collaboration abilities
Self-directed and proactive problem solver
Experience building evaluation infrastructure or MLOps tooling desirable
Background in RLHF or constitutional AI desirable
Published research or blog posts on LLM evaluation or prompt engineering desirable
For immediate consideration:
Neetu
PRIMUS Global Services
Direct
Desk: Ext. 419
Email: