Data Scientist

  • McLean, VA
  • Posted 2 days ago | Updated 2 days ago

Overview

On Site
140,000 - 180,000
Full Time
Accepts corp to corp applications
No Travel Required
Unable to Provide Sponsorship

Skills

API
ATLAS
Artificial Intelligence
Cloud Computing
Amazon Web Services
Apache Spark
ELT
Extract, Transform, Load
FOCUS
Generative Artificial Intelligence (AI)
Jupyter
Kubernetes
Machine Learning (ML)
Large Language Models (LLMs)
Collaboration
Microsoft Azure
Knowledge Base
MongoDB
Python
UI
PySpark
Workflow
Semantics
Microsoft Certified Professional

Job Details

Senior Data Scientist Role 
McLean, VA - Onsite Job
Long Term Contract

Must-Have Qualifications: Hands-on experience with machine learning that has transitioned into GenAI; RAG; Python and Jupyter; other software knowledge; using agents in workflows; and a strong understanding of data. Preferred: Experience building AI agents, working with MCP, A2A, and Graph RAG, and deploying GenAI applications to production.

Job Description:
We are seeking a highly experienced **Principal Gen AI Scientist** with a strong focus on **Generative AI (GenAI)** to lead the design and development of cutting-edge AI Agents, Agentic Workflows and Gen AI Applications that solve complex business problems. This role requires advanced proficiency in Prompt Engineering, Large Language Models (LLMs), RAG, Graph RAG, MCP, A2A, multi-modal AI, Gen AI Patterns, Evaluation Frameworks, Guardrails, data curation, and AWS cloud deployments. You will serve as a hands-on Gen AI (data) scientist and critical thought leader, working alongside full stack developers, UX designers, product managers and data engineers to shape and implement enterprise-grade Gen AI solutions.


**Key Responsibilities:**

* Architect and implement scalable AI Agents, Agentic Workflows and GenAI applications to address diverse and complex business use cases.
* Develop, fine-tune, and optimize lightweight LLMs; lead the evaluation and adaptation of models such as Claude (Anthropic), Azure OpenAI, and open-source alternatives.
* Design and deploy Retrieval-Augmented Generation (RAG) and Graph RAG systems using vector databases and knowledge bases.
* Curate enterprise data using connectors integrated with AWS Bedrock's Knowledge Base or Elastic.
* Implement solutions leveraging MCP (Model Context Protocol) and A2A (Agent-to-Agent) communication.
* Build and maintain Jupyter-based notebooks using platforms like SageMaker and MLflow/Kubeflow on Kubernetes (EKS).
* Collaborate with cross-functional teams of UI and microservice engineers, designers, and data engineers to build full-stack Gen AI experiences.
* Integrate GenAI solutions with enterprise platforms via API-based methods and GenAI standardized patterns.
* Establish and enforce validation procedures with Evaluation Frameworks, bias mitigation, safety protocols, and guardrails for production-ready deployment.
* Design and build robust ingestion pipelines that extract, chunk, enrich, and anonymize data from PDFs, video, and audio sources for use in LLM-powered workflows, applying best practices such as semantic chunking and privacy controls.
* Orchestrate multimodal pipelines using scalable frameworks (e.g., Apache Spark, PySpark) for automated ETL/ELT workflows suited to unstructured media.
* Implement embedding pipelines: map media content to vector representations using embedding models, and integrate with vector stores (AWS Knowledge Base, Elastic, MongoDB Atlas) to support RAG architectures.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.