Title: AI Data Engineer
Location: Rockville, MD (Hybrid)
Duration: 12+ months Contract
Project Description
The AI Data Engineer designs and implements data pipelines and retrieval systems for a generative
AI platform. This role is responsible for ingesting, transforming, and indexing domain
content to enable accurate, grounded responses from AI-powered applications. The AI
Data Engineer collaborates with agent developers and platform engineers to continuously
improve knowledge retrieval quality and coverage.
Key Responsibilities:
Data Engineering & ETL
• Design and develop ETL pipelines for ingesting structured and unstructured data sources into searchable knowledge stores
• Build robust, repeatable ingestion workflows that handle document parsing, transformation, and loading at scale
• Implement data quality checks and validation to ensure accuracy and completeness of ingested content
• Use AWS services (e.g., S3, Lambda, Step Functions, OpenSearch, Bedrock) to build and operate data pipelines and retrieval infrastructure
RAG Pipeline Development & Search Tuning
• Architect and optimize retrieval-augmented generation (RAG) pipelines including document chunking strategies, vector embedding generation, and retrieval mechanisms
• Tune search relevance and retrieval quality using vector databases and search engines, iterating on ranking and filtering approaches
• Evaluate retrieval accuracy using evaluation frameworks and custom benchmarks, establishing measurable quality baselines
• Experiment with embedding models, chunking parameters, and hybrid search strategies to continuously improve answer quality
Quality & Testing
• Design and implement test strategies for data pipelines, including validation of ingestion accuracy, data completeness, and transformation correctness
• Develop automated regression tests to detect retrieval quality degradation across pipeline changes
• Build and maintain evaluation benchmarks that measure retrieval precision, recall, and relevance
• Champion test-driven development (TDD) practices for pipeline and integration code
Generative AI & Emerging Technologies
• Stay informed of advances in RAG architectures, embedding models, and retrieval optimization techniques
• Identify opportunities to improve knowledge retrieval through emerging approaches (e.g., contextual retrieval, reranking, hybrid search)
• Collaborate with agent developers to ensure knowledge tools return well-structured, contextually relevant results
Security & Compliance
• Assist with adherence to technology policies and comply with all security controls
• Implement secure coding practices, particularly in handling personally identifiable information (PII) and sensitive regulatory data
• Participate in threat modeling and security discussions for API and infrastructure components
• Understand and apply ***'s security standards and best practices for regulated financial environments