Data Lead (RAG-based applications)
Full-time | Remote (US-based) - Direct Client
Work Authorization: s and  
The Data Lead will be responsible for designing and maintaining our data infrastructure, including ETL pipelines, vector databases, and retrieval systems for RAG-based applications. You will guide data quality, governance, and performance optimization efforts, ensuring our platform delivers accurate, scalable, and cost-efficient data-driven experiences.
What you'll do:
● Data Engineering: Strong SQL and Python, ETL pipeline design, and data normalization/cleaning.
● Vector Databases & Retrieval: Hands-on experience with Pinecone, Weaviate, Milvus, or pgvector. Knowledge of indexing strategies (HNSW, IVF, PQ).
● RAG (Retrieval-Augmented Generation): Designing retrieval strategies (chunking, embedding selection, reranking).
● Embedding Models: Understanding how to choose and evaluate embedding models for domain-specific tasks.
● Data Modeling & Knowledge Graphs (nice-to-have): Linking structured and unstructured data.
● Data Quality & Governance: Setting standards for metadata, access controls, lineage, and freshness.
● Performance Optimization: Benchmark and tune latency, recall/precision, and cost/performance trade-offs.
About you:
● 6+ years in data engineering, data platform, or ML data roles.
● Strong SQL and Python skills for ETL and data workflows.
● Experience with vector databases (Pinecone, Weaviate, Milvus, pgvector).
● Proven ability to design retrieval pipelines for RAG.
● Deep understanding of embedding models and their evaluation.
● Familiarity with data quality and governance frameworks.
● Ability to optimize systems for latency, accuracy, and cost-efficiency.