Lead Data Engineer

Dallas, TX, US • Posted 6 hours ago • Updated 6 hours ago
Full Time
On-site
Depends on Experience
Fitment

Job Details

Skills

  • Data Engineer
  • Python
  • AWS
  • Azure
  • Cloud

Summary

We are seeking a Lead Data Engineer to build and scale the data infrastructure powering our Agentic AI products. You will be responsible for the "Ingestion-to-Insight" pipeline that allows autonomous agents to access, search, and reason over vast amounts of proprietary and public data.
Your role is critical: you will design the RAG (Retrieval-Augmented Generation) architectures and data pipelines that ensure our agents have the right context at the right time to make accurate decisions.
Key Responsibilities
  • AI-Ready Data Pipelines: Design and implement scalable ETL/ELT pipelines that process both structured (SQL, logs) and unstructured (PDFs, emails, docs) data specifically for LLM consumption.
  • Vector Database Management: Architect and optimize Vector Databases (e.g., Pinecone, Weaviate, Milvus, or Qdrant) to ensure high-speed, relevant similarity searches for agentic retrieval.
  • Chunking & Embedding Strategies: Collaborate with AI Engineers to optimize data chunking strategies and embedding models to improve the "recall" and "precision" of the agent's knowledge retrieval.
  • Data Quality for AI: Develop automated "Data Cleaning" workflows to remove noise, PII (Personally Identifiable Information), and toxicity from training/context datasets.
  • Metadata Engineering: Enrich raw data with advanced metadata tagging to help agents filter and prioritize information during multi-step reasoning tasks.
  • Real-time Data Streaming: Build low-latency data streams (using Kafka or Flink) to provide agents with "fresh" data, enabling them to act on real-time market or operational changes.
  • Evaluation Frameworks: Construct "Gold Datasets" and versioned data snapshots to help the team benchmark agent performance over time.
Required Skills & Qualifications
  • Experience: 10+ years in Data Engineering, with at least 2 years focusing on data for LLMs or AI/ML applications.
  • Python Mastery: Deep expertise in Python (Pandas, Pydantic, FastAPI) for data manipulation and API integration.
  • Data Tooling: Strong experience with modern data stack tools (e.g., dbt, Airflow, Dagster, Snowflake, or Databricks).
  • Vector Expertise: Hands-on experience with at least one major Vector Database and knowledge of approximate nearest-neighbor indexing and similarity metrics (e.g., HNSW, cosine similarity).
  • Search Knowledge: Familiarity with hybrid search techniques that combine semantic search with traditional keyword search (e.g., BM25 as used in Elasticsearch).
  • Cloud Infrastructure: Proficiency in managing data workloads on AWS, Azure, or Google Cloud Platform.
Preferred Qualifications
  • Experience with LlamaIndex or LangChain for data ingestion.
  • Knowledge of Graph Databases (e.g., Neo4j) to help agents understand complex relationships between data points.
  • Familiarity with "Data-Centric AI" principles, which prioritize data quality over model size.
  • Dice Id: 10462843
  • Position Id: 8948622