Role: Sr. Data/GenAI Engineer
Location: Irving, TX (Hybrid)
• Project Mission: Lead the design and execution of the data ingestion and preparation pipeline that forms the knowledge foundation for the AI system. This role ensures the AI has access to accurate, well-structured, and contextually rich metadata.
• Key Responsibilities:
o Architect and orchestrate the entire data and metadata ingestion process for the AI's knowledge base.
o Design and oversee the development of scripts and processes to extract schema definitions, query logs, business glossaries, and other metadata from source systems.
o Lead a team of data engineers in transforming, cleansing, and formatting data to prepare it for vectorization.
o Collaborate with the Architect to design the optimal data model for the vector database.
o Serve as the subject matter expert on data sources, liaising with data owners to understand structures, access patterns, and semantics.
o Ensure the data ingestion pipeline is robust, repeatable, and well-documented.
• Required Skills & Experience:
o Senior-level experience in data engineering, including designing and building complex ETL/ELT pipelines.
o Expert-level proficiency in SQL and Python for data processing and automation.
o Hands-on experience with vector databases (e.g., Pinecone, Chroma, pgvector) and a working understanding of embeddings.
o Experience working with a variety of data sources, from structured databases to semi-structured API outputs.
o Strong leadership and mentorship skills to guide a remote or distributed team.