Role: Gen AI Data Engineer
Location: Hartford, CT (Remote)
We are seeking an experienced Data Engineer with expertise in Dataiku to join our data team. As a Data Engineer, you will design, build, and maintain data pipelines, data integration processes, and data infrastructure. You will collaborate closely with data scientists, analysts, and other stakeholders to ensure efficient data flow and support data-driven decision-making across the organization.
Responsibilities
Design and implement robust data pipelines that ingest, process, and store unstructured data at scale in Snowflake and Google Cloud Platform
Leverage Snowflake's unstructured data capabilities (Directory Tables, Scoped URLs, Snowpark) to make "dark data" queryable and actionable
Build and maintain cloud-native ETL/ELT processes using BigQuery, Cloud Storage, and Dataflow, ensuring seamless integration between Google Cloud Platform and Snowflake
Go beyond simply calling LLMs: integrate AI tools (OCR, NLP entity extraction, Document AI) into the engineering flow to transform unstructured blobs into structured insights
Tune complex SQL queries and Python-based processing jobs to run efficiently in petabyte-scale environments