Overview
Remote
$DOE
Accepts corp to corp applications
Contract - W2
Contract - 12 Month(s)
Skills
Workflows
Amazon Web Services
PySpark
Data Science
Data Modelling
Data Pipelines
Continuous Integration
Data Quality
Microsoft Azure
Identity and Access management
Cloud Computing
Problem solving
Artificial Intelligence
Governance
Data Streaming
Databricks
Requirements Analysis
Reliability
Data Lakes
Extract Transform Load (ETL)
Large Language Models
Python (Programming Language)
Apache Hive
Machine Learning Operations
SQL Databases
Information Engineering
Data Logging
Role-Based Access Control
Stock Control
Networking Skills
Catalyst (Software)
Cost Optimisation
Feature Engineering
Indexer
Software Coding
Job Details
Role: Sr./Lead Data Engineer + AI
Location: Boston, MA - Remote
Experience Needed: 10 to 15 Years for Lead / 5 to 10 Years for Senior
A minimum of 3 years of experience in a Lead role is required.
About the role:
We're looking for a Senior Data Engineer to build and scale our lakehouse and AI data pipelines on Databricks. You'll design robust ETL/ELT, enable feature engineering for ML/LLM use cases, and drive best practices for reliability, performance, and cost.
What you'll do:
Design, build, and maintain batch/streaming pipelines in Python + PySpark on Databricks (Delta Lake, Autoloader, Structured Streaming).
Implement data models (Bronze/Silver/Gold), optimize with partitioning, Z-ORDER, and indexing, and manage reliability (DLT/Jobs, monitoring, alerting).
Enable ML/AI: feature engineering, MLflow experiment tracking, model registries, and model/feature serving; support RAG pipelines (embeddings, vector stores).
Establish data quality checks (e.g., Great Expectations), lineage, and governance (Unity Catalog, RBAC).
Collaborate with Data Science/ML and Product to productionize models and AI workflows; champion CI/CD and IaC.
Troubleshoot performance and cost issues; mentor engineers and set coding standards.
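Illustrative sketch (not part of the formal requirements): a minimal Auto Loader ingest into a Bronze Delta table with PySpark Structured Streaming, of the kind described in the bullets above. All paths and table names are hypothetical placeholders, and "spark" is assumed to be the ambient Databricks session.

from pyspark.sql import functions as F

# Incremental ingest of raw JSON files with Auto Loader (cloudFiles).
raw = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/demo/bronze/_schemas/events")
    .load("/Volumes/demo/landing/events/")
)

# Add basic lineage columns before landing the data in the Bronze layer.
bronze = (
    raw
    .withColumn("_ingested_at", F.current_timestamp())
    .withColumn("_source_file", F.col("_metadata.file_path"))
)

# Write to a managed Delta table; availableNow gives a batch-like incremental run.
(
    bronze.writeStream
    .option("checkpointLocation", "/Volumes/demo/bronze/_checkpoints/events")
    .trigger(availableNow=True)
    .toTable("demo.bronze.events")
)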
Must-have qualifications:
10+ years in data engineering with a track record of production pipelines.
Expert in Python and PySpark (UDFs, Window functions, Spark SQL, Catalyst basics).
Deep hands-on Databricks: Delta Lake, Jobs/Workflows, Structured Streaming, SQL Warehouses; practical tuning and cost optimization.
Strong SQL and data modeling (dimensional, medallion, CDC).
ML/AI enablement experience: MLflow, feature stores, model deployment/monitoring; familiarity with LLM workflows (embeddings, vectorization, prompt/response logging).
Cloud proficiency on AWS/Azure/Google Cloud Platform (object storage, IAM, networking).
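For the MLflow expectation above, a minimal tracking example (the experiment path, parameters, and metric values are hypothetical placeholders):

import mlflow

# Log parameters and evaluation metrics for one training run under a named experiment.
mlflow.set_experiment("/Shared/demo-churn-model")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 6)
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("auc", 0.87)
    mlflow.log_metric("precision_at_10", 0.42)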
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.