Senior Databricks Data Engineer

Remote • Posted 2 days ago • Updated 1 day ago
Full Time
Remote
Compensation: Depends on Experience (DOE)


Job Details

Skills

  • SQL
  • Data modeling
  • Performance tuning
  • PySpark
  • Databricks
  • Watermarking
  • CDC
  • Marketing datasets

Summary

Role: Senior Databricks Data Engineer
Location: Remote
Duration: 4-8 weeks, with possible extension
Description:
We are looking to immediately onboard a Senior Databricks Data Engineer to support a high-priority initiative delivering data to IPSOS for MMM/MTA modeling.
We are building data pipelines to extract and deliver curated datasets (approx. 150GB historical + weekly increments) from Databricks bronze/silver layers to an external analytics partner (IPSOS). The data will be used for MMM/MTA modeling, so accuracy, consistency, and reliability are critical.

Key Responsibilities:
Data Extraction & Engineering
Build scalable extraction pipelines from Databricks (bronze/silver layers)
Prepare datasets for external consumption (column selection, renaming, formatting, normalization)
Work across ~10-20 fact and dimension tables spanning media and sales domains

Incremental Pipeline Development
Design and implement incremental logic using timestamps or CDC patterns
Optimize for ongoing weekly loads (~2GB) while supporting large historical extracts
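The timestamp-watermark pattern mentioned above can be sketched as follows. This is a minimal, self-contained illustration using in-memory records; the field name `updated_at` and the example dates are hypothetical, and in the actual pipeline the filter would be a Delta Lake query (e.g. `WHERE updated_at > :last_watermark`) rather than a Python list comprehension.

```python
from datetime import datetime, timezone

def incremental_filter(records, last_watermark, ts_field="updated_at"):
    """Return records newer than the stored watermark, plus the new watermark.

    Sketch of the timestamp-watermark pattern only; a production Databricks
    job would push this predicate down into a Delta Lake read.
    """
    new_rows = [r for r in records if r[ts_field] > last_watermark]
    # Advance the watermark to the max timestamp seen; keep the old one if
    # this batch is empty so no rows are skipped on the next run.
    new_watermark = max((r[ts_field] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

# Hypothetical weekly batch: only rows after the prior run's watermark load.
rows = [
    {"id": 1, "updated_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 5, 8, tzinfo=timezone.utc)},
]
batch, wm = incremental_filter(rows, datetime(2024, 5, 3, tzinfo=timezone.utc))
# batch contains only id=2; wm advances to 2024-05-08
```

Persisting `wm` between runs (e.g. in a small control table) is what makes the weekly ~2GB loads idempotent and restartable.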

File Generation & Optimization
Generate export-ready datasets in CSV/Parquet formats
Implement partitioning strategies for performance (e.g., by date/source)
Apply compression and optimize file sizes for transfer
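The date/source partitioning idea can be illustrated with a small sketch that buckets rows into Hive-style partition paths. The root path, column names, and sample rows are hypothetical; the real job would simply use Spark's `df.write.partitionBy("event_date", "source")` with Parquet and a codec such as snappy rather than building paths by hand.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def partition_paths(records, root="exports/media_spend"):
    """Group rows into Hive-style date/source partition directories.

    Illustrates the partition layout only; Spark's partitionBy produces the
    same directory structure when writing Parquet.
    """
    buckets = defaultdict(list)
    for r in records:
        path = (PurePosixPath(root)
                / f"event_date={r['event_date']}"
                / f"source={r['source']}")
        buckets[str(path)].append(r)
    return dict(buckets)

rows = [
    {"event_date": "2024-05-06", "source": "meta", "spend": 120.0},
    {"event_date": "2024-05-06", "source": "google", "spend": 80.0},
]
layout = partition_paths(rows)
```

Partitioning this way lets the consumer read only the dates or sources they need, which matters for a 150GB historical extract.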

Data Validation & Quality
Implement validation checks (schema, row counts, completeness)
Troubleshoot data inconsistencies across multiple sources
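The validation checks listed above (schema, row counts, completeness) can be sketched as a small pre-delivery gate. This is a pure-Python illustration with a hypothetical schema; in Databricks these checks would more likely run as Delta constraints or an expectations suite before the export lands with the partner.

```python
def validate_export(rows, expected_schema, source_row_count):
    """Minimal pre-delivery checks: schema, nulls, and row-count reconciliation.

    A sketch of the check categories only, not a production framework.
    """
    issues = []
    for i, row in enumerate(rows):
        if set(row) != expected_schema:
            issues.append(f"row {i}: schema mismatch {sorted(set(row) ^ expected_schema)}")
        elif any(v is None for v in row.values()):
            issues.append(f"row {i}: null values present")
    # Reconcile exported row count against the source-side count.
    if len(rows) != source_row_count:
        issues.append(f"row count {len(rows)} != source {source_row_count}")
    return issues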

Secure Delivery
Support secure file delivery (e.g., SFTP, encryption)
Implement monitoring, logging, retry logic, and failure notifications
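The retry and failure-notification wiring can be sketched as a generic wrapper. Here `task` stands in for the real SFTP upload and `on_failure` for whatever alerting integration is in place (email, Slack, PagerDuty); both are assumptions, not a prescribed design.

```python
import time

def with_retries(task, attempts=3, base_delay=1.0, on_failure=print):
    """Run a delivery task with exponential backoff and a failure hook.

    Sketch only: retries `task` up to `attempts` times, doubling the delay
    each time, and fires `on_failure` before re-raising on final failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == attempts:
                on_failure(f"delivery failed after {attempts} attempts: {exc}")
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Keeping the retry policy separate from the transfer code makes it easy to reuse the same wrapper for SFTP pushes, checksum verification, or notification calls.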

Collaboration
Work closely with internal teams and IPSOS for data validation and issue resolution
Support onboarding and early-stage troubleshooting

Required Skills
Strong hands-on experience with Databricks (Delta Lake, notebooks, jobs)
Proficiency in PySpark and SQL for large-scale data processing
Experience with incremental pipelines (CDC, watermarking)
Solid understanding of data modeling (fact/dimension, grain alignment)
Experience handling large datasets (100GB+) and performance tuning
Familiarity with file-based delivery (CSV/Parquet) and secure transfer (SFTP, encryption)

Nice to Have
Experience with MMM/MTA or marketing datasets (Google, Meta, Amazon, etc.)
Experience working with external analytics partners (e.g., IPSOS, Nielsen)
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 91071031
  • Position Id: 2026-13947