Knowledge and Experience Required:
Strong proficiency in Python and R for data engineering and analytical workflows.
Hands-on experience with Databricks and Apache Spark, including Structured Streaming (watermarking, stateful processing concepts, checkpointing, exactly-once/at-least-once tradeoffs); a streaming sketch follows this list.
Strong SQL skills for transformation and validation.
Experience building production-grade pipelines: idempotency, incremental loads, backfills, schema evolution, and error handling.
Experience implementing data quality checks and validation for both batch and event streams (late arrivals, deduplication, event-time vs processing-time).
Observability skills: logging/metrics/alerting, troubleshooting, and performance tuning (partitions, joins/shuffles, caching, file sizing).
Proficiency with Git and CI/CD concepts for data pipelines, including Databricks Asset Bundles, Databricks application deployments, and the Databricks CLI.
Experience with Lakehouse table formats and patterns (e.g., Delta tables) including compaction/optimization and lifecycle management.
Familiarity with orchestration patterns (Databricks Workflows/Jobs) and dependency management.
Experience with governance controls (catalog permissions, secure data access patterns, metadata/lineage expectations).
Knowledge of message/event platforms and streaming ingestion patterns (e.g., Kafka, Kinesis, or equivalents) and sink patterns for serving layers.
Experience collaborating with research/analytics stakeholders and translating analytical needs into engineered data products.
Strong problem-solving and debugging skills across ingestion, transformation, and serving.
Clear technical communication and documentation discipline.
Ability to work across product/architecture/governance teams in a regulated environment.
Deep Delta Lake expertise, including time travel, Change Data Feed (CDF), MERGE operations, CLONE, table constraints, and optimization techniques, as well as an understanding of liquid clustering and table maintenance best practices; a MERGE sketch follows this list.
Experience with Lakeflow/Delta Live Tables (DLT), including the expectations framework, materialized vs. streaming table patterns, and declarative pipeline design.
Proficiency with testing frameworks (pytest, Great Expectations, deequ) and test-driven development practices for production data pipelines.
Data modeling skills including dimensional modeling (star/snowflake schemas), medallion architecture implementation, and slowly changing dimension (SCD) pattern implementation.
AWS data services experience including S3 optimization, IAM role configuration for data access, and CloudWatch integration; understanding of cost optimization patterns.
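As an illustration of the streaming skills listed above, the following is a minimal PySpark sketch of a watermarked, deduplicating stream with checkpointing; the table names and checkpoint path are hypothetical placeholders rather than part of any actual codebase.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read from a hypothetical Bronze Delta table as a stream.
events = spark.readStream.table("bronze.events")

# Bound state with a watermark and drop duplicate events by key and event time.
deduped = (
    events
    .withWatermark("event_time", "30 minutes")
    .dropDuplicates(["event_id", "event_time"])
)

# Write incrementally to a Silver table; the checkpoint enables recovery after failure.
query = (
    deduped.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/mnt/checkpoints/silver_events")  # hypothetical path
    .trigger(availableNow=True)
    .toTable("silver.events")
)
query.awaitTermination()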
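Similarly, a minimal sketch (with hypothetical table and column names) of the idempotent, incremental upsert pattern referenced in the Delta Lake item above:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical incremental batch of changed customer records.
updates = spark.read.table("silver.customer_updates")

# Hypothetical target dimension table maintained with MERGE.
target = DeltaTable.forName(spark, "gold.dim_customer")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll(condition="s.updated_at > t.updated_at")  # only apply newer rows
    .whenNotMatchedInsertAll()
    .execute()
)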
Responsibilities:
Build and maintain end-to-end pipelines in Databricks using Spark (PySpark) for ingestion, transformation, and publication of curated datasets.
Implement streaming/near-real-time patterns using Spark Structured Streaming (or equivalent), including state management, checkpointing, and recovery.
Design incremental processing, partitioning strategies, and data layout/file sizing approaches to optimize performance and cost.
Develop reusable pipeline components (common libraries, parameterized jobs, standardized patterns) to accelerate delivery across domains.
Develop and operationalize workflows in Python and R for data preparation, analysis support, and research-ready extracts.
Package code for repeatable execution (dependency management, environment reproducibility, job configuration).
Implement data quality controls for batch and streaming (schema enforcement, completeness/validity checks, late/duplicate event handling, reconciliation).
Build pipeline observability: logging, metrics, alerting, and dashboards; support on-call/incident response and root-cause analysis.
Create runbooks and operational procedures for critical pipelines and streaming services.
Ensure secure handling of sensitive data and apply least-privilege principles in pipeline design and execution.
Contribute lineage notes, dataset definitions, and operational documentation to support reuse and auditability.
Use version control and CI/CD practices for notebooks/code (code reviews, automated testing where feasible, deployment/promotion across environments).
Collaborate with stakeholders to refine requirements, define SLAs, and deliver incrementally with measurable outcomes.
Implement Lakeflow/Delta Live Tables (DLT) pipelines with data quality expectations, materialized views, and streaming tables; design pipeline DAGs and maintain declarative ETL workflows (a DLT sketch follows this list).
Design and implement medallion architecture patterns (Bronze/Silver/Gold) with appropriate data quality gates, schema evolution strategies, and layer-specific optimization techniques (OPTIMIZE, VACUUM, Z-ordering/liquid clustering); a table maintenance sketch follows this list.
Develop and maintain comprehensive testing strategies, including unit tests for transformation logic, integration tests for end-to-end pipelines, and data quality validation using frameworks like Great Expectations or deequ (a pytest sketch follows this list).
Perform data modeling and schema design for dimensional models, slowly changing dimensions (SCD), and analytical structures; collaborate on entity definitions and grain decisions.
Contribute to Unity Catalog governance by registering datasets with metadata/descriptions/tags, implementing row/column-level security where required, and maintaining accurate lineage information.
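To make the DLT responsibility above concrete, here is a minimal declarative pipeline sketch using the expectations framework; the dataset names are hypothetical, and the code only runs inside a Databricks DLT/Lakeflow pipeline, not as a standalone script.

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Validated orders (Silver layer)")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop rows failing the check
@dlt.expect("non_negative_amount", "amount >= 0")              # record the metric, keep rows
def silver_orders():
    # Read from a hypothetical upstream streaming table in the same pipeline.
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("ingested_at", F.current_timestamp())
    )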
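The OPTIMIZE/VACUUM responsibility above can be sketched as a simple scheduled maintenance job; the table name is hypothetical, and retention and clustering settings would depend on the actual tables.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files (or recluster, if liquid clustering is enabled on the table).
spark.sql("OPTIMIZE gold.fact_orders")

# Remove data files no longer referenced by the table, respecting the retention window.
spark.sql("VACUUM gold.fact_orders")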
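Finally, a minimal pytest sketch of a unit test for transformation logic; clean_orders and its module are hypothetical stand-ins for whatever transformation is under test.

import pytest
from pyspark.sql import SparkSession

from pipelines.transforms import clean_orders  # hypothetical module and function under test


@pytest.fixture(scope="session")
def spark():
    # Local Spark session so tests run without a cluster.
    return SparkSession.builder.master("local[2]").appName("pipeline-tests").getOrCreate()


def test_clean_orders_drops_null_ids(spark):
    raw = spark.createDataFrame(
        [("o1", 10.5), (None, 3.0)],
        ["order_id", "amount"],
    )
    result = clean_orders(raw)

    # Expect rows without an order_id to be removed by the transformation.
    assert result.count() == 1
    assert result.filter("order_id IS NULL").count() == 0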
Education/Experience/Certifications/Accreditations:
Bachelor's degree in a related field or equivalent experience.
10+ years of data engineering experience, including production Spark-based batch pipelines and streaming implementations.
Desirable Certifications:
Databricks Certified Associate Developer for Apache Spark.
Databricks Certified Data Engineer Associate or Professional.
AWS Certified Developer Associate.
AWS Certified Data Engineer Associate.
AWS Certified Solutions Architect Associate.