Collibra Data Lineage Automation Engineer - McLean, Virginia

Overview

On Site
Accepts corp to corp applications
Contract - Independent
Contract - 6 month(s)
100% Travel

Skills

Extraction
Cloud Computing
Amazon Web Services
Relational Databases
NoSQL
Business Intelligence
Reporting
Tableau
Microsoft Power BI
Storage
Extract, Transform, Load (ETL)
Mapping
Machine Learning (ML)
Natural Language Processing
Data Governance
Python
Scala
Java
SQL
Snowflake
Microsoft SQL Server
Oracle
MongoDB
Financial Services
Health Care
Real-time
Data Engineering
Metadata Management
Artificial Intelligence

Job Details

Job Description

Title: Collibra Data Lineage Automation Engineer

Duration: 6+ Months

Location: Role is on-site 5 days/week in McLean, VA.

We are seeking a highly experienced Data Lineage Automation Engineer to lead the design and implementation of automated end-to-end lineage solutions across a heterogeneous enterprise data ecosystem. This role requires deep technical expertise in lineage frameworks (such as Spline and OpenLineage), experience across both cloud and legacy environments, and a strong AI/ML foundation to support intelligent metadata extraction and traceability.

Key Responsibilities

Lead the implementation of automated data lineage across a complex data estate that includes:

o Cloud platforms (e.g., Snowflake, AWS)

o Legacy relational databases and ETL pipelines

o NoSQL data stores

o BI/reporting platforms (e.g., Tableau, Power BI)

Implement or extend Spline, OpenLineage, or similar open-source frameworks to support active lineage capture

Build connectors, extractors, or agents where necessary to bridge gaps between systems and lineage frameworks

Integrate with metadata platforms (e.g., Collibra) to publish lineage in a consumable format

Apply AI/ML techniques to infer lineage where automated capture is incomplete (e.g., Java-based ETL jobs), using logs, query patterns, or usage metadata

Develop reusable lineage components that can be operationalized across domains

Guide stakeholders on best practices for lineage standardization, storage, and use

Required Skills & Experience

Proven experience delivering automated data lineage solutions across hybrid architectures

Hands-on expertise with Spline, OpenLineage, Marquez, or comparable lineage frameworks

Deep understanding of metadata capture, ETL process tracing, and query execution mapping

Strong AI/ML background, particularly in metadata intelligence, natural language processing (NLP) for code parsing, or pattern detection

Experience integrating lineage with data governance tools (e.g., Collibra, Alation)

Strong programming background in Python, Scala, or Java

Deep familiarity with SQL and with query logs from systems such as Snowflake, SQL Server, Oracle, and MongoDB

Big Plus Skills

Experience evaluating and implementing third-party commercial data lineage solutions

Prior work in regulated environments (e.g., financial services, healthcare)

Familiarity with event-based architectures for real-time lineage propagation

Knowledge of data mesh or domain-driven lineage strategies

Ideal Candidate

Has successfully implemented automated lineage at enterprise scale

Operates at the intersection of data engineering, metadata management, and AI

Can act as a technical thought partner to architecture teams and governance leads

Brings an automation-first, reuse-oriented design mindset

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.