Job Title: Data Engineering Lead
Location: Pittsburgh, PA / Dallas, TX / Cleveland, OH (Hybrid)
Role Overview
We are seeking an experienced Data Engineering Lead to own the design, development, and delivery of enterprise-scale data pipelines and platforms within a large financial services environment. This role combines deep hands-on engineering expertise with strong project leadership capabilities. You will drive end-to-end delivery of data engineering workstreams while serving as the primary point of contact for business stakeholders, business systems analysts (BSAs), product owners, and cross-functional delivery teams.
You will lead a team of data engineers and QA analysts, manage delivery timelines, govern quality standards, and translate complex business requirements into scalable technical solutions built on PySpark, Informatica IDMC, SQL, Hadoop, and Python.
Key Responsibilities
Data Pipeline Design & Engineering
• Design, develop, and optimize large-scale batch and near-real-time data pipelines using PySpark and Spark on OpenShift Container Platform (OCP)/Kubernetes in a Hadoop ecosystem.
• Build and maintain robust ETL/ELT workflows using Informatica IDMC (mappings, mapping tasks, taskflows, DQ rules) aligned to enterprise data standards.
• Develop reusable Python-based transformation utilities, data quality frameworks, and automation scripts to accelerate pipeline delivery.
• Write complex SQL for data transformations, validation, reconciliation, and performance tuning across Teradata, Hive, and ANSI-compliant databases.
• Implement medallion architecture patterns (Bronze / Silver / Gold) ensuring traceability, quality, and auditability at each layer.
• Integrate data pipelines with enterprise platforms including Kafka event streams, REST APIs, and file-based ingestion channels.
Project Leadership & Delivery Management
• Lead end-to-end delivery of data engineering workstreams from requirements intake through production deployment, managing scope, timelines, and risk.
• Own sprint planning, backlog grooming, and delivery governance in an Agile/Scrum model — coordinating onshore and offshore team members.
• Maintain and enforce delivery checklists, change request (CR) governance, and release cadence standards (e.g., bi-weekly release cycles, 10-day CR windows).
• Proactively identify delivery blockers, escalate risks, and drive resolution across engineering, QA, platform, and business stakeholders.
• Prepare and present delivery status, pipeline architecture overviews, and milestone updates to senior leadership and program sponsors.
Stakeholder Engagement & Business Partnership
• Serve as the primary technical liaison for Product Owners, BSAs, and business domain leads — translating requirements into technical designs and driving sign-off.
• Collaborate with data architects to ensure pipeline implementations align with enterprise architecture standards, data contracts, and governance policies.
• Facilitate data requirement workshops, technical walkthroughs, and design reviews with both technical and non-technical audiences.
• Communicate pipeline health, data quality metrics, and operational issues clearly to stakeholders across business and technology functions.
• Partner with data governance, security, and risk teams to ensure regulatory compliance (e.g., BCBS 239, data lineage, audit trails).
Team Leadership & Mentorship
• Lead a team of data engineers (onshore and offshore) and QA analysts, providing technical direction, code reviews, and hands-on mentorship.
• Define and enforce engineering standards — coding conventions, unit test coverage, CI/CD integration, and documentation practices.
• Drive a culture of quality and continuous improvement through retrospectives, root-cause analysis, and iterative process refinement.
• Support team capacity planning, onboarding, and skill development aligned to the project technology stack.
• Coordinate QA lead and QA team activities to ensure comprehensive test coverage, defect triage, and UAT readiness.
Data Quality, Observability & Governance
• Embed data quality checks (Great Expectations or equivalent) at ingestion, transformation, and output layers of all pipelines.
• Implement lineage tracking and metadata cataloging (e.g., Alation) to support governance and auditability requirements.
• Monitor pipeline health using observability tooling, define SLAs for data freshness and quality, and manage incident resolution.
• Enforce data access controls and masking/tokenization standards in collaboration with security and compliance teams (e.g., Protegrity).
Required Qualifications
• 8+ years of experience in data engineering, with at least 3 years in a lead or senior technical role.
• Proven track record delivering complex data pipeline projects in financial services or similarly regulated industries.
• Strong leadership and communication skills with demonstrated experience managing cross-functional delivery teams.
• Experience working directly with BSAs, product owners, and business stakeholders in an Agile delivery model.
• Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field.
Technical Skills
Category | Technologies & Skills
Core Processing | PySpark, Apache Spark (Standalone, OCP/Kubernetes), Spark Streaming, Spark SQL
ETL & Integration | Informatica IDMC (mappings, taskflows, DQ rules), REST API ingestion, Kafka-based pipelines
Languages | Python (pandas, PySpark, automation scripting), SQL (Teradata, Hive, ANSI SQL)
Data Platforms | Hadoop (HDFS, YARN, Hive), Teradata, Hive Metastore, lakehouse architectures
Data Quality & Governance | Great Expectations (or equivalent), Alation, Protegrity, data lineage, metadata management
DevOps & Delivery | Git, CI/CD pipelines, Agile/Scrum, JIRA, release governance, CR management
Observability | Pipeline monitoring, ELK stack (preferred), alerting and SLA management
Visualization & Reporting | Executive status reporting, architecture diagrams (draw.io), delivery dashboards
Preferred Qualifications
• Banking or financial services experience (risk data, core banking, deposits, transactions, regulatory reporting).
• Familiarity with BCBS 239 or similar regulatory data compliance frameworks.
• Experience with graph databases (Neo4j) for relationship-centric data modeling.
• Exposure to cloud platforms (Azure, AWS, or Google Cloud Platform) and hybrid on-prem/cloud architectures.
• Experience with data mesh or domain-oriented operating models.
• Knowledge of data catalog and data lineage tools (Alation, Collibra, or similar).
• PMP, PMI-ACP, or equivalent project/program management certification.