Role: Lead / Principal Data Engineer
Duration: Long-Term W2 Contract
Location: Dallas, TX (Onsite; Local Candidates Only)
Position summary
We are seeking an experienced Lead or Principal Data Engineer to join a long-term W2 contract engagement based in Dallas, TX.
This is an onsite role for local candidates who can provide hands-on technical leadership and own the design, implementation, and operational excellence of large-scale data platforms.
The ideal candidate has deep experience with Databricks and Scala, strong mastery of Spark performance tuning, and a proven track record building metadata-driven, governable data architectures (Medallion architecture preferred) that balance scalability and cost.
Key responsibilities
Architect and lead the implementation of a Medallion data architecture that optimizes for scalability, performance, maintainability, and cost-efficiency on Databricks.
Design and implement efficient ingestion pipelines, including sparse-column ingestion patterns and change data capture (CDC) scenarios and their edge cases.
Lead Spark and Databricks performance optimization: analyze job profiles; optimize joins, shuffles, partitioning, caching, and resource configurations to reduce latency and cost.
Build metadata-driven frameworks for pipeline orchestration, schema evolution, data quality checks, and automated recovery from failures.
Implement and enforce data governance using Unity Catalog and other governance tools: access controls, lineage, classification, and auditability.
Design resilient distributed systems with automated failure detection and recovery strategies; investigate and remediate distributed system failures and stability issues.
Implement cross-account AWS integrations securely and reliably (S3, IAM roles, KMS, VPC endpoints, and AWS Glue/Glue Data Catalog interoperability where applicable).
Collaborate with data science, analytics, DevOps, and security teams to translate business requirements into performant data solutions and SLAs.
Mentor engineers, conduct code and architecture reviews, and set best practices for Scala, Spark, and Databricks development.
Create runbooks, monitoring dashboards, and operational playbooks to support 24/7 production reliability and incident response.
Required qualifications
15+ years of hands-on data engineering experience; 5+ years in a lead or principal role designing and operating production data platforms.
Extensive experience with Databricks and Apache Spark, including production job tuning, cluster sizing, and cost optimization.
Strong proficiency in Scala for data processing; experience with Python/PySpark is a plus.
Deep understanding of Medallion architecture patterns (bronze/silver/gold layers) and how to implement them in cloud data platforms.
Proven experience handling sparse-column ingestion issues, schema drift, and CDC edge cases (experience with Debezium/Kafka or vendor CDC solutions is a plus).
Experience building metadata-driven frameworks for schema management, pipeline orchestration (Airflow, Databricks Jobs, or similar), and automated testing.
Solid knowledge of data governance and security: Unity Catalog, IAM, RBAC, encryption at rest/in transit, and data lineage.
Strong AWS experience: S3 lifecycle policies, cross-account access, IAM role assumption, KMS, VPC endpoints, and AWS Glue/Glue Data Catalog integration.
Demonstrated ability to design for distributed system resiliency and troubleshoot complex failures across clusters and networks.
Excellent communication skills; experience working directly with stakeholders and leading technical discussions.