Job Details
Job Title: Lead AWS Data Engineer with Python
Duration: 12+ Months
Location: Houston, TX (Onsite)
About the Role
We are seeking a Lead Data Engineer (Level 3) to design, build, and optimize large-scale, high-reliability data pipelines and lakehouse architectures. The ideal candidate combines deep data engineering expertise with strong software engineering fundamentals to deliver modular, scalable, and testable data systems. This role involves leading core architectural decisions and end-to-end patterns across ingestion, transformation, data modeling, and delivery, including partitioning strategies and partition key design for high-performance analytics.
Key Responsibilities
- Design, build, and maintain ELT pipelines across ingestion, transformation, modeling, and delivery layers (bronze/silver/gold).
- Implement incremental loads, change data capture (CDC), merge/upsert, and idempotent pipeline patterns to ensure reliability and repeatability (an illustrative sketch follows this list).
- Define and apply data architectural patterns (e.g., layered lakehouse, domain-oriented datasets, and semantic models) aligned to business objectives.
- Engineer physical data designs including partitioning strategies, partition key selection, clustering/micro-partitioning, and compaction for performance and cost efficiency.
- Develop curated datasets and data marts that enable analytics and self-service BI.
- Implement data quality, observability, and lineage (validations, profiling, SLAs, monitoring, and alerting).
- Optimize performance on cloud data platforms (e.g., Snowflake tasks/streams, compute sizing, query optimization).
- Design and manage lakehouse table formats (e.g., Apache Iceberg or Delta) on object storage, including schema evolution and maintenance.
- Collaborate with Data Architects, Analytics Engineering, and business stakeholders to translate requirements into scalable data solutions.
- Mentor junior engineers, lead design reviews, and contribute to engineering standards and reusable frameworks.
- Automate and optimize the data lifecycle using CI/CD and infrastructure-as-code; apply DevOps principles to data pipelines.
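For orientation on the merge/upsert and idempotency expectations above, the following is a minimal sketch of an idempotent upsert from a bronze increment into a silver Delta table using PySpark; the S3 paths, business key, and timestamp columns are illustrative assumptions, not a prescribed implementation.

```python
# Minimal sketch: idempotent merge/upsert from a bronze increment into a
# silver Delta table. Paths, key, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("silver_upsert").getOrCreate()

# Read the latest bronze increment (e.g., a CDC batch landed on S3).
updates = spark.read.parquet("s3://example-bucket/bronze/orders/")

# Keep only the newest version of each business key so re-running the same
# batch yields the same result (idempotency).
latest = (
    updates
    .withColumn(
        "_rn",
        F.row_number().over(
            Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
        ),
    )
    .filter(F.col("_rn") == 1)
    .drop("_rn")
)

silver = DeltaTable.forPath(spark, "s3://example-bucket/silver/orders/")

# Merge on the business key: update existing rows, insert new ones.
(
    silver.alias("t")
    .merge(latest.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

The same pattern extends to SCD handling and to Iceberg tables; the essential point is that deduplication plus a keyed merge condition make the load safe to re-run.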
Required Qualifications
- 14+ years of experience in Data Engineering or closely related Software Engineering roles with a data focus.
- Expert-level SQL development and data analysis skills, including advanced query optimization and debugging.
- Strong Python engineering skills and familiarity with software design principles and patterns (e.g., SOLID), unit testing, refactoring, and version control.
- Hands-on experience building ELT/ETL pipelines and orchestration with tools such as Astronomer/Airflow; proficiency with Git and CI/CD.
- Deep understanding of core data engineering patterns: ingestion, transformation, modeling (dimensional/SCDs), and delivery.
- Proven experience with database physical design, including partitioning and effective partition key selection; exposure to clustering and micro-partitioning on MPP/cloud data platforms (see the sketch after this list).
- Experience implementing data quality frameworks, observability/monitoring, and robust operational SLAs.
- Experience with Lakehouse table formats (Apache Iceberg/Delta/Hudi) and columnar storage (Parquet) on object storage (e.g., AWS S3).
- Strong communication skills with the ability to present complex technical concepts to both technical and business stakeholders.
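As a companion to the physical-design qualification above, here is a brief sketch of a date-partitioned Parquet write on S3; the bucket, dataset, and partition column are assumptions chosen only to illustrate partition key selection.

```python
# Minimal sketch: choosing a low-cardinality partition key and writing a
# date-partitioned Parquet dataset to S3. Names and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gold_publish").getOrCreate()

events = spark.read.parquet("s3://example-bucket/silver/events/")

(
    events
    # Derive a daily partition key; partitioning on the raw timestamp would
    # create one directory per value and defeat partition pruning.
    .withColumn("event_date", F.to_date("event_ts"))
    .repartition("event_date")  # one file group per partition value
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/gold/events/")
)
```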
Preferred Qualifications
- Experience optimizing Snowflake workloads (compute sizing, tasks/streams, clustering, micro-partitioning).
- Experience with dbt (Data Build Tool) or similar tools for transformation and testing.
- Experience with event streaming (Kafka/Kinesis/Flink) and API-based data integration.
- Experience with data catalog, governance, and lineage platforms.
Core Competencies
- Architectural thinking and systems design.
- Structured problem-solving and analytical rigor.
- Clear written and verbal communication; stakeholder engagement.
- Bias for automation, reliability, and maintainability.
Tools & Technologies (representative)
- Databases & Warehouses: Snowflake, MPP databases; dimensional modeling/SCDs.
- Lakehouse & Storage: Apache Iceberg/Delta/Hudi, Parquet, AWS S3/Object Storage.
- Orchestration & CI/CD: Astronomer/Airflow, Git, CI/CD pipelines (a minimal DAG sketch follows this list).
- Programming: Python, SQL.
- Observability & Quality: data validation frameworks, monitoring/alerting tools.
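For context on the orchestration stack listed above, a minimal Airflow 2.x DAG sketch follows; the DAG id, schedule, and task bodies are placeholders rather than a working pipeline.

```python
# Minimal sketch: an Airflow 2.x DAG wiring an ingest -> transform -> publish
# flow. DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest(**context):
    """Land the raw increment into the bronze layer."""


def transform(**context):
    """Run the merge/upsert into the silver layer."""


def publish(**context):
    """Refresh curated gold datasets and data marts."""


with DAG(
    dag_id="example_lakehouse_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    publish_task = PythonOperator(task_id="publish", python_callable=publish)

    # Linear dependency chain: ingest, then transform, then publish.
    ingest_task >> transform_task >> publish_task
```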
Education & Work Conditions
Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field; an advanced degree is a plus.
Location: Houston, TX (in-office, no remote/hybrid).