Project Overview:
This role is responsible for designing, building, and maintaining data pipelines and infrastructure that support data-driven decisions and analytics. Specific tasks include:
· Design, develop, and maintain data pipelines and extract, transform, load (ETL) processes to collect, process, and store structured and unstructured data
· Build data architecture and storage solutions, including data lakehouses, data lakes, data warehouses, and data marts, to support analytics and reporting
· Develop data reliability, efficiency, and quality checks and processes
· Prepare data for data modeling
· Monitor and optimize data architecture and data processing systems
· Collaborate with multiple teams to understand requirements and objectives
· Administer testing and troubleshooting related to performance, reliability, and scalability
· Create and update documentation
Key Responsibilities
Hands-On Data Pipeline Development
· Design, code, and deploy ETL/ELT pipelines across bronze, silver, and gold layers of the Data Lakehouse.
· Build ingestion pipelines for structured (SQL), semi-structured (JSON, XML), and unstructured data in Python/PySpark on AWS Glue or EMR.
· Implement incremental loads, deduplication, error handling, and data validation.
· Actively troubleshoot, debug, and optimize pipelines for scalability and cost efficiency.
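As an illustration of the incremental-load, deduplication, and validation work described above, the following is a minimal plain-Python sketch, not a prescribed implementation. The function name `incremental_dedup` and the field names `id` and `updated_at` are hypothetical; a production pipeline would more likely express the same logic in PySpark on Glue or EMR.

```python
from datetime import datetime


def incremental_dedup(records, last_watermark, key="id", ts="updated_at"):
    """Keep only rows newer than the watermark, one (the latest) per key.

    Returns (kept_rows, rejected_rows, new_watermark).
    Field names are illustrative assumptions, not a real schema.
    """
    latest = {}
    rejected = []
    for rec in records:
        # Basic validation: rows missing a key or timestamp are set aside.
        if rec.get(key) is None or rec.get(ts) is None:
            rejected.append(rec)
            continue
        # Incremental load: skip rows already covered by the last run.
        if rec[ts] <= last_watermark:
            continue
        # Deduplication: keep the most recent version of each key.
        current = latest.get(rec[key])
        if current is None or rec[ts] > current[ts]:
            latest[rec[key]] = rec
    new_watermark = max((r[ts] for r in latest.values()), default=last_watermark)
    return list(latest.values()), rejected, new_watermark
```

The returned watermark would be persisted and passed to the next run, so each run processes only rows that arrived since the previous one.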
EDW & Data Lake Implementation
· Develop dimensional data models (Star Schema, Snowflake Schema) for analytics and reporting.
· Build and maintain tables in Iceberg, Delta Lake, or equivalent open table formats (OTF).
· Optimize partitioning, indexing, and metadata for fast query performance.
Healthcare Data Integration
· Build ingestion and transformation pipelines for EDI X12 transactions (837, 835, 278, etc.).
· Implement mapping and transformation of EDI data with FHIR and HL7 frameworks.
· Work hands-on with AWS HealthLake (or equivalent) to store and query healthcare data.
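For context on the EDI X12 ingestion work above, here is a minimal Python sketch of the first parsing step: splitting a raw interchange into segments and elements. It assumes `*` as the element separator and `~` as the segment terminator; real files declare their delimiters in the ISA header, so a production parser would read them from there. The function names and the sample string in the usage note are hypothetical.

```python
def parse_x12(raw, element_sep="*", segment_term="~"):
    """Split a raw X12 string into a list of segments, each a list of elements.

    Delimiters are assumed defaults; real interchanges define them in ISA.
    """
    segments = []
    for chunk in raw.split(segment_term):
        chunk = chunk.strip()  # tolerate newlines between segments
        if chunk:
            segments.append(chunk.split(element_sep))
    return segments


def transaction_set_ids(segments):
    """Return the transaction set identifier (ST01) of each ST segment."""
    return [seg[1] for seg in segments if seg[0] == "ST" and len(seg) > 1]
```

For example, calling `parse_x12` on the made-up fragment `"ST*837*0001~SE*2*0001~"` yields two segments, and `transaction_set_ids` then reports the set as an 837, which a pipeline could use to route the file to the appropriate transformation.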
Data Quality, Security & Compliance
· Develop automated validation scripts to enforce data quality and integrity.
· Implement IAM roles, encryption, and auditing to meet HIPAA and CMS compliance standards.
· Maintain lineage and governance documentation for all pipelines.
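An automated validation script of the kind mentioned above could be as simple as a set of named rules applied to every record. The sketch below is illustrative only: the rule names and the `claim_id` and `amount` fields are hypothetical and do not reflect an actual schema.

```python
from typing import Callable

Rule = Callable[[dict], bool]

# Hypothetical rules; each returns True when the record passes.
RULES: dict[str, Rule] = {
    "claim_id_present": lambda r: bool(r.get("claim_id")),
    "amount_non_negative": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
}


def validate(records, rules=RULES):
    """Return a list of (record_index, rule_name) for every failed check."""
    failures = []
    for index, record in enumerate(records):
        for name, rule in rules.items():
            if not rule(record):
                failures.append((index, name))
    return failures
```

Keeping rules as named entries makes the failure report readable and lets new checks be added without touching the validation loop.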
Collaboration & Delivery
· Work closely with the Lead Data Engineer, analysts, and data scientists to deliver pipelines that support enterprise-wide analytics.
· Actively contribute to CI/CD pipelines, Infrastructure-as-Code (IaC), and automation.
· Continuously improve pipelines and adopt new technologies where appropriate.
Required Skills & Qualifications
The candidate should have experience as a data engineer or in a similar role, with a strong understanding of data architecture and ETL processes. The candidate should be proficient in programming languages for data processing and knowledgeable about distributed computing and parallel processing.
· 3+ years of hands-on experience in building, deploying, and maintaining data pipelines on AWS or equivalent cloud platforms.
· Strong coding skills in Python and SQL (Scala or Java a plus).
· Proven experience with Apache Spark (PySpark) for large-scale processing.
· Hands-on experience with AWS Glue, S3, Redshift, Athena, EMR, and Lake Formation.
· Strong debugging and performance optimization skills in distributed systems.
· Hands-on experience with Iceberg, Delta Lake, or other open table formats.
· Experience with Airflow or other pipeline orchestration frameworks.
· Practical experience in CI/CD and Infrastructure-as-Code (Terraform, CloudFormation).
· Practical experience with EDI X12, HL7, or FHIR data formats.
· Strong understanding of Medallion Architecture for data lakehouses.
· Hands-on experience building dimensional models and data warehouses.
· Working knowledge of HIPAA and CMS interoperability requirements.
Education:
This position requires a bachelor’s or master’s degree from an accredited college or university with a major in computer science, statistics, mathematics, economics, or a related field. Three (3) years of equivalent experience in a related field may be substituted for the bachelor’s degree.