Job Description:
Specialized experience: The candidate should have experience as a data engineer or in a similar role, with a strong understanding of data architecture and ETL processes, proficiency in programming languages used for data processing, and working knowledge of distributed computing and parallel processing.
3+ years of hands-on experience building, deploying, and maintaining data pipelines on AWS or equivalent cloud platforms.
Strong coding skills in Python and SQL (Scala or Java a plus).
Proven experience with Apache Spark (PySpark) for large-scale data processing; a minimal bronze-to-silver sketch appears after this list.
Hands-on experience with AWS Glue, S3, Redshift, Athena, EMR, and Lake Formation.
Strong debugging and performance optimization skills in distributed systems.
Hands-on experience with Apache Iceberg, Delta Lake, or other open table formats (OTFs).
Experience with Airflow or other pipeline orchestration frameworks; a minimal DAG sketch appears after this list.
Practical experience in CI/CD and Infrastructure-as-Code (Terraform, CloudFormation).
Practical experience with EDI X12, HL7, or FHIR data formats; a short FHIR parsing sketch appears after this list.
Strong understanding of the Medallion Architecture (bronze/silver/gold layers) for data lakehouses.
Hands-on experience building dimensional models and data warehouses.
Working knowledge of HIPAA and CMS interoperability requirements.
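
To illustrate the kind of day-to-day work these requirements describe, here is a minimal PySpark sketch of a bronze-to-silver step in a Medallion-style lake: raw JSON landed in S3 is typed, deduplicated, and written to a governed Iceberg table. All names (the "lake" catalog, the example-lake bucket, the claims dataset and its columns) are hypothetical placeholders, and the job assumes the Iceberg Spark runtime and AWS bundle jars are on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark session configured for an Iceberg catalog backed by AWS Glue;
# catalog name, warehouse path, and bucket are placeholders, not real resources.
spark = (
    SparkSession.builder
    .appName("claims-bronze-to-silver")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-lake/warehouse/")
    .getOrCreate()
)

# Bronze layer: raw files landed in S3 as-is.
bronze = spark.read.json("s3://example-lake/bronze/claims/")

# Silver layer: typed, deduplicated, lightly cleaned records.
silver = (
    bronze
    .withColumn("service_date", F.to_date("service_date"))
    .filter(F.col("claim_id").isNotNull())
    .dropDuplicates(["claim_id"])
)

# Write to an Iceberg table; downstream gold marts read from here.
silver.writeTo("lake.claims.silver_claims").createOrReplace()
```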
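The orchestration requirement is sketched below as a three-task Airflow DAG chaining the medallion layers. This assumes Airflow 2.4+ (for the `schedule` argument and the TaskFlow API); the task bodies are stubs standing in for Glue/Spark job triggers, not a definitive implementation.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def medallion_pipeline():
    @task
    def ingest_bronze():
        # In practice: copy raw files into S3 or trigger an ingestion Glue job.
        ...

    @task
    def refine_silver():
        # In practice: run the Spark job that writes the silver Iceberg table.
        ...

    @task
    def build_gold():
        # In practice: materialize dimensional models for analytics consumers.
        ...

    # Enforce bronze -> silver -> gold ordering.
    ingest_bronze() >> refine_silver() >> build_gold()


medallion_pipeline()
```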
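Finally, a short sketch of handling a healthcare interoperability format: extracting a display name and birth date from a FHIR R4 Patient resource. The payload is a fabricated example, not data from any real system; production code would validate against the FHIR spec rather than index fields directly.

```python
import json

# Illustrative FHIR R4 Patient resource (fabricated sample data).
raw = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Doe", "given": ["Jane"]}],
  "birthDate": "1980-04-01"
}
"""

patient = json.loads(raw)
assert patient["resourceType"] == "Patient"

# FHIR name is a repeating structure; take the first entry defensively.
name = patient.get("name", [{}])[0]
full_name = " ".join(name.get("given", []) + [name.get("family", "")])

print(full_name.strip(), patient.get("birthDate"))
```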