Job Details
Data Engineer
Location: Remote (Louisiana)
Contract Duration: 12 months, with potential extension
Position Overview
Seeking an experienced Data Engineer to support data integration, pipeline development, governance, and architecture initiatives across multiple public health data systems. This role involves building scalable ETL workflows, enhancing data quality, supporting cloud migrations, and collaborating with cross-functional teams on data-driven solutions.
Key Responsibilities
Assess feasibility and technical requirements for integrating core systems with a centralized Data Lake.
Collaborate with internal teams and external partners on data specifications, ingestion workflows, and recurring pipelines.
Build, maintain, and optimize ETL processes for datasets including immunization systems, vital records, laboratory data, and registries.
Design scalable and efficient workflows to enhance data quality, consistency, and identity matching.
Implement data governance practices, observability, and lineage tracking across all pipelines.
Mentor engineering staff, support testing processes, and enforce best practices in data orchestration and architecture.
Document and communicate technical solutions for both technical and non-technical stakeholders.
Minimum Required Qualifications
3 years of experience in data engineering or data architecture.
2 years of Python experience for ETL/automation (pandas, requests, API integrations).
2 years of strong SQL experience (queries, stored procedures, performance tuning).
1 year of experience with ETL orchestration tools (Airflow, Prefect, or equivalent).
1 year of experience with cloud platforms (Azure, AWS, or Google Cloud Platform), including onboarding/migration work.
1 year of exposure to Data Lake / Medallion architecture (bronze/silver/gold layers).
2 years of experience producing clear technical documentation and collaborating cross-functionally.
Preferred (Not Required)
5+ years of experience in data engineering.
Experience integrating or developing REST/JSON or XML APIs.
Familiarity with CI/CD tools (GitHub Actions, Azure DevOps, etc.).
Exposure to Infrastructure as Code (Terraform, CloudFormation).
Experience with data governance/metadata platforms (Atlan, Collibra, OpenMetadata).
Experience with PHI/PII datasets in public health or healthcare domains.
Familiarity with SAS or R workflows used by analysts/epidemiologists.
Experience with additional SQL-based platforms (Postgres, Snowflake, Redshift, BigQuery).
Knowledge of data quality frameworks (Great Expectations, Deequ).
Experience with streaming technologies (Kafka, Spark Streaming).
Experience with large-scale processing frameworks (Spark, Hadoop).
Knowledge of compliance/security standards (HIPAA, SOC 2, etc.).
Experience working in Agile/Scrum environments.