Overview
Remote
On Site
Hybrid
Depends on Experience
Contract - W2
Skills
data engineering
data architecture
Python
ETL
SQL
SQL queries
stored procedures
performance tuning
Oracle
SQL Server
MySQL
Azure
AWS
GCP
data lake
medallion architecture
API
Agile
Amazon Redshift
Amazon Web Services
Apache Hadoop
Apache Kafka
Apache Spark
Big Data
Cloud Computing
Data Governance
Data Quality
DevOps
PL/SQL
R
Public Health
PostgreSQL
Scrum
SAS
Job Details
The scope of the proposed services will include the following:
- Assess feasibility and technical requirements for LINKS DataLake integration.
- Collaborate with OPH Immunization Program, OPH Bureau of Health Informatics and STChealth on data specifications and recurring ingestion pipelines.
- Build and optimize ETL workflows for LINKS and complementary datasets (Vital Records, labs, registries).
- Design scalable data workflows to improve data quality, integrity, and identity resolution.
- Implement data governance, observability, and lineage tracking across all pipelines.
- Mentor engineers, support testing, and enforce best practices in orchestration and architecture.
- Document and communicate technical solutions to technical and non-technical stakeholders.
Expertise and/or relevant experience in the following areas are mandatory:
- 3 years of experience in data engineering and/or data architecture.
- 2 years of experience with Python for ETL and automation (pandas, requests, API integration).
- 2 years of hands-on experience with SQL queries, stored procedures, and performance tuning (preferably Oracle, SQL Server, or MySQL).
- 1 year of experience with ETL orchestration tools (Prefect, Airflow, or equivalent).
- 1 year of experience with cloud platforms (Azure, AWS, or Google Cloud Platform), including data onboarding/migration.
- 1 year of exposure to data lake / medallion architecture (bronze, silver, gold).
- 2 years of experience providing written documentation and verbal communication for cross-functional collaboration.
Expertise and/or relevant experience in the following areas are desirable but not mandatory:
- 5+ years of experience in data engineering roles.
- Experience integrating or developing REST/JSON or XML APIs.
- Familiarity with CI/CD pipelines (GitHub Actions, Azure DevOps, etc.).
- Exposure to Infrastructure as Code (Terraform, CloudFormation).
- Experience with data governance and metadata tools (Atlan, OpenMetadata, Collibra).
- Public health/healthcare dataset or similar experience, including PHI/PII handling.
- Familiarity with SAS and R workflows to support epidemiologists and analysts.
- Experience with additional SQL platforms (Postgres, Snowflake, Redshift, BigQuery).
- Familiarity with data quality frameworks (Great Expectations, Deequ).
- Experience with real-time/streaming tools (Kafka, Spark Streaming).
- Familiarity with big data frameworks for large-scale transformations (Spark, Hadoop).
- Knowledge of data security and compliance frameworks (HIPAA, SOC 2, etc.).
- Agile/Scrum team experience.