Data Engineer

Overview

Remote
$50 - $60
Full Time

Skills

API
Apache Spark
Big Data
Cloud Computing
Collaboration
Communication
Continuous Delivery
Agile
Amazon Redshift
Amazon Web Services
Apache Hadoop
Data Quality
Apache Kafka
Continuous Integration
Data Engineering
Data Governance
Extract, Transform, Load (ETL)
Data Lake
Data Security
DevOps
Documentation
Engineering Support
Health Informatics
JSON
Mentorship
Metadata Management
Microservices
GitHub
Google Cloud Platform
PL/SQL
Pandas
HIPAA
Health Care
Microsoft Azure
Migration
MySQL
Orchestration
Performance Tuning
PostgreSQL
Public Health
Python
R
RabbitMQ
Real-time
Regulatory Compliance
SAS
SQL
Scrum
Snowflake Schema
Stored Procedures
Streaming
SOC 2
Terraform
Testing
Workflow
XML

Job Details

Comtech is seeking a Data Engineer responsible for evaluating and implementing the technical
integration between LINKS and the Data Lake. This includes coordinating with OPH and STChealth on
data standards, building and optimizing ETL pipelines, and integrating related datasets like Vital Records
and lab data. The position focuses on creating scalable, high-quality data workflows with strong
governance, observability, and lineage tracking. It also involves mentoring engineering staff, supporting
testing, enforcing architectural best practices, and clearly documenting and communicating all technical
solutions to both technical and non-technical stakeholders.

The scope of the proposed services will include the following:
Assess feasibility and technical requirements for LINKS Data Lake integration.
Collaborate with the OPH Immunization Program, the OPH Bureau of Health Informatics, and STChealth on data specifications and recurring ingestion pipelines.
Build and optimize ETL workflows for LINKS and complementary datasets (Vital Records, labs, registries); a minimal orchestration sketch follows this list.
Design scalable data workflows to improve data quality, integrity, and identity resolution.
Implement data governance, observability, and lineage tracking across all pipelines.
Mentor engineers, support testing, and enforce best practices in orchestration and architecture.
Document and communicate technical solutions to technical and non-technical stakeholders.
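
As an illustration of the ETL and observability work described above, here is a minimal sketch using Prefect (one of the orchestration tools named in M4 below). The source URL, column names, and bronze table name are hypothetical, not taken from the LINKS specification:

    from prefect import flow, task, get_run_logger
    import pandas as pd

    @task(retries=2, retry_delay_seconds=60)
    def extract(source_url: str) -> pd.DataFrame:
        # Pull a raw extract; in practice this would be a LINKS or Vital Records feed.
        return pd.read_json(source_url)

    @task
    def transform(df: pd.DataFrame) -> pd.DataFrame:
        # Basic cleanup; real pipelines would add identity resolution and quality gates.
        df = df.drop_duplicates(subset="patient_id")  # hypothetical key column
        df["updated_at"] = pd.to_datetime(df["updated_at"])
        return df

    @task
    def load(df: pd.DataFrame, target_table: str) -> None:
        logger = get_run_logger()
        # Observability: log row counts so volumes and lineage stay auditable.
        logger.info("Loading %d rows into %s", len(df), target_table)
        # df.to_sql(target_table, engine, if_exists="append")  # engine is environment-specific

    @flow(name="links-ingestion")
    def links_ingestion(source_url: str, target_table: str = "bronze_links_raw"):
        load(transform(extract(source_url)), target_table)

The retries, run logging, and bronze-layer target table mirror the governance, observability, and medallion concerns listed in the requirements below.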

Expertise and/or Relevant Experience:
Expertise and/or relevant experience in the following areas are mandatory:
M1 3 years of experience in data engineering and/or data architecture.
M2 2 years of experience with Python for ETL and automation (pandas, requests, API integration); see the ingestion sketch after this list.
M3 2 years of hands-on experience with SQL queries, stored procedures, and performance tuning (preferably Oracle, SQL Server, or MySQL).
M4 1 year of experience with ETL orchestration tools (Prefect, Airflow, or equivalent).
M5 1 year of experience with cloud platforms (Azure, AWS, or Google Cloud Platform), including data onboarding/migration.
M6 1 year of exposure to data lake / medallion architecture (bronze, silver, gold).
M7 2 years of experience providing written documentation and verbal communication for cross-functional collaboration.
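
A minimal sketch of the pattern M2 names, combining requests and pandas for API ingestion. The endpoint path, auth scheme, and pagination fields are hypothetical, not a documented STChealth contract:

    import pandas as pd
    import requests

    def fetch_immunization_records(base_url: str, api_key: str) -> pd.DataFrame:
        """Pull paginated JSON records from a REST API into a DataFrame.

        The /v1/records path, bearer auth, and next_page field are placeholder
        assumptions; real feeds define their own contracts.
        """
        records, page = [], 1
        while True:
            resp = requests.get(
                f"{base_url}/v1/records",
                params={"page": page},
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30,
            )
            resp.raise_for_status()
            payload = resp.json()
            records.extend(payload["data"])
            if not payload.get("next_page"):
                break
            page += 1
        return pd.DataFrame.from_records(records)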

Expertise and/or relevant experience in the following areas are desirable but not mandatory:
D1 5+ years of experience in data engineering roles.
D2 Experience integrating or developing REST/JSON or XML APIs.
D3 Familiarity with CI/CD pipelines (GitHub Actions, Azure DevOps, etc.).
D4 Exposure to Infrastructure as Code (Terraform, CloudFormation).
D5 Experience with data governance and metadata tools (Atlan, OpenMetadata, Collibra).
D6 Experience with public health/healthcare datasets or similar, including PHI/PII handling.
D7 Familiarity with SAS and R workflows to support epidemiologists and analysts.
D8 Experience with additional SQL platforms (Postgres, Snowflake, Redshift, BigQuery).
D9 Familiarity with data quality frameworks (Great Expectations, Deequ); see the validation sketch after this list.
D10 Experience with real-time/streaming tools (Kafka, Spark Streaming).
D11 Familiarity with big data frameworks for large-scale transformations (Spark, Hadoop).
D12 Knowledge of data security and compliance frameworks (HIPAA, SOC 2, etc.).
D13 Agile/SCRUM team experience.
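
As a library-agnostic illustration of the data quality checks D9 refers to, a plain-pandas validation sketch; in a framework like Great Expectations the same rules would be declared as expectations. Column names and ranges are hypothetical:

    import pandas as pd

    def validate_batch(df: pd.DataFrame) -> list[str]:
        """Return a list of data-quality failures for a batch."""
        failures = []
        if df["patient_id"].isna().any():
            failures.append("patient_id contains nulls")
        if df["patient_id"].duplicated().any():
            failures.append("patient_id is not unique")
        if not df["dose_number"].between(1, 10).all():
            failures.append("dose_number outside expected range 1-10")
        return failures

    # Usage: gate promotion from bronze to silver on the checks passing.
    # failures = validate_batch(batch_df)
    # if failures:
    #     raise ValueError(f"Data quality gate failed: {failures}")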

D16 Experience working in Agile environments with iterative development practices.
D17 Experience with cloud platforms such as Microsoft Azure or AWS.
D18 Familiarity with API integrations for external data sources.
D19 Exposure to modular or microservices architectures.
D20 Knowledge of message-based systems (e.g., RabbitMQ) and asynchronous programming models; see the publish sketch after this list.
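
A minimal publish sketch for the message-based pattern D20 names, using the pika RabbitMQ client; the broker host, queue name, and event shape are placeholders:

    import json
    import pika

    # Connect to a local RabbitMQ broker (host and queue name are assumptions).
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="links.events", durable=True)

    event = {"record_id": 12345, "action": "upsert"}
    channel.basic_publish(
        exchange="",
        routing_key="links.events",
        body=json.dumps(event),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )
    connection.close()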
