Job Details
Data Engineer With AI Data Pipelines
Location: Remote
Duration: 6+ Months
Job Description:
10 years of experience designing and delivering distributed systems for large-scale data processing and transformation.
Experience in data lake development and building ingestion pipelines from diverse data sources.
Hands-on work with advanced AWS services such as AWS Glue, AWS Entity Resolution, Amazon Comprehend, and Amazon SageMaker.
Advanced AI & Data Intelligence
Partner with business and data science teams to gather requirements for AI tagging of study questions and patient responses.
Collaborate with compliance teams to ensure consent and opt-out logic is integrated into patient data pipelines.
Hands-on work with NoSQL databases (Amazon DocumentDB preferred) and IaC tools such as Terraform.
Experience in CI/CD pipeline development with tools such as GitHub Actions and Jenkins.
Proficiency in Python or PySpark for high-performance data processing.
Experience in event-driven, serverless architectures for cloud-based distributed systems.
Strong ability to design, orchestrate, and schedule jobs using Airflow.
Strong proficiency in SQL, Python, and ETL/ELT tools (e.g., Airflow, dbt, Informatica, or equivalent).
Hands-on experience with CDC implementations and real-time/streaming data (Kafka, Spark Streaming, etc.).
Experience with cloud data platforms (AWS, Google Cloud Platform, or Azure) and cloud data warehouses (Redshift, BigQuery, Snowflake, etc.).
Knowledge of data modeling, mapping, data transformation, and schema design.
Understanding of data governance, security, and compliance in healthcare or regulated industries.
Please share profiles with sudheer at anveta dot com