Job Description
Full-time, hybrid opportunity for a Principal Data Engineer based in Boston, MA (3 days/week onsite in Back Bay). In this role you will build and scale the data infrastructure behind a leading healthcare intelligence platform, using modern cloud technologies, distributed systems, and large-scale data engineering to help life sciences organizations make faster, smarter decisions.
This is a highly impactful role for an experienced data engineer who wants to own critical data infrastructure at a rapidly growing startup. You'll work on complex healthcare and regulatory datasets, build scalable pipelines, and help shape the next generation of data products used by pharmaceutical and medical device companies. The ideal candidate thrives in fast-moving environments, enjoys solving difficult technical challenges, and wants to make a meaningful impact on healthcare innovation.
Required Skills & Experience
5+ years of professional experience in Data Engineering or a related field
Strong experience with data modeling, schema design, and scalable data pipelines
Experience working with relational databases, MongoDB, and Elasticsearch
Experience with Apache Airflow and workflow orchestration
Experience with containerization, orchestration, and Infrastructure as Code tools, including Docker, Kubernetes, Helm, Terraform, and Terragrunt
Experience with Apache Spark and distributed computing frameworks
Experience building solutions on cloud platforms
Strong commitment to software engineering best practices and code quality
Interest in or experience working with Generative AI technologies
Desired Skills & Experience
Experience with Snowflake
Knowledge of modern security practices and data governance
Experience working with healthcare, life sciences, pharmaceutical, or medical device data
Startup experience and comfort operating in fast-paced environments
Experience building highly scalable and reliable data platforms
What You Will Be Doing
Tech Breakdown
45% Data Engineering & ETL Development
25% Cloud Infrastructure & Platform Engineering
20% Distributed Computing & Data Processing
10% Data Architecture & Governance
Daily Responsibilities
85% Hands On
0% Management Duties
15% Team Collaboration
Design, build, and maintain scalable ETL pipelines for healthcare and regulatory datasets
Integrate new data sources as platform capabilities continue to expand
Optimize pipeline performance, reliability, and scalability
Ensure data accuracy, consistency, and quality across complex workflows
Develop and improve distributed data processing solutions using Spark
Build and manage cloud-native infrastructure and deployment workflows
Partner with engineering teams to evolve platform architecture and data strategy
Implement engineering best practices around testing, monitoring, and observability
Explore and support emerging Generative AI initiatives across the platform
Contribute to the continued scaling of a rapidly growing data platform
The Offer
Competitive salary plus equity
You will receive the following benefits:
Medical Insurance
Vision Benefits
Paid Time Off (PTO)
Equity Participation
Flexible Hybrid Work Environment
Applicants must be authorized to work in the US on a full-time basis, now and in the future.