Senior Data Architect, Integrated Data Platform

San Francisco, CA, US • Posted 3 days ago • Updated 3 days ago

Contract W2

Contract Independent

12 Months

On-site

Depends on Experience

Job Details

Skills

IDP
DICOM
PostgreSQL
Apache Iceberg
GUPRI
FAIRification
GxP
ALCOA
21 CFR Part 11

Summary

Key Responsibilities

Data Modeling and Architecture

Lead the design of the Integrated Data Platform (IDP) data model, covering multi-modal study assets including DICOM imaging, omics, and real-world data sources

Define the two-layer data architecture: an operational relational layer for study metadata, cataloging, and the access registry, and a lakehouse layer for versioned study assets at scale

Design schemas, partitioning strategies, and table formats across relational (PostgreSQL) and open table format (Apache Iceberg) layers to support both transactional and analytical access patterns

Establish cross-modal patient and study linkage standards, including integration with the Global Unique Patient Record Identifier (GUPRI) and related master data entities

Define data versioning and snapshot strategies for study-level packages, enabling reproducible dataset construction for algorithm development and regulatory submissions

Lakehouse and Query Layer

Architect the Apache Iceberg-based lakehouse layer on S3, including table design, schema evolution governance, compaction policies, and metadata management

Design the version catalog architecture using Project Nessie or equivalent catalog tooling, covering namespace structure, branching strategy, and atomic snapshot tagging

Define query access patterns and optimization strategies across the lakehouse layer using distributed SQL query engines

Govern the data access API surface exposed to downstream consumers including the algorithm development workbench and reporting services

FAIRification and Data Governance

Design proactive FAIRification pipelines that enrich incoming study data with standardized metadata, controlled vocabularies, and linkage keys at ingestion time

Define data quality validation rules, error handling workflows, and observability hooks across the ingestion and enrichment pipeline

Establish data lineage and provenance tracking across the full data lifecycle from ingestion through version snapshot to analytical consumption

Ensure the data architecture supports GxP audit trail requirements, including ALCOA+ principles for traceability, integrity, and contemporaneity

Stakeholder Collaboration and Governance

Serve as the primary data architecture authority for the program, partnering with imaging platform, workbench, and regulatory workstreams on cross-cutting data decisions

Engage directly with client data, engineering, and architecture stakeholders to align on data models, access patterns, and governance standards

Produce and maintain architecture artifacts, including data models, schema documentation, architecture decision records (ADRs), and a data dictionary

Contribute to milestone delivery planning, technical risk management, and program-level architecture reviews

Required Qualifications

10+ years of experience in data architecture, data engineering, or enterprise data platform design

Expert-level proficiency in relational data modeling (PostgreSQL or equivalent), including schema design, normalization, JSONB/semi-structured patterns, and query optimization

Hands-on experience designing and operating modern lakehouse architectures using Apache Iceberg or equivalent open table formats (Delta Lake, Apache Hudi)

Strong background in distributed query engines (Presto, Trino, Spark SQL, or equivalent) and large-scale data partitioning strategies

Experience with data versioning concepts including snapshot isolation, time travel, schema evolution, and catalog management

Demonstrated experience delivering data platforms in regulated environments with GxP, 21 CFR Part 11, or equivalent compliance requirements

Strong written and verbal communication skills, with the ability to document data models and architecture decisions for mixed technical and regulatory audiences

Nice to Have

Hands-on experience with Project Nessie or equivalent transactional catalog tooling for Iceberg

Background in medical imaging data (DICOM) or multi-modal clinical data integration including omics or real-world data

Familiarity with FAIR data principles and their application to life sciences data platforms

Experience with workflow orchestration tools (Apache Airflow, Temporal, or equivalent) in the context of data pipeline design

Prior experience in a fixed-fee, milestone-based delivery engagement within a large regulated enterprise environment

Dice Id: infnj003
Position Id: 8989786
Contact the job poster

Vijay Kumar

Recruiter @ Infinity Tech Group Inc
