AI Data Engineer - Scientific Data Platforms (Remote)

Remote in South San Francisco, CA, US • Posted 9 hours ago • Updated 9 hours ago
Full Time
Remote
USD 35.00 - 38.00 per hour
Fitment

Job Details

Skills

  • Biotechnology
  • Pharmaceutics
  • Science
  • Health Care
  • Meta-data Management
  • Clinical Trials
  • Training
  • PASS
  • Artificial Intelligence
  • Software Engineering
  • Biology
  • LlamaIndex
  • Python
  • Data Manipulation
  • Database
  • Bioinformatics
  • RNA
  • Genomics
  • Proteomics
  • Writing
  • Data Processing
  • Microsoft Certified Professional
  • Cloud Computing
  • Amazon Web Services
  • Google Cloud
  • Google Cloud Platform
  • Docker
  • Workflow

Summary

Pay Rate Low: 35 | Pay Rate High: 40
Our client is a leading global biotechnology and pharmaceutical organization driven by a mission to innovate, continuously advance science, and ensure everyone has access to the healthcare they need.

Title: AI Data Engineer - Scientific Data Platforms
Location: Remote (must work PST hours)
Pay rate: $35-38/hr (depending on experience)
Schedule: Full-time (40 hours/week)
Duration: 1-year contract, plus benefits

Position Overview
This role addresses a critical need in scaling our AI models for drug discovery: building largely automated, scalable, agent-driven pipelines that ingest and curate genomics data. The work includes inferring metadata, designing performant query architectures, and transforming high-dimensional datasets (e.g., single-cell omics, clinical trials) into AI-ready training formats.

Key Responsibilities
  • Build an agentic data ingestion pipeline, moving beyond bespoke, one-off steps toward agents that teams can reliably use as a shared, deployed service.
  • Triage and prioritize incoming requests to ingest specific datasets. Clean and organize data, building the first-pass cleaning and organization steps into the agentic flow.
  • Validate cross-modal linkage: add automated checks that detect when ingested data does not link correctly across modalities, and flag low-quality or mismatched records.
  • Version every dataset, keeping prior versions retained and addressable. Preserve raw data and provenance, ensuring agent workflows log validation and transformation steps so lineage is fully traceable.
  • Partner with AI, software engineering, and computational biology groups to co-define data standards and conventions.

Qualifications & Requirements
  • Demonstrated experience building multi-agent workflows or LLM workflows using tools/frameworks such as LangGraph or LlamaIndex, including tool/function calling and asynchronous task execution.
  • Strong Python skills for data manipulation, working with APIs and databases, and handling heterogeneous data formats.
  • Familiarity with dataset versioning approaches (e.g., DVC, lakeFS, or equivalent).
  • Comfortable with, or strongly willing to learn, common omics data formats such as AnnData, H5AD, and TileDB.
  • Deep bioinformatics expertise is not required; a basic conceptual understanding of the different modalities is enough (e.g., RNA-seq vs. scRNA-seq vs. WES; genomics vs. transcriptomics vs. proteomics vs. metabolomics).
  • Comfortable writing unit and functional tests to ensure data processing workflows are reliable and reproducible.
  • Degree in a technical field or equivalent practical experience.
  • Must be authorized to work in the United States without sponsorship.

Nice to Have
  • Experience deploying agent workflows as a shared service (e.g., FastAPI or MCP endpoints).
  • Exposure to cloud platforms (AWS, Google Cloud Platform) and containerization (Docker).
  • Familiarity with scientific workflow managers such as Nextflow or Snakemake.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10522794
  • Position Id: c3416b9d6c9d6a4aed62ebaa7d9c34de

Similar Jobs

  • San Francisco, California • 9d ago • Full-time • USD 181,500.00 - 283,800.00 per year
  • San Francisco, California • Today • Full-time • USD 152,000.00 - 240,000.00 per year
  • San Francisco, California • 7d ago • Full-time • USD 152,000.00 - 240,000.00 per year
  • San Francisco, California • Today • Full-time • USD 175,000.00 - 330,000.00 per year
