Scientific Data & Knowledge Engineer (Knowledge Graph Engineer)

• Posted 3 days ago • Updated 2 hours ago
Contract Corp To Corp
Contract W2
Contract Independent

Job Details

Skills

  • Data Knowledge
  • Knowledge Graph
  • Scientific

Summary

Role: Scientific Data & Knowledge Engineer (Knowledge Graph Engineer)

Location: Remote

Role Overview

The Scientific Data & Knowledge Engineer is a specialist role at the intersection of data engineering, semantic technologies, and scientific domain knowledge. This individual is responsible for maximising the value of scientific data assets over their lifetime - acting as a translator between domain experts in R&D and the technical data systems that underpin research and discovery.

Working closely with Product Managers and R&D Subject Matter Experts, this role defines the language of science in data - through data models, ontologies, and controlled vocabularies - and ensures that scientific knowledge is structured, indexed, and interoperable across data products. The engineer serves as the voice of the Knowledgebase, championing the value and long-term usability of data assets.

Key Responsibilities

Metadata Harmonisation & Curation

  • Lead metadata harmonisation, curation, and large-scale dataset ingestion workflows.
  • Design and implement structured, auditable data transformations ensuring traceability and reproducibility.
  • Develop and maintain schema-driven automation pipelines (e.g., JSON Schema) to enforce data quality and consistency.

Ontology & Semantic Standards

  • Perform ontology alignment and entity normalisation using services such as the Ontology Lookup Service (OLS).
  • Develop and maintain vocabularies, ontologies (e.g., RAO), and controlled terminologies in collaboration with scientific SMEs.
  • Apply semantic web technologies including RDF/OWL triple stores, SHACL, and LinkML for knowledge representation.
  • Leverage knowledge graph and semantic query capabilities (e.g., Neo4j, GraphDB, SPARQL) where applicable.

Data Engineering & Pipeline Delivery

  • Engineer robust API and ETL pipelines for scientific data ingestion, transformation, and delivery (e.g., FastAPI, PostgreSQL).
  • Implement URI generation strategies and machine learning pipelines for graph embeddings.
  • Execute data engineering workloads on cloud infrastructure, primarily Google Cloud Platform (GCP), including BigQuery and Google Cloud Storage (GCS).
  • Adopt Infrastructure as Code (IaC) practices for scalable and repeatable platform deployment.

Collaboration & Knowledge Translation

  • Partner with Product Managers and R&D scientists to translate complex scientific concepts into robust, fit-for-purpose data models.
  • Act as the authoritative voice of the Knowledgebase - ensuring interoperability, reusability, and long-term value of data assets.
  • Contribute to and champion data governance standards, documentation, and best practices across the organisation.
  • Engage proactively with cross-functional stakeholders to align scientific terminology with technical data product requirements.

Technical Skills & Requirements

Languages & Query

  • Python
  • SPARQL
  • SQL
  • Scala

Semantic Technologies

  • RDF / Triple Stores
  • OWL
  • SHACL
  • LinkML
  • Ontologies (RAO)

Platforms & Infrastructure

  • Google Cloud Platform / BigQuery
  • Google Cloud Storage
  • Infrastructure as Code
  • ETL Processes

Tools & Technologies

  • GitHub / GitLab
  • Apache Jena
  • Protege
  • Jira / Confluence
  • FastAPI

Data Engineering Competencies

  • Metadata harmonisation and large-scale structured data ingestion.
  • URI generation and entity resolution at scale.
  • Graph embedding and machine learning pipeline integration.
  • Knowledge graph construction and semantic query optimisation.

Qualifications & Experience

Essential

  • Degree in Computer Science, Bioinformatics, Information Science, or a related scientific/technical discipline.
  • Demonstrable experience in data engineering with a focus on scientific or research data environments.
  • Hands-on expertise with semantic web technologies: RDF, OWL, SPARQL, ontology development and alignment.
  • Proficiency in Python and SQL; experience with Scala is advantageous.
  • Practical experience with cloud platforms, particularly Google Cloud Platform, BigQuery, and related data services.
  • Strong understanding of metadata standards, data harmonisation, and controlled vocabulary management.
  • Experience working with knowledge graph technologies (Neo4j, GraphDB, Apache Jena, or similar).

Desirable

  • Experience within a pharmaceutical, life sciences, or research-intensive organisation.
  • Familiarity with domain-specific ontologies such as RAO, ChEBI, or similar scientific vocabularies.
  • Working knowledge of LinkML and SHACL for schema definition and validation.
  • Exposure to MLOps practices and graph embedding or ML pipeline delivery.
  • Experience with Protege for ontology authoring and management.
  • Dice Id: 91135852
  • Position Id: 2026-134