Senior HPC Architect
Location: Warren, NJ (Hybrid)
About the Role
We are seeking a highly specialised Senior Google Cloud Platform Architect with deep, hands-on High Performance Computing (HPC) expertise in Life Sciences and Genomics to lead the migration of large-scale on-premises HPC environments to Google Cloud Platform. This is a hands-on technical leadership role - not an advisory position - requiring genuine experience migrating production HPC infrastructure and scientific computing workloads from on-premises clusters to Google Cloud Platform.
The ideal candidate has personally designed and executed on-premises HPC to Google Cloud Platform migration programmes in life sciences or genomics settings, understands the unique regulatory, data sensitivity, and performance demands of genomic pipelines and scientific workloads, and can architect Google Cloud Platform HPC environments that match or exceed the performance of the on-premises systems being replaced.
You will work directly with research computing teams, bioinformatics leads, IT infrastructure staff, and senior client stakeholders - translating complex scientific computing requirements into well-architected, compliant, and cost-optimised Google Cloud Platform solutions. This role requires someone equally comfortable debugging a GATK pipeline performance issue with a bioinformatician and presenting a cloud HPC migration business case to a CIO.
Key Responsibilities
HPC Migration - Discovery & Planning
- Lead end-to-end on-premises HPC migration discovery for life sciences and genomics environments - including full inventory of existing clusters, scheduler configurations, storage systems, application stacks, and data assets.
- Conduct structured discovery workshops with research computing teams, bioinformatics leads, laboratory IT staff, and HPC administrators to document current-state architecture, workload profiles, job scheduling patterns, and pain points.
- Perform detailed workload characterisation - profiling genomic pipeline jobs (WGS, WES, RNA-seq, single-cell, variant calling) across compute, memory, storage I/O, and runtime dimensions to inform Google Cloud Platform sizing and architecture decisions.
- Build comprehensive application and dependency maps - cataloguing HPC software stacks (bioinformatics tools, pipeline frameworks, commercial ISV applications), license dependencies, data dependencies, and inter-workload relationships.
- Develop HPC Migration Readiness Assessments (MRA) - evaluating gaps in network connectivity, data transfer capacity, security and compliance posture, team cloud readiness, and pipeline portability before migration begins.
- Define migration wave plans that sequence workload migration by complexity, scientific criticality, data volumes, regulatory sensitivity, and dependency chains - enabling a phased, low-risk transition.
- Build detailed migration business cases including on-premises TCO analysis, Google Cloud Platform cost modelling, performance benchmarks, and phased investment roadmaps for sign-off by research and IT leadership.
Google Cloud Platform HPC Architecture for Life Sciences
- Architect end-to-end Google Cloud Platform HPC environments optimised for genomics and life sciences workloads, leveraging Google Cloud's HPC-specific compute, networking, storage, and managed services.
- Select and right-size compute instance families for life sciences HPC workloads:
  - C3 / N2 instances for CPU-intensive bioinformatics tools (BWA, GATK, STAR, Salmon)
  - M3 / M2 memory-optimised instances for large in-memory genomics jobs
  - A3 / A2 GPU instances for deep learning genomics workloads (AlphaFold, Parabricks, deep variant calling)
  - Spot VMs for fault-tolerant, checkpointed pipeline jobs to optimise cost
- Design low-latency cluster networking using compact placement policies, Google's RDMA-capable networking, and GPUDirect RDMA for tightly coupled parallel workloads.
- Architect high-performance parallel storage solutions for genomics data:
  - Google Parallelstore (Intel DAOS-based) for high-throughput scratch and active analysis data
  - Filestore High Scale / Enterprise for shared pipeline working directories
  - Cloud Storage, accessed through Cloud Storage FUSE or the XML API, for reference genomes, raw sequencing data (FASTQ/BAM/CRAM), and results archival
  - Storage tiering strategy (active → nearline → coldline → archive) aligned with data lifecycle and access patterns
- Design Slurm-based HPC cluster architectures on Google Cloud Platform using Google Cloud HPC Toolkit, including:
  - Auto-scaling partition configuration for variable genomics job demand
  - Reservation management for predictable sustained workloads
  - Preemptible/Spot VM integration for cost-optimised burst capacity
  - Multi-partition designs separating short-job, long-job, GPU, and high-memory workloads
- Implement Google Batch for high-throughput genomics pipelines requiring parallel task execution across thousands of samples.
Experience
- Overall experience in cloud infrastructure, HPC, or research computing.
- Hands-on Google Cloud Platform architecture experience, with deep command of Google Cloud Platform compute, networking, storage, and life sciences services.
- Hands-on HPC experience in life sciences or genomics - designing, operating, and migrating HPC clusters supporting bioinformatics or research computing workloads. Production experience required.
- Demonstrated experience leading on-premises HPC to Google Cloud Platform migration programmes - from discovery through cutover and decommission - at research or enterprise scale.
- Hands-on experience with genomics pipeline frameworks - WDL/Cromwell, Nextflow, Snakemake, or equivalent - and their deployment on Google Cloud Platform execution backends.
- Hands-on experience with Google Cloud HPC Toolkit for Slurm cluster deployment, configuration, and customisation on Google Cloud Platform.
- Proven experience with large-scale genomics data migration - petabyte-scale FASTQ/BAM/VCF datasets - using Storage Transfer Service, Transfer Appliance, or equivalent.
- Experience with the Slurm workload manager - cluster configuration, partition design, job scheduling, and migration from PBS/Torque/LSF environments.
- Strong command of Google Cloud Platform storage architecture for HPC - Parallelstore, Filestore, Cloud Storage, and storage tiering for genomics data lifecycle management.
- Experience with HIPAA-compliant cloud architecture for genomics or clinical data environments.
- Strong Linux systems administration at the HPC level - kernel tuning, environment modules, cluster OS image management, and MPI environment configuration.