Job Title: Hydra Developer (Python | Pharma / Life Sciences)
Location: Remote, or onsite in Bloomfield, CT or Arlington, MA
Overview
We are seeking an experienced Hydra developer to design, build, and scale configuration-driven applications supporting advanced analytics, machine learning, and scientific workflows in a regulated pharmaceutical environment.
This role focuses on using Hydra to manage complex, parameterized workflows, enabling reproducibility, scalability, and rapid experimentation across R&D, clinical, and data science use cases.
Hydra is an open-source Python framework for configuring complex applications through hierarchical, composable configurations and command-line overrides, which makes it well suited to research and ML-heavy environments.
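For candidates less familiar with the pattern, the idea of composable configuration plus command-line overrides can be sketched with the Python standard library alone. This is an illustration of the concept, not Hydra's API: the helper names (deep_merge, apply_override), the config keys, and the study identifiers are all hypothetical.

```python
import copy
import json


def deep_merge(base, extra):
    """Return a new dict with `extra` merged into `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override(config, override):
    """Apply one 'dotted.key=value' override in place, e.g. 'model.lr=0.01'."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    try:
        node[leaf] = json.loads(raw_value)  # numbers, booleans, null
    except json.JSONDecodeError:
        node[leaf] = raw_value  # anything else is kept as a plain string
    return config


# Hypothetical base config and an alternative "model" group layered on top of it.
defaults = {"model": {"name": "logreg", "lr": 0.1}, "data": {"study": "STUDY-001"}}
model_group = {"model": {"name": "mlp", "hidden": 64}}

config = deep_merge(defaults, model_group)
for item in ["model.lr=0.01", "data.study=STUDY-002"]:  # stand-ins for CLI arguments
    apply_override(config, item)

print(config)
```

In Hydra itself, the layering is expressed through YAML config groups and a defaults list, and the overrides are passed on the command line rather than applied by hand.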
---
Key Responsibilities
Hydra / Python Development
· Design and implement configuration-driven Python applications using Hydra and OmegaConf
· Build modular, composable configuration structures (YAML-based) to support:
o ML experimentation pipelines
o Data processing workflows
o Simulation and modeling environments
· Enable multi-run experimentation (parameter sweeps, scenario testing) using Hydra’s multirun mode (--multirun) and launcher plugins
· Develop reusable components using Hydra’s ability to instantiate objects dynamically from configuration (hydra.utils.instantiate)
· Integrate Hydra with:
o ML frameworks (PyTorch, TensorFlow, Scikit-learn)
o Experiment tracking tools (e.g., MLflow, Weights & Biases)
o Cloud / HPC execution environments
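Two of the responsibilities above, parameter sweeps and object instantiation from configuration, can be sketched as follows. This is a standard-library approximation rather than Hydra code: the instantiate and sweep helpers are hypothetical, and fractions.Fraction is only a stand-in for a real model or pipeline class. The "_target_" key mirrors the convention used by hydra.utils.instantiate.

```python
import importlib
import itertools


def instantiate(spec):
    """Build an object from a dict holding a '_target_' dotted path plus keyword arguments."""
    kwargs = {k: v for k, v in spec.items() if k != "_target_"}
    module_path, _, attr = spec["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**kwargs)


def sweep(base, grid):
    """Yield one config per combination of the values in `grid` (a Cartesian product)."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield {**base, **dict(zip(keys, values))}


# Hypothetical base config; Fraction stands in for a model or pipeline component.
base = {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
grid = {"numerator": [1, 3], "denominator": [2, 4]}

results = [instantiate(cfg) for cfg in sweep(base, grid)]
for obj in results:
    print(obj)
```

With Hydra, the equivalent sweep is typically requested from the command line using --multirun with comma-separated values per parameter, and each combination runs as its own job.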
---
Pharma / Life Sciences Application Development
· Build and support solutions across:
o Clinical data pipelines (e.g., SDTM/ADaM processing)
o Regulatory submission workflows
o Pharmacovigilance / safety analytics
o R&D data science platforms
· Ensure solutions meet GxP, 21 CFR Part 11, and validation requirements
· Support inspection-ready workflows with reproducibility and audit trails
· Collaborate with:
o Clinical Data Scientists
o Biostatisticians
o Regulatory Affairs teams
o Digital / R&D IT
---
Architecture & Engineering
· Architect systems that separate code from configuration, improving maintainability and reproducibility
· Implement CI/CD pipelines for ML and data workflows
· Optimize performance for large-scale data and compute workloads
· Ensure robust logging, monitoring, and traceability (leveraging Hydra’s built-in logging configuration and per-run output directories)
· Contribute to platform standardization for experiment configuration and execution
---
Required Qualifications
Technical Skills
· Strong Python development experience (5+ years preferred)
· Hands-on experience with Hydra and OmegaConf
· Experience building config-driven or parameterized applications
· Familiarity with:
o YAML-based configuration systems
o Object instantiation patterns (dependency injection via config)
· Experience with ML/data tools:
o PyTorch / TensorFlow / Scikit-learn
o Pandas / NumPy
· Knowledge of:
o REST APIs, ETL pipelines, and data orchestration
o Git, CI/CD, containerization (Docker/Kubernetes)
---
Pharma / Domain Experience
· 3+ years in pharmaceutical, biotech, or CRO environments
· Experience with at least one of the following:
o Clinical data systems (e.g., Medidata, Veeva, SAS)
o Regulatory systems (e.g., RIM, submissions workflows)
o Safety systems (e.g., Argus)
· Understanding of:
o GxP validation
o Data integrity principles (ALCOA+)
o Audit/inspection readiness
---
Preferred Qualifications
· Experience using Hydra for:
o ML experimentation platforms
o Hyperparameter sweeps / distributed runs
· Familiarity with:
o Experiment tracking tools (MLflow, Weights & Biases)
o Workflow orchestration (Airflow, Prefect)
· Experience in AI/ML applied to drug discovery, clinical trials, or real-world evidence
· Exposure to cloud platforms (AWS, Azure, Google Cloud Platform)
---
Key Competencies
· Strong problem-solving skills in complex, parameterized systems
· Ability to bridge data science and engineering
· Experience working in regulated environments
· Strong communication skills with both technical and scientific stakeholders
---
Example Use Cases This Role Will Support
· Configurable clinical data ingestion pipelines across studies
· Reproducible statistical analysis workflows
· Scalable AI/ML experimentation frameworks for drug discovery
· Automated regulatory reporting pipelines