Data Science & Machine Learning
Location: Indianapolis, IN
Duration: 6 months
Key Responsibilities
• Design, develop, and deploy machine learning models using Databricks (MLflow, Spark ML, Python) for pharma and life sciences use cases
• Implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, deployment, and monitoring
• Build predictive models for patient identification, HCP segmentation, market access analytics, pharmacovigilance, and safety signal detection
• Apply NLP and generative AI techniques (LLMs, RAG pipelines) to extract insights from medical literature, clinical notes, and regulatory documents
• Conduct A/B testing, model validation, and statistical analysis to evaluate model performance and business impact
• Collaborate with data engineers to ensure reliable, high-quality, production-ready datasets in the Lakehouse
Databricks & Lakehouse Architecture
• Leverage Databricks Lakehouse (Delta Lake, Unity Catalog) for scalable, governed, and high-performance analytics
• Design and optimize Spark jobs for performance and cost efficiency across large-scale pharma datasets
• Apply best practices for data governance, data lineage, and security within Unity Catalog
• Build and maintain Bronze / Silver / Gold Medallion architecture for clinical, claims, and commercial data
• Implement Delta Live Tables (DLT) pipelines with data quality checks for real-time and batch processing
• Configure and manage Databricks Workflows, Repos, and cluster policies for production ML workloads
Genie (AI/BI & Natural Language Analytics)
• Configure and enable Databricks Genie for self-service analytics across business and scientific teams
• Design semantic layers and curated Gold datasets optimized for natural language queries via Genie
• Define certified questions, trusted assets, and business glossary terms to improve Genie response quality
• Partner with business stakeholders to translate complex pharma questions into Genie-enabled insights
• Monitor and iterate on Genie Spaces based on user feedback, query accuracy, and adoption metrics
• Enable non-technical users across Medical Affairs, Commercial, and R&D to self-serve data insights
Real-World & Clinical Data Analysis
• Analyze real-world data (RWD), electronic health records (EHR), claims data, and clinical trial datasets to generate actionable insights
• Build scalable data pipelines for pharma-specific sources including IQVIA, Symphony Health, Komodo, and specialty pharmacy data
• Apply survival analysis, mixed models, and Bayesian methods for epidemiology and health economics (HEOR) studies
• Ensure all models and data processes comply with HIPAA, GxP, and 21 CFR Part 11 regulations
Business Enablement & Stakeholder Collaboration
• Work closely with product owners, analysts, and business leaders to identify and prioritize high-value data science use cases
• Communicate complex analytical results and model outputs in a clear, business-friendly manner to non-technical audiences
• Produce analytical documentation: model cards, design specs, performance reports, and executive summaries
• Lead sprint ceremonies as analytics owner: architecture reviews, estimation sessions, and release planning
Required Qualifications
• Experience: 4+ years in professional data science or advanced analytics roles, preferably in pharma, biotech, or life sciences
• Education: Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Engineering, or a related field
• Databricks: Hands-on experience with Databricks and Apache Spark for large-scale data processing and ML workloads
• Python: Strong programming skills in Python — PySpark, Pandas, NumPy, Scikit-learn — for data science and ML development
• MLflow: Experience building and deploying ML models in production using MLflow for experiment tracking and model lifecycle management
• SQL: Solid understanding of SQL and data modeling for analytical and reporting workloads on large datasets
• Delta Lake: Experience with Delta Lake, Unity Catalog, and Medallion architecture (Bronze / Silver / Gold) for Lakehouse analytics
• Genie / AI-BI: Familiarity with Databricks Genie or AI/BI tools for natural language querying and self-service analytics
• Healthcare Data: Experience working with clinical, claims, or real-world healthcare data (EHR, RWD, specialty pharmacy)
• Compliance: Familiarity with HIPAA requirements and with handling sensitive patient data in regulated environments
• Communication: Strong communication skills — ability to translate complex models and analysis into clear, actionable business insights