Data Scientist and AI/ML Engineer - Generative AI and Natural Language Processing (Hybrid - NJ or MA)

Overview

Remote
On Site
USD 104,200.00 - 163,900.00 per year
Full Time

Skills

Mergers and Acquisitions
Workflow
Research and Development
Large Language Models (LLMs)
Biomedicine
Embedded Systems
Testing
Natural Language
Environment Management
Docker
Pandas
NumPy
Data Analysis
Performance Metrics
Unsupervised Learning
Fluency
Cloud Computing
TensorFlow
Machine Learning Operations (ML Ops)
Benchmarking
Resource Description Framework
Ontologies
MongoDB
Relational Databases
PostgreSQL
Vector Databases
Web Applications
Django
Dash Python
Regular Expression
LangChain
Collaboration
Teamwork
Leadership
Mentorship
Semantics
Groupware
DevOps
Orchestration
GitHub
Apache Airflow
PyTorch
FOCUS
Electronic Health Record (EHR)
Natural Language Processing
Artificial Intelligence
Presentations
SAP BASIS
Innovation
Manufacturing
Research
Health Care
Insurance
Workday
Regulatory Compliance
LOS
Recruiting
Apache Spark
Bioinformatics
Business Intelligence
Communication
Computer Engineering
Computer Science
Database Design
Data Engineering
Data Modeling
Data Science
Data Visualization
Elasticsearch
Flask
Generative Artificial Intelligence (AI)
Git
Version Control
Machine Learning (ML)
Management
Python
matplotlib
Semantic Search
Software Development
Relationship Management
Waterfall
Performance Management
Preventive Maintenance
Project Management

Job Details

Job Description

The Data Scientist and AI/ML Engineer - Generative AI and Natural Language Processing role involves helping to develop and deploy production-grade NLP products for unstructured and semi-structured data from across our company's research and development pipeline. These models and workflows will help solve real-world problems and contribute to Artificial Intelligence and Machine Learning (AI/ML) in therapeutic research and development. Key focus areas will include the scalable deployment of ML and Generative AI approaches (such as Large Language Models, or LLMs) for surfacing insights from proprietary unstructured research data and biomedical literature, as well as the integration of structured information from sources such as knowledge graphs. The position is embedded in a cross-disciplinary team of data scientists, bioinformaticians, and engineers who are all focused on using cutting-edge software, AI/ML, and data science techniques to drive drug discovery and development.

You enjoy:

  • Building novel NLP/AI-enriched software that enables the discovery, development, and delivery of new therapeutics to patients in need

  • Understanding real-world challenges and developing automated data solutions for them

  • Opportunities to directly interact with users and stakeholders of your data science, ML, and AI products

  • Evaluating, developing, testing, and deploying new techniques for natural language understanding and new DevOps and ML/LLMOps frameworks.

  • Freedom to propose projects that interest you and to collaborate cross-functionally on delivery

  • Staying updated on the newest methods in NLP, ML, generative AI, and ML/LLMOps

  • Sharing the approaches you implement and their impact with internal company audiences and externally

You have:

The following are preferred skills and experience, not strict requirements

  • Fluency in Python programming, version control and collaboration with Git, environment management (e.g., Poetry, Conda, Docker), and standard Python packages for data exploration (e.g., pandas, NumPy, Matplotlib)

  • Fluency with data science and NLP approaches such as exploratory data analysis, performance metrics and benchmarks, supervised and unsupervised learning, transformers, and LLMs.

  • Fluency with standard cloud and DevOps tools, such as Infrastructure as Code (IaC) and GitHub Actions.

  • Experience with at least one ML framework (e.g., PyTorch, TensorFlow, fairseq) and with ML model deployment and operations (MLOps/LLMOps)

  • Experience with scalable data engineering frameworks such as Apache Spark and orchestration frameworks such as Airflow, semantic search and retrieval frameworks (e.g., development and benchmarking of embedding models and retrieval approaches in the context of Retrieval Augmented Generation, RAG), and/or semantic knowledge frameworks (e.g., RDF triplestores, property graphs, ontology management).

  • Experience with standard operations on non-relational databases (e.g., Elasticsearch/OpenSearch, MongoDB, Neptune), relational databases (e.g., PostgreSQL), and vector databases (e.g., pgvector, Elasticsearch dense vectors), and with deployment of APIs and web applications (e.g., Flask, FastAPI, Django, or Dash)

  • Working knowledge of NLP and/or Generative AI tools and libraries (e.g., regular expressions, spaCy, LangChain) and text/document annotation tools (e.g., Prodigy, BRAT)

  • A demonstrated ability to engage cross-functional teams and stakeholders, including an eagerness to acquire a level of domain knowledge

  • Excellent communication, teamwork, didactic, and leadership skills, including skills for scientific communication (authoring scientific articles and presenting) and guidance and mentorship of junior employees and less experienced collaborators

Minimum Requirements:

  • High School Diploma required.
  • B.S. with focus on Computer Science, Computer Engineering, Semantic Engineering, NLP, data science, AI/ML/LLM engineering, or a related discipline preferred.
  • Minimum of 2 years of industry or internship/co-op experience.
  • Minimum of 1 year of industry experience with Python programming, version control and collaborative software development with Git, DevOps and orchestration tools including GitHub Actions and Apache Airflow, and at least one AI/ML framework such as PyTorch.


Additional job details:

The types of datasets we focus on are both internal (e.g., electronic lab notebooks, safety reports, regulatory documents, clinical results) and external (e.g., public literature and Electronic Medical Records). In addition to new tool development, we often consult with some of our 5,000+ stakeholders (scientists, engineers, regulatory liaisons, data scientists, etc.) on their own projects, as well as additional stakeholders from across our multi-national company. We strive to enhance data science, NLP, and AI literacy across these groups. As part of our work, we have opportunities to coauthor presentations, reports, manuscripts, and/or public code releases.

Current Employees apply HERE

Current Contingent Workers apply HERE

US and Puerto Rico Residents Only:

Our company is committed to inclusion, ensuring that candidates can engage in a hiring process that exhibits their true capabilities. Please click here if you need an accommodation during the application or hiring process.

As an Equal Employment Opportunity Employer, we provide equal opportunities to all employees and applicants for employment and prohibit discrimination on the basis of race, color, age, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or other applicable legally protected characteristics. As a federal contractor, we comply with all affirmative action requirements for protected veterans and individuals with disabilities. For more information about personal rights under the U.S. Equal Opportunity Employment laws, visit:

EEOC Know Your Rights

EEOC GINA Supplement

We are proud to be a company that embraces the value of bringing together talented and committed people with diverse experiences, perspectives, skills, and backgrounds. The fastest way to breakthrough innovation is when people with diverse ideas, broad experiences, backgrounds, and skills come together in an inclusive environment. We encourage our colleagues to respectfully challenge one another's thinking and approach problems collectively.

Learn more about your rights, including under California, Colorado and other US State Acts

U.S. Hybrid Work Model

Effective September 5, 2023, employees in office-based positions in the U.S. will be working a Hybrid work model consisting of three total days on-site per week, Monday - Thursday, although the specific days may vary by site or organization, with Friday designated as a remote-working day, unless business-critical tasks require an on-site presence. This Hybrid work model does not apply to, and daily in-person attendance is required for, field-based positions; facility-based, manufacturing-based, or research-based positions where the work to be performed is located at a Company site; positions covered by a collective-bargaining agreement (unless the agreement provides for hybrid work); or any other position for which the Company has determined the job requirements cannot be reasonably met working remotely. Please note, this Hybrid work model guidance also does not apply to roles that have been designated as "remote".

The salary range for this role is
$104,200.00 - $163,900.00

This is the lowest to highest salary we in good faith believe we would pay for this role at the time of this posting. An employee's position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, geographic location, government requirements, and business or organizational needs.

The successful candidate will be eligible for annual bonus and long-term incentive, if applicable.

We offer a comprehensive package of benefits. Available benefits include medical, dental, vision healthcare and other insurance benefits (for employee and family), retirement benefits, including 401(k), paid holidays, vacation, and compassionate and sick days. More information about benefits is available at
You can apply for this role through (or via the Workday Jobs Hub if you are a current employee). The application deadline for this position is stated on this posting.

San Francisco Residents Only: We will consider qualified applicants with arrest and conviction records for employment in compliance with the San Francisco Fair Chance Ordinance.

Los Angeles Residents Only: We will consider for employment all qualified applicants, including those with criminal histories, in a manner consistent with the requirements of applicable state and local laws, including the City of Los Angeles' Fair Chance Initiative for Hiring Ordinance.

Search Firm Representatives Please Read Carefully
Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company. No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails.

Employee Status:
Regular

Relocation:
Domestic

VISA Sponsorship:
No

Travel Requirements:
10%

Flexible Work Arrangements:
Hybrid

Shift:
Not Indicated

Valid Driving License:
No

Hazardous Material(s):
N/A

Required Skills:
Apache Spark, Applied Engineering, Bioinformatics, Business Intelligence (BI), Communication, Computer Engineering, Computer Science, Database Design, Data Engineering, Data Modeling, Data Science, Data Visualization, Drug Discovery Process, ElasticSearch, Flask (Web Framework), Generative AI, Git Version Control System, Machine Learning, Management Process, Python Matplotlib, Semantic Search, Software Development, Stakeholder Relationship Management, Therapeutics, Waterfall Model

Preferred Skills:

Job Posting End Date:
07/06/2025
*A job posting is effective until 11:59:59PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.