Lead Research Software Engineer

Oak Ridge, TN, US • Posted 4 hours ago • Updated 4 hours ago
Full Time
On-site

Job Details

Skills

  • Computational Science
  • Leadership
  • Orchestration
  • High Performance Computing
  • HPC
  • Research
  • Data Processing
  • FOCUS
  • Innovation
  • Scalability
  • Artificial Intelligence
  • Workflow
  • Application Development
  • Streaming
  • Apache Kafka
  • Amazon S3
  • Web Portals
  • Database
  • Storage
  • Machine Learning (ML)
  • Lifecycle Management
  • Collaboration
  • Software Design
  • Cyber Security
  • Account Management
  • Regulatory Compliance
  • Access Control
  • Usability
  • Data Management
  • API
  • Continuous Integration
  • Continuous Delivery
  • Kubernetes
  • Code Review
  • Git
  • GitHub
  • GitLab
  • Machine Learning Operations (ML Ops)
  • Data Engineering
  • PHP
  • Python
  • JavaScript Frameworks
  • React.js
  • AngularJS
  • Node.js
  • Test-driven Development
  • Agile
  • Software Development
  • Open Source
  • Adobe
  • PDF
  • RTF
  • HTML

Summary

Requisition Id 16230

Overview:

The National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL), which hosts several of the world's most powerful computer systems, is seeking highly qualified individuals to play a key role in designing, developing, and deploying data management tools and persistent services. These tools and services support the scientific and AI/ML campaigns that run on NCCS computing infrastructure, including Frontier, the world's first exaflop system.

The Team:

As a Lead Research Software Engineer (RSE) in the Data and Platform Services (DAPS) group, you will work within the HPC Operations Section. The DAPS group designs and operates data management platforms, tools, and services for the end-to-end data lifecycle, from ingestion to publication, and supports several large initiatives and facilities at ORNL. Our primary development and deployment platform is the Oak Ridge Leadership Computing Facility (OLCF) Slate Service, built on Kubernetes and Rancher. Slate provides a container orchestration service for running critical operational applications and user-managed persistent applications alongside our OLCF supercomputer systems and other OLCF-managed HPC clusters.

The Role:

As a Lead Research Software Engineer, you will design, implement, operate, and maintain federated data platforms, data management portals, data processing pipelines, API gateways, and persistent services for the entire data lifecycle on our on-premises Kubernetes clusters, with a strong focus on innovation, scalability, reliability, and maintainability. You will also assist with AI initiatives at OLCF, evaluate and integrate key data engineering and MLOps technologies, serve as an individual contributor on medium-sized projects, and collaborate with platform engineers to deliver a robust set of production services for OLCF users. This role requires significant experience with data management platforms and tools, significant experience with full stack application and API development, and working knowledge of Kubernetes and containerization. Knowledge of current AI/ML tools and workflows is preferred but not required.

Major Duties/Responsibilities:

Application Development and Deployment
  • Identify and evaluate solutions for federated data management (e.g., Pelican, Rucio), data catalogs (e.g., CKAN, DKAN, Schema.org), streaming data (e.g., Kafka), and data movement (e.g., XRootD, Globus, S3).
  • Design and implement web portals and API services for data management using a combination of modern web technologies.
  • Develop, implement, and maintain Kubernetes deployment recipes for data portals, catalogs, API gateways, and other ancillary services like key-value stores and databases.
  • Design and develop solutions for MLOps, including model lifecycle management and storage, as well as integration with existing platforms like MLflow.
  • Stay up to date on both open source and commercial platforms and tools being developed for end-to-end data and ML lifecycle management.

Collaboration
  • Assist the Group Leader with developing platform and software design documents for new projects and lead implementation efforts with other Software Engineers in the group.
  • Partner closely with internal platforms, cybersecurity, and account management teams to ensure the platform meets security, compliance, role-based access controls, and usability expectations.
  • Participate in cross-functional projects related to platform enhancements and cluster lifecycle automation.
  • Represent the DAPS team with internal collaborators and partners across the lab.

Basic Qualifications:
  • B.S. degree and 5+ years of relevant experience, or equivalent experience.
  • At least three years of experience with data management platform and tools development.
  • At least three years of experience with full stack application and API development.
  • Experience with CI/CD tooling, GitOps, and Kubernetes.
  • Experience with code review and familiarity with tools like Git, GitHub, and GitLab.

Preferred Qualifications:
  • M.S. or Ph.D. in a technical field.
  • Excellent interpersonal and communication skills and the ability to work as part of a team.
  • Experience with modern MLOps, data engineering, and LLM technologies.
  • Experience with PHP, Python, and modern JavaScript frameworks (React, AngularJS, Node.js).
  • Experience designing and implementing highly available systems/services.
  • 8+ years of experience in addition to the degree.
  • Experience with modern software practices such as test-driven development and Agile software development, and a firm, proven knowledge of software development lifecycles.
  • Demonstrated activity within the broader open-source software community.

This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.

We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) up to 5MB in size. Resumes from third party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.

If you have trouble applying for a position, please email

ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 80180502
  • Position Id: a398a9f0c31c89412e900778d549c10b