Software Development (Release Engineer)

Overview

Remote
Depends on Experience
Contract - W2
Contract - 12 Month(s)
No Travel Required
Unable to Provide Sponsorship

Skills

Release
CI/CD
Machine Learning (ML) workloads

Job Details

Software Development Engineer, Release
Open to candidates in San Jose, CA; fully remote is also OK
12-month contract

Must Have Skills:
• Release engineering & CI/CD at scale
• Containerization & reproducible builds: expert Docker workflows (multi-stage builds, caching, multi-arch)
• Build & test automation for distributed ML workloads
• Strong debugging and scripting skills

THE ROLE:

We are seeking a skilled and motivated Software Development Engineer to join our Training at Scale team. In this role, you will develop tools and automation to support large-scale model training on the latest AMD GPUs. You’ll work closely with engineers across teams to optimize training workloads, manage CI/CD pipelines, and ensure reliable, high-performance releases. This is a hands-on engineering position with a strong focus on distributed systems, performance, and automation at scale.

THE PERSON:

The ideal candidate brings deep experience in open-source software (OSS) release cycles and container-based packaging (e.g., Docker), along with strong debugging skills, particularly around model training workloads. You thrive in fast-paced environments and are passionate about automation, system reliability, and continuous improvement.

KEY RESPONSIBILITIES:

• Manage and maintain nightly builds for multiple training frameworks
• Collaborate on integrating new training workloads and expanding test coverage
• Ensure the stability and releasability of the main branch at all times
• Update and maintain build processes to support biweekly release and performance goals
• Handle and deliver ad-hoc development test builds as requested
• Track build performance and reliability metrics over time

PREFERRED EXPERIENCE:

• Experience with open-source software contributions and release management
• Strong hands-on experience with Docker and container-based workflows
• Excellent problem-solving skills and attention to detail
• Ability to work independently and willingness to learn new technologies quickly

ACADEMIC CREDENTIALS:

• Bachelor’s degree in Computer Science, Engineering, or a related technical field
