Job Details
Software Development Engineer, Release
Open to candidates in San Jose, CA, or fully remote
12-month contract
Must-Have Skills:
• Release engineering & CI/CD at scale
• Containerization & reproducible builds: expert Docker workflows (multi-stage builds, caching, multi-arch)
• Build & test automation for distributed ML workloads
• Strong debugging and scripting skills
THE ROLE:
We are seeking a skilled and motivated Software Development Engineer to join our Training at Scale team. In this role, you will develop tools and automation to support large-scale model training on the latest AMD GPUs. You’ll work closely with engineers across teams to optimize training workloads, manage CI/CD pipelines, and ensure reliable, high-performance releases. This is a hands-on engineering position with a strong focus on distributed systems, performance, and automation at scale.
THE PERSON:
The ideal candidate has deep experience with open-source software (OSS) release cycles and container-based packaging (e.g., Docker), along with strong debugging skills, particularly around model training workloads. You thrive in fast-paced environments and are passionate about automation, system reliability, and continuous improvement.
KEY RESPONSIBILITIES:
• Manage and maintain nightly builds for multiple training frameworks
• Collaborate on integrating new training workloads and expanding test coverage
• Ensure the stability and releasability of the main branch at all times
• Update and maintain build processes to support biweekly release and performance goals
• Handle and deliver ad-hoc development test builds as requested
• Track build performance and reliability metrics over time
PREFERRED EXPERIENCE:
• Experience with open-source software contributions and release management
• Strong hands-on experience with Docker and container-based workflows
• Excellent problem-solving skills and attention to detail
• Ability to work independently and willingness to learn new technologies quickly
ACADEMIC CREDENTIALS:
• Bachelor’s degree in Computer Science, Engineering, or a related technical field