High-Performance Computing (HPC) Engineer

Overview

Full Time

Skills

High Performance Computing
Application Development
Innovation
Computer Hardware
Testing
Documentation
Training
Status Reports
Collaboration
Network
Storage Engineering
Computer Science
Management
LSF
FEA
GPU
Linux Administration
InfiniBand
Remote Direct Memory Access
MPI
Machine Learning (ML)
Algorithms
Artificial Intelligence
Computer Networking
Deep Learning
PyTorch
TensorFlow
Configuration Management
Ansible
Cobbler
Puppet
HPC
Virtualization

Job Details

As a High-Performance Computing (HPC) engineer on Apple's Hardware Methodologies, Tools, & Solutions (HMTS) Platform team, you will serve as a vital connector between HPC infrastructure, application development, operations, and engineers. Your contributions will be key to maintaining the exceptional design environment for hardware engineering, supporting Apple's commitment to leading innovation in hardware.

Description

In this role, you will be responsible for supporting, testing, and deploying the HPC infrastructure products at the core of our operations. You will help plan, code, build, test, deploy, operate, and monitor our Infrastructure-as-Code solutions for HPC server infrastructure. Your responsibilities will include:
  • Troubleshooting independently to identify and resolve issues.
  • Monitoring system performance and availability, and remediating issues as necessary.
  • Developing automation for common development and operational tasks.
  • Maintaining clear, current documentation of system configurations, including detailed justifications, training materials for complex topics, status reports, and procedural guides.
  • Collaborating with application, infrastructure, network, and storage engineering teams to find balanced solutions to engineering problems.
  • Assessing future capacity requirements and evaluating new product features or enhancements.

Minimum Qualifications
  • A Bachelor's degree in Computer Science with at least 5 years of relevant experience or equivalent professional background.
  • Proven experience in an HPC support role in an enterprise environment with 500+ node clusters.
  • Experience deploying and managing schedulers such as SLURM, LSF, and/or NC.
  • Experience deploying and configuring FEA solvers to run on HPC clusters.
  • Experience with NVIDIA GPU compute.
  • Strong Linux administration skills.
  • Experience with InfiniBand, including IPoIB and RDMA.

Preferred Qualifications
  • Experience with multiple flavors of MPI.
  • Experience with machine learning and deep learning concepts, algorithms, and models.
  • Background in Software Defined Networking and AI/HPC cluster networking.
  • Familiarity with deep learning frameworks such as PyTorch and TensorFlow.
  • Experience with automation and configuration management tools like Ansible, Cobbler & Puppet.
  • Experience developing and securing containerized applications and HPC environments (e.g., Apptainer) is beneficial.
  • Experience with virtualization technologies is beneficial.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.