AI Performance Architect, Training and Inference

Overview

On Site
USD 198,100.00 per year
Full Time

Skills

Embedded Systems
Management
Data Centers
Use Cases
Innovation
Cloud Computing
Computer Hardware
Healthcare Information Technology
C++
Python
PyTorch
TensorFlow
Training
Computer Science
Computer Engineering
Open Source
Collaboration
Publications
Machine Learning (ML)
Analytical Skill
Conflict Resolution
Problem Solving
ROOT
Artificial Intelligence
OpenMP
MPI
Performance Analysis
CPU
GPU
Docker
Kubernetes
Communication
Leadership
Military
Law
Recruiting

Job Details

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

Would you like to be part of a world-class team enabling software for world-class data centers and the mightiest supercomputers? AMD is searching for talented and highly motivated AI Software Engineers to join our team of developers pushing the boundaries of efficiency and performance to enable and optimize the software ecosystem for the next generation of GPU computational accelerators. Our team has an unparalleled perspective on the AI landscape. We work with the industry's most sophisticated clients to help them leverage the latest hardware capabilities for their AI use cases. As part of our team, you will be among the first in the world to combine the newest hardware with the industry's latest applications, libraries, frameworks, and SDKs to push the limits of innovation and solve the world's most complex challenges. A minimum of 7 years of experience is required.

THE PERSON:

We are looking for a highly motivated and skilled AI Software Engineer to join our team. You will work with a team of Software Engineers to enable DL models, libraries, and applications for Instinct GPUs in both on-premises and cloud environments. Candidates should be strong in Python and/or C++. Candidates should also have experience analyzing and optimizing the performance of AI software, understand hardware bottlenecks, and be able to drive performance close to the roofline. Must be self-motivated and able to work well within a team environment.

KEY QUALIFICATIONS:
  • Strong programming skills in C++ and Python
  • Strong development experience in at least one major DL framework, such as PyTorch or TensorFlow, in inference, fine-tuning, and/or training
  • MS with years of related experience or PhD with years of related experience in Computer Science, Computer Engineering, or an equivalent related field
  • Experience developing software and system-level performance optimizations, with a solid understanding of GPU architecture, is a plus
  • Experience with open-source software development including collaboration with community maintainers and submitting contributions is a plus
  • Publications in reputable peer-reviewed ML conferences or journals are a plus
  • Excellent analytical and problem-solving skills for root-causing and addressing performance issues
  • Ability to work independently and as part of a team.
  • Willingness to learn skills, tools, and methods to advance the quality, consistency, and timeliness of AMD software products.

PREFERRED EXPERIENCE:
  • Expertise in profiling tools across the AI software stack (PyTorch Profiler, ROCm profiler, VTune, Nsight)
  • Experience in implementing and optimizing parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI)
  • Performance analysis skills for both CPU and GPU
  • Experience with Singularity, Docker, and/or Kubernetes.
  • Experience providing clear and timely communication about project status and other key aspects of the project to the leadership team

LOCATION: Santa Clara, CA area


Benefits offered are described in: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.