Overview
USD 125,370.00 per year
Full Time
Skills
Embedded Systems
Innovation
Microsoft Excel
Systems Engineering
High Availability
Scalability
Collaboration
Data Science
Research
Training
Grafana
Identity Management
Workflow
Debugging
Incident Management
Computer Hardware
Scripting
Python
Bash
Linux
GPU
CPU
Kubernetes
Orchestration
Ansible
Terraform
Continuous Integration
Continuous Delivery
Machine Learning Operations (ML Ops)
Stacks Blockchain
Machine Learning (ML)
Artificial Intelligence
HPC
Management
Distributed File System
Computer Networking
TCP/IP
VLAN
Firewall
Privacy
Regulatory Compliance
Communication
Documentation
Real-time
Computer Science
Computer Engineering
Electrical Engineering
Military
Law
Recruiting
Job Details
WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.
AMD together we advance_
THE ROLE:
AMD is seeking a driven and collaborative MLOps Engineer to join our Engineering Operations team in Atlanta. You will support and optimize large-scale, multi-GPU/CPU ML infrastructure to enable world-class AI and rendering research. Collaborating with teams across North America and Europe, you will design robust, automated pipelines and help push the boundaries of machine learning and high-performance computing in a production data center environment.
THE PERSON:
You are a hands-on engineer passionate about both machine learning operations and large-scale infrastructure. You excel at collaborating with researchers and IT specialists, drive automation, and enjoy solving complex technical challenges at the intersection of data science and systems engineering.
KEY RESPONSIBILITIES:
- Architect, deploy, and maintain high-availability Linux/GPU/CPU server clusters for ML workloads, ensuring optimal performance, security, and scalability.
- Collaborate cross-functionally with data science, research, and IT teams (across North America and Europe) to streamline ML model training, test, deployment, and monitoring pipelines.
- Build and automate end-to-end CI/CD workflows for ML (using MLflow, DVC, Kubeflow, Airflow, or similar tools).
- Configure, monitor, and optimize large-scale NAS storage and data transfers for sharing models, datasets, and training results.
- Proactively monitor infrastructure and application health (using Prometheus, Grafana, or similar), addressing performance bottlenecks, failures, and incidents.
- Implement robust security, user management, and access protocols in line with international compliance requirements (e.g., GDPR).
- Document processes, workflows, and troubleshooting guides for global teams; support remote debugging and rapid incident response.
- Stay abreast of trends in AI infrastructure, MLOps toolchains, and AMD hardware accelerators.
PREFERRED EXPERIENCE:
- Strong programming/scripting background (Python, Bash, or Go), and proven experience with Linux server administration.
- Practical experience managing GPU/CPU clusters and Kubernetes orchestration.
- Experience with infrastructure automation (Ansible, Terraform) and CI/CD pipeline design.
- Familiarity with MLOps stacks (MLflow, DVC, Kubeflow, Flyte, Airflow).
- Experience monitoring and troubleshooting distributed workloads for ML/AI, HPC, or rendering.
- Experience configuring and managing NAS or other distributed file systems for large data.
- Knowledge of networking (TCP/IP, VLANs, firewalls), data privacy, and compliance.
- Strong communication, troubleshooting, and documentation skills.
- Previous exposure to supporting render farms or real-time graphics pipelines is a plus.
ACADEMIC CREDENTIALS:
Computer Science, Computer Engineering, Electrical Engineering, or closely related field.
Location: Atlanta, GA Data Center (Onsite)
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.