AI Infrastructure Engineer

Overview

Remote
$80 - $90
Accepts corp to corp applications
Contract - W2
Contract - Independent
Contract - 6 Month(s)

Skills

AI infrastructure
high-performance computing (HPC) environments
NVIDIA Mission Control
Run:ai

Job Details

This role focuses on managing and optimizing our AI infrastructure, ensuring seamless operations, and providing guidance and training to our team members. The ideal candidate will have hands-on experience with AI operations, infrastructure management, and a strong understanding of high-performance computing (HPC) environments. This position emphasizes operational excellence and team education rather than strategic development or workload definition.

Key Responsibilities:

  • Manage and maintain AI infrastructure, ensuring high availability and performance.
  • Implement and optimize AI operations using tools like NVIDIA Mission Control and Run:ai.
  • Collaborate with cross-functional teams to support AI workloads and ensure efficient resource utilization.
  • Provide training and mentorship to team members on AI infrastructure tools and best practices.
  • Monitor system performance and troubleshoot issues to minimize downtime and optimize resource allocation.
  • Assist in the deployment and scaling of AI models and applications.
  • Stay updated with the latest advancements in AI infrastructure technologies and recommend improvements.
  • Document processes, configurations, and best practices for AI infrastructure management.

Required Skills and Qualifications:

  • Proven experience in managing AI infrastructure and operations.
  • Proficiency with NVIDIA Mission Control/Bright Cluster Manager and Run:ai.
  • Proficiency with Linux operating systems such as Ubuntu and RHEL.
  • Strong understanding of high-performance computing (HPC) environments.
  • Experience with cloud platforms and on-premises infrastructure.
  • Excellent problem-solving skills and attention to detail.
  • Ability to work collaboratively in a team environment and communicate effectively.
  • Experience in training and mentoring technical teams.
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.

Preferred Qualifications:

  • Experience with containerization technologies such as Docker and Kubernetes.
  • Familiarity with AI frameworks and libraries (e.g., TensorFlow, PyTorch).
  • Knowledge of network and storage solutions for AI workloads.
  • Familiarity with job schedulers such as Slurm.

About Cloud Destinations LLC