R&D Infrastructure Research Engineer

Overview

Remote
$120,000 - $150,000
Full Time

Skills

CI
CD
GPU
infrastructure
HPC
servers
storage
network
databases
containers
compute
Kubernetes
Distributed Systems
Python
systems tools
K8s
Rancher
Kubeflow

Job Details

  • Strong knowledge of infrastructure research, specifically GPU computing performance (5-7 years of infrastructure research experience).
  • Expert in Kubernetes, containers, Distributed Systems, Python, systems tools, and scripting, with 10+ years of hands-on experience in these areas.
  • Excellent AI infrastructure optimization, debugging, and tuning skills.
  • Analyze benchmarking data to draw insights and recommend optimizations.
  • Design, implement, and manage robust infrastructure solutions that meet benchmark requirements.
  • Hands-on experience with Continuous Integration and Continuous Delivery (CI/CD).
  • Strong ability to evaluate data and performance-tune servers, compute (CPU/GPU), network, and databases.
  • Build, install, and maintain container environments, including K8s, Rancher, Kubeflow, etc.
  • Expertise in architecting, building, and managing large R&D data sets and implementing High Performance Computing (HPC).
  • Ability to troubleshoot AI infrastructure including servers/GPUs, network, and storage.
  • Conduct comprehensive benchmarking to evaluate system performance, reliability, and scalability.
  • Deep understanding of overall technology architecture and multidisciplinary experience across servers, storage, network, databases, containers, and compute (CPU/GPU).

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.