AI Site Reliability Engineer - W2 REMOTE

Overview

Remote
$40 - $45
Full Time

Skills

Ansible
C++
Golang
HPC
IBM
GitLab
Jenkins
Kubernetes
Linux
Operating Systems
Programming Languages
Python
Terraform

Job Details

Title: AI Site Reliability Engineer
Location: 100% Remote
Any visa is fine
Requirements include:
  • Experience deploying and administering NVIDIA DGX or equivalent high-performance computing (HPC) clusters (e.g., Cray, HPE, IBM).
  • 5+ years administering and supporting Linux-based operating systems.
  • Experience writing code in general-purpose programming languages such as Python, Golang, and C/C++, and using Git and CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins).
  • Experience deploying enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and/or Google Anthos.
  • Sophisticated knowledge of Kubernetes, Docker, Terraform, Ansible, Jenkins, GitOps, Git, and Linux.
  • Experience across the software development lifecycle, including design, development, testing, packaging, and deployment, using Python or Golang.