AI Infrastructure Engineer

San Jose, CA, US • Posted 13 hours ago • Updated 13 hours ago
Contract W2
Contract Independent
Contract Corp To Corp
12 Months
No Travel Required
On-site
Depends on Experience

Job Details

Skills

  • (Prometheus OR Grafana OR ELK Stack OR OpenTelemetry)
  • (Terraform OR Nutanix Calm)
  • GPU Infrastructure
  • Kubernetes
  • Nutanix

Summary

Job Title: AI Infrastructure Engineer (Nutanix AI Platform)

Location: San Jose, CA (Hybrid/Onsite Preferred)

Duration: 12 months+ Contract

Experience: 10+ Years
Rate: Negotiable


Position Overview

We are seeking a highly experienced AI Infrastructure Engineer to architect, deploy and optimize enterprise-scale AI infrastructure solutions leveraging the Nutanix ecosystem. The ideal candidate will have deep expertise in Nutanix Cloud Infrastructure (NCI), AOS/AHV, Nutanix Kubernetes Platform (NKP), GPU-accelerated computing and hybrid cloud environments.

This role focuses on building scalable, high-performance infrastructure that supports Large Language Models (LLMs), Generative AI workloads, AI training and AI inference platforms across on-premises and cloud environments.

The selected candidate will serve as a Subject Matter Expert (SME) for Nutanix AI infrastructure and will work closely with architecture, cloud, platform, networking and security teams.


Required Skills (Must Have)

Nutanix Platform Expertise

  • Strong hands-on experience with:

    • Nutanix Cloud Infrastructure (NCI)

    • Nutanix AOS (Acropolis Operating System)

    • Nutanix AHV (Acropolis Hypervisor)

    • Nutanix Cloud Manager (NCM)

    • Nutanix Flow

    • Nutanix Objects and Files

Kubernetes & Container Platforms

  • Extensive experience with:

    • Nutanix Kubernetes Platform (NKP)

    • Kubernetes cluster deployment and administration

    • Container orchestration and workload management

    • AI/ML workload deployment in Kubernetes environments

GPU & AI Infrastructure

  • Experience designing and managing GPU-enabled environments

  • Hands-on experience with:

    • NVIDIA GPU ecosystem (A100, H100, CUDA, GPU Passthrough, vGPU)

    • AMD GPU ecosystem

  • Experience supporting AI model training and inference workloads

Infrastructure Automation

  • Terraform

  • Infrastructure as Code (IaC)

  • Nutanix Calm

  • Automated provisioning and lifecycle management

Monitoring & Observability

  • Prometheus

  • Grafana

  • ELK Stack

  • OpenTelemetry

  • Monitoring, logging, alerting and performance tuning


Key Responsibilities

AI Infrastructure Architecture

  • Design and implement scalable AI infrastructure platforms using Nutanix technologies.

  • Build optimized environments supporting Generative AI, LLM training and inference workloads.

  • Design high-performance compute, storage and networking architectures for AI applications.

Hybrid Cloud & Multicloud Solutions

  • Architect hybrid cloud solutions leveraging Nutanix Cloud Clusters (NC2).

  • Enable seamless workload portability between on-premises environments and public cloud platforms.

  • Support cloud bursting and dynamic workload scaling.

Kubernetes Platform Engineering

  • Deploy, manage and optimize AI workloads on Nutanix Kubernetes Platform (NKP).

  • Design highly available and resilient containerized environments.

  • Implement workload automation and orchestration best practices.

Storage & Data Services

  • Design high-performance storage solutions using Nutanix Objects and Nutanix Files.

  • Optimize storage architectures for AI/ML datasets and model repositories.

  • Ensure data availability, scalability and performance.

Security & Networking

  • Implement Zero-Trust security principles.

  • Utilize Nutanix Flow for micro-segmentation and workload security.

  • Collaborate with security and networking teams to protect sensitive AI data.

Performance Optimization

  • Optimize GPU utilization and AI infrastructure performance.

  • Configure GPU Passthrough and vGPU environments.

  • Improve resource efficiency and scalability while reducing operational costs.

Observability & Reliability

  • Establish enterprise monitoring, logging and alerting frameworks.

  • Ensure high availability, disaster recovery and fault tolerance.

  • Perform root cause analysis and capacity planning.


Preferred Qualifications

  • Experience supporting AI/ML platforms, LLMs and Generative AI initiatives.

  • Experience with AI model serving frameworks and inference platforms.

  • Knowledge of MLOps, AI platform engineering and data pipelines.

  • Experience working in enterprise-scale hybrid cloud environments.

  • Nutanix certifications highly preferred.


Required Experience

  • 10+ years of Infrastructure, Cloud, Platform Engineering, or Architecture experience.

  • Strong enterprise-level Nutanix experience.

  • Experience supporting Kubernetes-based production environments.

  • Experience with GPU-enabled infrastructure and AI workloads.

 

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90773860
  • Position Id: 3129-4981-

Similar Jobs

San Jose, California • 2d ago • Contract, Third Party • $55 - $60

San Jose, California • 27d ago • Contract • $70 - $77

Hybrid in Santa Clara, California • Yesterday • Contract • Depends on Experience

Santa Clara, California • Today • Third Party, Contract
