AI Platform Architect – DGX & SuperPOD

Remote • Posted 3 hours ago • Updated 3 hours ago

Contract W2

Contract Independent

Contract Corp To Corp

No Travel Required

Remote

$110 - $120/hr

Job Details

Skills

NVIDIA Certification

Summary

Title: NVIDIA AI Infrastructure & Kubernetes Platform Engineer (DGX Systems)

Remote

NVIDIA Certification required

We are seeking a highly skilled AI Infrastructure & Kubernetes Platform Engineer with a proven track record in deploying and managing NVIDIA DGX-based AI clusters, orchestrating containerized AI workloads using Kubernetes, and ensuring secure, high-throughput operations across InfiniBand-powered networks. The ideal candidate will hold a combination of Kubernetes certifications (CKA, CKAD, CKS) and NVIDIA certifications (NCA-AIIO, NCP-AIO, NCP-AII, NCP-AIN), coupled with hands-on training in DGX, BlueField, and high-speed network operations.

This position plays a key role in supporting AI/ML infrastructure at scale, enabling efficient training and inference for complex models, and integrating NVIDIA's cutting-edge compute, storage, and fabric solutions with modern DevOps practices.

Core Responsibilities:

AI Infrastructure Operations

Deploy and manage NVIDIA DGX BasePODs and SuperPODs for high-performance AI workloads.
Oversee DGX system lifecycle operations including provisioning, monitoring, firmware upgrades, and capacity planning.
Operate Base Command Manager to manage GPU clusters, schedule workloads, and integrate with MLOps tools.
Perform DGX node health validation, NCCL interconnect testing, and NVLink topology verification following new deployments or hardware changes.

Kubernetes Platform Engineering

Architect secure and scalable Kubernetes clusters optimized for GPU-accelerated workloads using NVIDIA GPU Operator.
Leverage expertise from CKA/CKAD/CKS to develop, deploy, and secure AI applications on Kubernetes.
Implement CI/CD pipelines and GitOps methodologies for deploying and managing ML workflows.

High-Performance Networking & DPUs

Administer InfiniBand networks and BlueField DPUs using Unified Fabric Manager (UFM).
Enable NVLink/NVSwitch performance across GPU nodes and tune fabric configurations for minimal latency and maximum throughput.
Use BlueField for offloading storage, firewalling, and telemetry, enhancing AI workload security and performance.

Security & Compliance

Apply best practices from the CKS certification to secure containerized AI environments.
Configure runtime security, secrets management, network segmentation, and auditing using DPU-enhanced Kubernetes deployments.
Support zero-trust architecture initiatives by enforcing workload identity, RBAC policies, and supply chain integrity across AI container images and model artifacts.

Monitoring, Telemetry & Optimization

Dice Id: 10513292
Position Id: 72261-12895-