Platform Architect - 15+ Years of Experience Required


V-CENTRIX-US LLC
Job Details
Skills
- Artificial Intelligence
- GPU
- Machine Learning Operations (ML Ops)
- NVIDIA DGX
- Grafana
- Kubernetes
- Enterprise Software
- Computer Networking
- InfiniBand
- Scripting
- Ansible
- Bash
- CPU
- Certified Kubernetes Administrator (CKA)
- Certified Kubernetes Application Developer (CKAD)
- Certified Kubernetes Security Specialist (CKS)
- NVIDIA Certified Associate: AI Infrastructure & Operations (NCA-AIIO)
- NVIDIA Certified Professional: AI Infrastructure (NCP-AII)
- NVIDIA Certified Professional: AI Operations (NCP-AIO)
- NVIDIA Certified Professional: AI Networking (NCP-AIN)
Summary
Position: AI Infrastructure and Kubernetes Platform Architect
Location: Remote
Duration: 15+ months (Remote)
Type: Contract
Job Description:
We are seeking a highly skilled AI Infrastructure and Kubernetes Platform Architect with deep expertise in managing GPU-accelerated workloads on NVIDIA DGX systems. The ideal candidate will have hands-on experience with Kubernetes at the administrator, application developer, and security levels (CKA, CKAD, CKS), and will be responsible for designing, deploying, securing, and maintaining large-scale AI infrastructure powered by DGX BasePODs and SuperPODs. This role involves optimizing AI workloads, managing high-performance networking (InfiniBand), and ensuring operational excellence across NVIDIA AI systems and BlueField DPU environments.
Key Responsibilities:
Kubernetes and AI Platform Orchestration
- Architect and maintain containerized AI/ML platforms using Kubernetes on DGX systems.
- Integrate NVIDIA Base Command Manager with Kubernetes for workload scheduling and GPU resource optimization.
- Design multi-tenant GPU resource partitioning strategies using MIG (Multi-Instance GPU) to maximize hardware utilization across concurrent AI workloads.
- Implement and manage Helm charts, custom controllers, and GPU operators for scalable ML infrastructure.
DGX Infrastructure Administration
- Administer and optimize NVIDIA DGX BasePODs and SuperPODs.
- Ensure optimal GPU, CPU, and storage performance across AI clusters.
- Leverage DGX System Administration best practices for lifecycle management and updates.
- Coordinate capacity planning for DGX cluster expansion, including rack power, cooling, and storage integration with the NVIDIA AI Enterprise software stack.
High-Performance Networking & DPU
- Deploy, monitor, and manage InfiniBand networks using Unified Fabric Manager (UFM).
- Integrate BlueField DPUs for offloaded security, networking, and storage tasks.
- Optimize end-to-end data pipelines from storage to GPUs.
Security and Compliance
- Apply best practices from the CKS certification to harden Kubernetes clusters and AI workloads.
- Implement secure service mesh and microsegmentation with BlueField DPU integration.
- Conduct regular audits, vulnerability scanning, and security policy enforcement.
Automation & Monitoring
- Automate deployment pipelines and infrastructure provisioning with IaC tools (Terraform, Ansible).
- Monitor performance metrics using GPU telemetry, Prometheus, Grafana, and NVIDIA DCGM.
- Troubleshoot and resolve complex system issues across hardware and software layers.
- Implement MLOps workflows integrating Kubeflow Pipelines, NVIDIA Triton Inference Server, and model registry tooling to support end-to-end model training and production deployment.
Required Skills and Qualifications:
- CKA, CKAD, and CKS certifications, demonstrating full-stack Kubernetes expertise.
- Proven experience with NVIDIA DGX systems and AI workload orchestration.
- Hands-on expertise in InfiniBand networking, UFM, and BlueField DPU administration.
- Strong scripting and automation skills in Python, Bash, and YAML.
- Familiarity with Base Command Manager, NVIDIA GPU Operator, and Kubeflow is a plus.
- Ability to work across teams to support ML researchers, DevOps engineers, and infrastructure teams.
Thanks & Regards
Adarsh Kumar Tripathi
Sr. Talent Acquisition Specialist
Vcentrix Services US LLC
8 The Green, Suite B, Dover, DE 19901, USA
Mail:
- Dice Id: 91172987
- Position Id: 8924082
- Posted 5 hours ago
Company Info
Welcome to VCentrix Services – where innovation meets performance.
At VCentrix, we empower businesses to thrive in the modern digital economy by providing a seamless blend of cutting-edge IT solutions and results-driven digital marketing. Our mission is to be more than just a service provider; we act as a dedicated extension of your team, helping you optimize technology and amplify your brand’s online presence.
What We Do:
We specialize in delivering high-impact solutions across several core domains:
* Digital Marketing: Strategic SEO, PPC, Social Media Management, and performance marketing to drive measurable growth.
* Web & Mobile Development: Designing and building scalable, user-centric websites and mobile applications (iOS & Android).
* Virtual Employee Services: Providing dedicated remote professionals to help your business stay organized and scale efficiently without high overhead costs.
* IT & Cyber Security: Ensuring your digital assets are secure, high-performing, and future-ready.
* Creative Services: Professional content writing, animation, and multimedia solutions that capture your brand’s voice.
Why Choose VCentrix?
With a global footprint in the USA and India, we combine international standards with competitive agility. Our approach is rooted in clear communication, proactive support, and a commitment to data security and confidentiality. Whether you are a startup looking for your first digital footprint or an established brand seeking to optimize your operations, VCentrix is here to help you grow.

