HPC-Kubernetes Solutions Architect

Overview

On Site
BASED ON EXPERIENCE
Full Time

Skills

High Performance Computing
Training
Customer Facing
Accountability
Requirements Analysis
Optimization
Product Engineering
Strategist
Innovation
Open Policy Agent (OPA)
SAN
Data Storage
IBM GPFS
Ceph
Grafana
Continuous Integration
Continuous Delivery
Collaboration
DevOps
Scalability
Leadership
Onboarding
Cisco
Emerging Technologies
Roadmaps
MIG
RBAC
Storage
Computer Networking
InfiniBand
Remote Direct Memory Access
Machine Learning (ML)
Use Cases
Python
Benchmarking
Performance Tuning
Customer Engagement
Presentations
HPC
Management
Open Source
GPU
Orchestration
Computer Science
Physics
Kubernetes
Cloud Computing
Amazon Web Services
Microsoft Azure
Professional Services
Artificial Intelligence

Job Details

Title: HPC Kubernetes Solutions Architect
Location: Dallas, TX
Duration: Permanent Position
Compensation: $200,000 - $350,000/year
Work Requirements: Authorized to Work in the U.S.

HPC Kubernetes Solutions Architect

  • As an HPC Kubernetes Solutions Architect, you will act as a trusted advisor to customers, guiding them through the design, integration, and adoption of GPU-accelerated Kubernetes platforms purpose-built for high-performance computing (HPC), AI/ML training, simulation, and scientific workloads.
  • This is a customer-facing architecture role with accountability across the entire solution lifecycle: from early discovery and requirements analysis, through reference architecture design, proof-of-concept delivery, and deployment, to long-term optimization and platform evolution.
  • You will be responsible for creating architectural blueprints and integration strategies that enable customers to achieve measurable performance and scalability outcomes, while preparing them for future growth and technology shifts.
  • In addition, you will collaborate closely with product, engineering, and operations teams, ensuring customer feedback informs roadmap priorities and helping define the next generation of Kubernetes-based HPC orchestration.
  • This role is ideal for someone who combines deep technical expertise in Kubernetes and GPU orchestration with the ability to engage customers as a solution strategist, aligning today's workloads with tomorrow's innovation.

Responsibilities:

  • Act as the primary architectural point of contact for customers adopting GPU-accelerated Kubernetes platforms for HPC and AI/ML workloads.
  • Partner with customers to capture workload requirements, performance objectives, scaling needs, and integration constraints, translating them into reference architectures and actionable solution designs.
  • Architect and operate Kubernetes clusters optimized for GPU workloads, leveraging the NVIDIA GPU stack.
  • Integrate and tune Multi-Instance GPU (MIG), GPU sharing, and scheduler extensions (e.g., Volcano, Slurm integration, kube-scheduler plugins) to maximize efficiency in multi-tenant environments.
  • Develop or extend custom Kubernetes operators and controllers in Go/Python to automate HPC infrastructure services.
  • Design and recommend secure multi-tenant Kubernetes environments, implementing RBAC, OPA/Gatekeeper policies, namespace isolation, and workload quotas.
  • Lead proof-of-concept and benchmarking engagements, using profiling tools, workload characterization, and telemetry to validate solution performance and scalability.
  • Define and document integration strategies across compute, storage, networking, and orchestration layers, including CNI plugins (NVIDIA CNI, Multus, Cilium), storage systems (Lustre, GPFS, Ceph, VAST), and container runtimes (containerd, NVIDIA Container Toolkit).
  • Drive observability and monitoring solutions with Prometheus, Grafana, DCGM Exporter, and OpenTelemetry, ensuring visibility into GPU health, cluster utilization, and workload performance.
  • Support GitOps-driven CI/CD pipelines for Kubernetes infrastructure using ArgoCD, FluxCD, Helm, and Kustomize.
  • Collaborate with HPC, ML, and DevOps teams to validate performance and scalability in hybrid or on-premises environments.
  • Provide architectural leadership during onboarding and deployment, ensuring successful integration of Kubernetes clusters with HPC schedulers and enterprise IT systems.
  • Build and maintain strategic relationships with ecosystem vendors (e.g., NVIDIA, Cisco, storage partners), incorporating emerging technologies into customer environments.
  • Share forward-looking insights with customers on GPU roadmaps, interconnect advancements (e.g., InfiniBand, RoCE, NVLink), and container orchestration trends.
  • Represent the organization in customer design sessions, technical workshops, and industry conferences, positioning yourself as a thought leader in Kubernetes for HPC.

Required Skills:

  • Extensive experience in Kubernetes architecture and operations for HPC or GPU-intensive environments.
  • Strong technical expertise in:
      • NVIDIA GPU stack (GPU Operator, device plugins, MIG, NVML, DCGM).
      • Kubernetes internals (CRDs, RBAC, scheduler extensions, custom operators/controllers).
      • Distributed and parallel storage integration with Kubernetes for HPC workloads.
      • High-performance networking (InfiniBand, RDMA, RoCE) in containerized environments.
  • Proven ability to design scalable, secure, and resilient Kubernetes-based architectures for HPC and AI/ML use cases.
  • Proficiency in Go or Python for Kubernetes operator or controller development.
  • Experience with workload profiling, benchmarking, and performance tuning.
  • Strong customer engagement skills, capable of translating requirements into actionable architectures and presenting solutions effectively.
  • Collaborative mindset with experience working across engineering, product, and operations teams.

Preferred Experience:

  • Demonstrated success in end-to-end customer solution delivery, from requirements discovery to deployment and adoption.
  • Familiarity with containerized HPC environments (e.g., Singularity/Apptainer).
  • Exposure to automation and GitOps practices for Kubernetes platform management (e.g., ArgoCD, FluxCD).
  • Contributions to open-source projects in the Kubernetes or NVIDIA ecosystem.
  • Experience advising on future adoption strategies, helping customers prepare for emerging GPU, interconnect, and orchestration technologies.
  • Bachelor's or Master's degree in Computer Science, Engineering, Physics, or related technical field.
  • Relevant Kubernetes and container certifications such as CKA, CKAD, or CKS, alongside cloud certifications like AWS Solutions Architect or Azure Solutions Architect Expert.

About INSPYR Solutions
Technology is our focus and quality is our commitment. As a national expert in delivering flexible technology and talent solutions, we strategically align industry and technical expertise with our clients' business objectives and cultural needs. Our solutions are tailored to each client and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing on the human aspect of our business, we work seamlessly with our talent and clients to match the right solutions to the right opportunities. Learn more about us at inspyrsolutions.com.

INSPYR Solutions provides Equal Employment Opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, INSPYR Solutions complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities.

Information collected and processed through your application with INSPYR Solutions (including any job applications you choose to submit) is subject to the INSPYR Solutions Privacy Policy and the INSPYR Solutions AI and Automated Employment Decision Tool Policy. By submitting an application, you are consenting to being contacted by INSPYR Solutions through phone, email, or text.


Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it to correctly reflect the job opportunity.
