Role: Senior Solution Architect, HPC and Cloud-Native Systems (ITAR-Restricted Role)
Location: Remote
Position Overview
We are seeking a high-performing Senior Solution Architect to lead the convergence of traditional High-Performance Computing (HPC) environments with modern cloud-native architectures. This position is designated as ITAR-restricted and requires candidates who are legally authorized to access and handle U.S. export-controlled technical data.
The architect will design, integrate, and optimize large-scale, containerized, hybrid HPC environments using technologies such as Docker, Mirantis Kubernetes Engine (MKE), the ELK Stack, and advanced batch schedulers. This role requires deep technical leadership, architectural vision, and hands-on experience supporting mission-critical computational workloads in secure, compliant environments.
Core Responsibilities
1. Architecture & Design
- Architect end-to-end hybrid cloud solutions integrating Mirantis Container Cloud with dedicated HPC clusters.
- Balance performance, elasticity, and compliance requirements across on-prem and cloud environments.
- Produce architecture documentation that complies with ITAR export-control requirements and review practices.
2. HPC Orchestration
- Design and implement HPC job scheduling strategies using Slurm, PBS, Volcano, or similar technologies.
- Support deterministic resource allocation for AI/ML analytics, physics simulations, and scientific workloads.
- Ensure scheduler configurations meet the isolation and audit requirements for ITAR-restricted workloads.
3. Optimization & Performance Tuning
- Apply best practices for high-performance containerization: multi-stage builds, minimal base images, and resource tuning (CPU, GPU, memory).
- Implement strategies to minimize container overhead, ensure stability, and mitigate noisy-neighbor effects.
4. Centralized Observability
- Architect and operate an enterprise-grade ELK Stack (Elasticsearch, Logstash, Kibana) tuned for HPC-scale environments.
- Configure Elasticsearch Index Lifecycle Management (ILM) policies to handle massive log throughput while preserving traceability for compliance audits.
5. Full-Stack Automation
- Build IaC-driven automation pipelines using Terraform, Ansible, and GitOps workflows.
- Automate deployment of Mirantis Kubernetes Engine (MKE) and integrated HPC schedulers within an ITAR-secured environment.
6. CI/CD Automation
- Implement robust CI/CD workflows using Jenkins, GitLab CI, Argo Workflows, or similar tools.
- Ensure pipelines comply with ITAR policies, including artifact access control, secure registries, and encrypted transport.
7. Hybrid Integration
- Architect integration between Kubernetes and traditional HPC schedulers.
- Enable advanced workloads that require high-speed interconnects (InfiniBand, RDMA-capable fabrics) or GPU-accelerated clusters.
Required Technical Skills
Containers & Mirantis
- Expertise in Docker Engine, Mirantis Kubernetes Engine (MKE), and Lens Desktop for cluster management.
- Deep experience designing containerized workloads for HPC environments.
HPC Schedulers
- Hands-on experience with Slurm, PBS, or Kubernetes-native batch schedulers such as Volcano.
- Knowledge of hierarchical priority queues, gang scheduling, and resource fairness algorithms.
ELK Stack Mastery
- Strong understanding of Logstash pipeline performance optimization, Elasticsearch sharding strategies, and Kibana visualization design.
Performance Tools
- Experience with NVIDIA Enroot/Pyxis or equivalent technologies (e.g., Apptainer/Singularity) that deliver near bare-metal container performance.
Security & Compliance
- Experience implementing secure registry solutions, TLS encryption, RBAC, and identity-driven access controls.
- Demonstrated experience supporting compliance frameworks such as ITAR, NIST SP 800-53, or similar.
Experience & Qualifications
Professional Background
- 10+ years in systems architecture or engineering roles.
- 5+ years in HPC, Cloud Infrastructure, or enterprise-scale DevOps environments.
HPC Knowledge
- Understanding of MPI (Message Passing Interface), GPU compute workloads, low-latency networks, and distributed parallel frameworks.
Certifications
Preferred certifications include:
- Certified Kubernetes Administrator (CKA)
- Mirantis Kubernetes certifications
- Relevant security/compliance certifications (a plus)
Cloud Platforms
- Experience with AWS HPC environments (Amazon EKS, AWS Batch, Amazon FSx for Lustre, GPU-accelerated EC2 instances).