Solution Architect

Spartanburg, SC, US • Posted 21 hours ago • Updated 1 hour ago

Contract Corp To Corp

Contract W2

On-site

Job Details

Skills

DevOps
Linux
RHEL
Dell XE
HPC operations
NVIDIA GPU

Summary

Role: Solution Architect

Duration: 13 Months

Location: Spartanburg, South Carolina (100% onsite for the first 6 months)

Must have skills
Strong experience with HPC operations in production environments (scheduling, monitoring, troubleshooting, capacity management).
Hands-on expertise with NVIDIA GPU infrastructure (preferably GB200/GB300-class racks, H100/B100 or similar generations) including firmware/driver stack, monitoring, and lifecycle management.
Familiarity with liquid-cooled data center infrastructure: working safely around cold-plate / rear-door heat exchanger systems, understanding facility interfaces (CDU, manifolds, leak detection, etc.).
Solid understanding of Linux administration (RHEL/CentOS/Ubuntu) in HPC/AI clusters: OS provisioning, patching, performance tuning, and troubleshooting.
Experience with cluster management tools (e.g., Bright, OpenHPC, Slurm, PBS/LSF, Kubernetes-based AI stacks, or equivalent schedulers and orchestration frameworks).
Strong Day 2 operations mindset: incident response, root-cause analysis, change management, and operational runbook creation.
Excellent customer-facing communication skills and the ability to work on site, embedded with the customer's team and their operations staff.

Nice to have skills
Prior experience operating or supporting Dell XE/XE HPC & AI solutions and related management tooling.
Background in HP/HPE HPC environments (familiar with how HP is typically deployed/managed in HPC shops, to ease comparison and migration).
Exposure to AI/ML and GenAI workloads running at scale on NVIDIA GPU infrastructure (MLOps pipelines, model training/inference operations).
Experience with data center facilities coordination (power and cooling planning, rack integration, cabling standards, change windows).
Scripting skills in Python/Bash/PowerShell for automation, reporting, and integration with monitoring/ITSM systems.
Familiarity with ITIL-aligned processes (incident, problem, and change management) and documenting runbooks/SOPs.
Strong scripting/automation skills, especially in Python and Bash, for automating operational workflows, health checks, and reporting across large GPU clusters.
Experience building or maintaining infrastructure automation (e.g., Ansible, Terraform, or similar) for repeatable deployment, configuration, and lifecycle management of HPC/AI nodes.
Experience with DevOps / Git-based workflows (GitLab/GitHub, CI/CD) to help with scripts, automation playbooks, and configuration as code.

Detailed Project Tasks / Scope
The resident will be a full-time, on-site operational resource focused on Day 2 operations for liquid-cooled Dell XE racks with NVIDIA GB300-based configurations. Scope includes, but is not limited to:

Day 2 operations & stability
Own day-to-day operational support for the initial 48 fully loaded GB300 racks, scaling toward 144 racks by year end.
Monitor system health, performance, and capacity; proactively identify and remediate issues impacting uptime or SLAs.
Perform incident triage, troubleshooting, and coordination with Dell and NVIDIA support as needed.

Infrastructure management
Support rack-level and node-level lifecycle tasks: firmware/driver updates, BIOS tuning, OS patching, and configuration consistency.
Assist with liquid cooling operations: safe work practices, coordination with facilities for maintenance/change activity, and monitoring of cooling performance and alarms.
Validate and maintain power, cooling, and space documentation as the environment scales from 48 to 144 racks.

HPC/AI platform operations
Support cluster scheduler / workload manager operations (job queues, resource pools, and performance tuning) in the context of GPU-heavy workloads.
Work with to ensure that AI/HPC workloads are efficiently utilizing the GB300 i

Dice Id: 91088813
Position Id: 2026-1755