Data Center Operations Engineer

California City, CA, US • Posted 1 day ago • Updated 1 day ago
Contract W2
On-site
Depends on Experience

Job Details

Skills

  • Adapter
  • Artificial Intelligence
  • Bash
  • Cabling
  • Collaboration
  • Command-line Interface
  • Communication
  • Computer Hardware
  • Computer Networking
  • Documentation
  • FTP
  • Firmware
  • GPU
  • HPC
  • Hardware Installation
  • High Availability
  • ICMP
  • Incident Management
  • InfiniBand
  • Linux
  • Linux Administration
  • Management
  • Microsoft Windows
  • Migration
  • Network
  • OSI Model
  • Optical Fiber
  • Organizational Skills
  • RAID
  • Repair
  • Routers
  • SLA
  • SMTP
  • Scripting
  • Server Hardware
  • Servers
  • Storage
  • Switches
  • TCP
  • TCP/IP
  • TFTP
  • Testing
  • UDP
  • Video

Summary

Position: Data Center Operations Engineer

Location: California City, CA, USA
Duration: 12+ Months (Contract)
Interview: Video Interview
Visa: Open (As per client requirement)

Job Description:

We are seeking a Data Center Operations Engineer with strong hands-on experience supporting enterprise data center infrastructure, Linux systems, GPU server deployments, and InfiniBand networking. The ideal candidate will have expertise in installing, configuring, troubleshooting, and maintaining data center hardware and infrastructure while supporting HPC/AI environments and ensuring high availability of critical systems.

This role requires excellent troubleshooting skills and experience with GPU cluster deployments, InfiniBand fabrics, Linux administration, networking, and data center operations. The engineer will work closely with infrastructure, operations, and engineering teams to support deployments, maintenance activities, and continuous operational improvements.

Required Skills:

  • 5+ years of experience in Data Center Operations or Infrastructure Engineering.
  • Strong hands-on experience with Linux system administration, troubleshooting, and performance validation.
  • Experience with Linux command-line utilities and Bash/Shell scripting.
  • Hands-on experience deploying and configuring GPU servers in clustered environments.
  • Experience with GPU cluster bring-up, driver installation, and system-level configuration.
  • Strong knowledge of InfiniBand networking, including switch configuration, subnet management, and troubleshooting.
  • Experience performing end-to-end GPU testing in InfiniBand-based clusters.
  • Solid understanding of networking fundamentals, including the TCP/IP stack, the OSI model, and protocols such as ARP, ICMP, TCP, UDP, SMTP, FTP, and TFTP.
  • Experience installing, configuring, and troubleshooting routers, switches, and terminal servers.
  • Hands-on experience with server hardware installation, rack and stack, cabling, CPUs, memory, HDDs, RAID controllers, NICs, and firmware upgrades.
  • Experience with fiber and copper cabling, IP networking, and SAN infrastructure.
  • Experience supporting data center deployments, migrations, hardware refreshes, and expansion projects.
  • Experience using monitoring and alerting tools to identify and resolve infrastructure issues.
  • Experience working with ticketing systems while meeting SLA requirements.
  • Strong documentation skills for operational procedures, system configurations, and technical runbooks.
  • Excellent troubleshooting, communication, and organizational skills.
  • Ability to work in a fast-paced production environment and participate in on-call rotations.

Preferred Skills:

  • Experience supporting HPC, AI, or large-scale GPU environments.
  • Experience with NVIDIA GPU platforms and Mellanox/InfiniBand technologies.
  • Experience with data center monitoring solutions.
  • Experience supporting large-scale data center build-outs and infrastructure refresh programs.
  • Familiarity with automation or scripting for operational tasks.

Responsibilities:

  • Provide operational support for data center deployments, maintenance, and repair activities.
  • Install, configure, test, and maintain Linux servers and GPU infrastructure.
  • Deploy, configure, and validate GPU servers and clustered environments.
  • Perform InfiniBand fabric bring-up, switch configuration, subnet management, and troubleshooting.
  • Install and maintain server hardware, including CPUs, memory, storage, RAID components, and network adapters.
  • Configure and troubleshoot routers, switches, terminal servers, and out-of-band management devices.
  • Perform daily health checks of Linux systems, networking, and infrastructure components.
  • Support data center build-outs, hardware refreshes, migrations, and expansion projects.
  • Coordinate with vendors for hardware installation, diagnostics, replacement, and warranty support.
  • Monitor infrastructure using monitoring and alerting tools, ensuring timely incident resolution.
  • Maintain operational documentation, technical procedures, and runbooks.
  • Participate in incident response, maintenance windows, and on-call support rotations.
  • Collaborate with cross-functional global teams to ensure reliable, secure, and scalable infrastructure operations.

  • Dice Id: 90773860
  • Position Id: 9011916
Contact the job poster

Saipriya Yethirajula

Recruiter @ NMK Global Inc.