Datacenter Operation Engineer - 3+yrs - Atlanta GA - Onsite

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - 6 month(s)
No Travel Required

Skills

Java
InfiniBand configuration and subnet management on the IB switch
Physical layout of equipment
ARP
ICMP
TCP
UDP
SMTP
FTP
Troubleshooting
Installing
Delivery
Configuring
Migration

Job Details

Job Description:

This role entails assisting with all projects and repairs within the data center, participating in an on-call rotation, and providing hands-on coverage during maintenance. The selected individual will handle a variety of tasks, including solving operational issues, analyzing and designing operations to improve workflow, managing equipment layout, and ensuring accident prevention. They will support operations, including the physical layout of equipment, customer deployments, and the timely bring-up of GPU servers. Additionally, they will manage InfiniBand fabric bring-up, configuration, and subnet management on the IB switch, and will document existing operational processes and equipment.

The candidate should use a framework for monitoring tools, escalate key issues, and ensure timely service implementation. They will diagnose, troubleshoot, install, and repair all software, hardware, and components. Furthermore, they should be proficient in installing, configuring, and troubleshooting networking equipment such as routers and switches, and have a good understanding of the OSI model and the TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP). The responsibilities also include configuring terminal servers for out-of-band management, managing daily issues including health checks of servers and processes, and working closely with end users, development teams, and infrastructure teams to prioritize, resolve, and mitigate outages.

The role also involves server installation and maintenance, network installation and maintenance, site builds and refreshes while meeting current quality standards, and interacting with onsite staff and vendors for hardware replacement, delivery, and diagnostics. Additionally, the candidate will perform operational tasks associated with data center implementation, migration, deployments, cabling, and rack and stack.

As for the requirements, the candidate should have experience with cluster bring-up, driver loading, and GPU end-to-end testing in a cluster with InfiniBand. They should also have experience setting up GPU servers in a cluster and proficiency in Linux environments, including tasks such as shell scripting. Strong skills in installation, configuration, and troubleshooting of Linux operating systems, experience in OpenStack cloud operations, and excellent data center organization skills with meticulous attention to detail are also required. Familiarity with fiber and copper network cabling, including IP and SAN deployments, is essential, as is the ability to maintain acceptable ticket loads and incident SLAs, follow documented escalation procedures, and sync with global teams on various tasks and upcoming initiatives.

Understanding and adhering to documented policies, processes, and procedures, assisting with process improvement initiatives, and documenting policies, processes, and procedures, including runbooks, are also crucial. The candidate should also be able to move 50+ pounds.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About iMedhas Consulting Services