Job Description: | We are seeking a highly skilled Data Center Operations Engineer to join our client's Network Engineering team. The successful candidate will be responsible for maintaining, troubleshooting, and optimizing the physical and logical infrastructure in our co-located data centers and campus locations, ensuring maximum availability, performance, and safety. This is a mission-critical, 24/7 role requiring both hands-on technical ability and strong coordination skills with vendors and IT teams.
Responsibilities
Infrastructure Maintenance & Operations - Perform preventative, corrective, and predictive maintenance on data center systems including UPS, generators, PDUs, cooling/HVAC, CRAC/CRAH units, chillers, and backup systems. Monitor the performance and health of mechanical and electrical systems using monitoring tools. Follow procedures to immediately communicate, report, and escalate data center technical or safety-related incidents to management.
- Assist with or lead hardware installations (racks, servers, network equipment, cabling) and the commissioning of new systems. Directly perform decommissions, simple changes (e.g., memory upgrades, cabling, firmware/OS rebuilds), and refreshes of infrastructure cabling, network, storage, and server equipment, following standard procedures. Validate and test new infrastructure components prior to production integration, ensuring that grounding, staging, labeling, and cabling align with all safety protocols, deployment standards, and planned designs and specifications.
- Coordinate third-party vendors, contractors, and service providers for maintenance, repairs, and deployment of new systems. Ensure vendor work complies with safety, quality, and standard operating procedures.
- Identify and propose improvements (automation, monitoring enhancements, energy efficiency, redundancy) to increase reliability and reduce operational costs. Participate in audits, compliance reviews, and safety inspections. Monitor, analyze, and report metrics to senior leadership.
- Work closely with IT, network, security, and application teams to support their infrastructure dependencies. Participate in the on-call rotation and respond to off-hours emergencies. Track budget and capital/operational expenditures.
Documentation & Procedures - Create and maintain SOPs, runbooks, configuration records, architecture diagrams, asset inventories, and change logs. Generate periodic reports on metrics such as uptime, capacity, and maintenance.
Risk, Change & Incident Management - Perform risk assessments, root cause analysis, and corrective actions for failures or incidents. Plan and execute change activities ensuring minimal disruption. Maintain clear escalation paths and communication during incidents.
Physical Requirements - Occasional climbing of ladders. Frequent climbing of stairs and/or ramps. Prolonged standing. Occasional lifting of 50 lbs / 22.5 kg. Occasional pushing or pulling of 50-75 lbs / 22.5-34 kg with an assistive device. Normal visual acuity (near, far, and peripheral, with correction), defined via standard medical terms and applicable criteria. Normal color vision for electrical work, defined via standard medical terms and applicable criteria.
Travel Requirements - This role requires travel up to 20% of the time.
|