Job Description:
• Bachelor’s degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Advanced degrees or relevant professional training are a plus.
• Minimum 10 years of experience in data center operations, with at least 5 years in a leadership or senior technical role.
• Extensive experience in data center operations, with a proven track record of managing large-scale data center environments.
• Strong leadership and team management skills, with the ability to motivate and develop a high-performing operations team.
• In-depth knowledge of data center infrastructure, including servers, storage, networking, power, and cooling systems.
• Excellent problem-solving and analytical skills, with the ability to diagnose and resolve complex technical issues.
• Experience with incident and problem management, change management, and capacity planning.
• Lead the data center operations team, providing guidance, training, and support to ensure high performance and operational excellence. Act as the primary point of contact for all data center-related issues and escalations.
• Oversee the daily operations of data center facilities, ensuring high availability and reliability of all systems.
• Manage the data center infrastructure technology stack end to end, including VMware, VxRail, Citrix, LogicMonitor, Moogsoft, AD, Azure AD SSO, Azure Security Policy, PKI, Windows and Linux servers, vulnerability management, BeyondTrust Password Safe and AD Bridge, and storage and backup tools.
• Ensure adherence to operational standards and best practices.
• Drive major and potential incidents end to end, providing periodic updates to client stakeholders for approvals and recommendations.
• Lead, mentor, and manage a team of data center operations engineers.
• Provide guidance and support for professional development and performance improvement.
• Coordinate and manage the team's daily activities, ensuring alignment with organizational goals and priorities.
• Lead the response to data center incidents, ensuring timely resolution and minimal impact on business operations.