Job Description: Compute Operations Manager (VMware, Windows & Linux)
Role Summary
The Compute Operations Manager is responsible for managing enterprise compute infrastructure across VMware, Windows, and Linux platforms, ensuring high availability, security, and operational excellence. This is a client-facing leadership role requiring strong technical depth, delivery governance, and stakeholder management skills in a managed services or Center of Excellence (COE) environment.
Key Responsibilities
1. End-to-End Compute Operations
- Lead operations for:
  - VMware (vSphere, ESXi, vCenter)
  - Windows Server environments
  - Linux platforms (RHEL, CentOS, Ubuntu)
- Ensure 24x7 availability, performance, and stability of compute services
- Drive SLA adherence, uptime targets, and operational KPIs
2. Client & Stakeholder Management
- Act as the primary customer-facing contact for compute services
- Lead:
  - Service review meetings (weekly/monthly governance)
  - Incident reviews and RCA discussions
  - Continuous improvement initiatives
- Translate business requirements into technical solutions
- Maintain strong client relationships and drive customer satisfaction (CSAT)
3. Team Leadership & Delivery
- Manage and mentor multi-skilled teams (VMware + Windows + Linux admins)
- Oversee L2/L3 support teams in a global delivery model (onshore/offshore)
- Drive:
  - Skill development and cross-training
  - Resource planning and capacity management
- Ensure adherence to operational processes and service quality standards
4. Incident, Problem & Change Management
- Lead resolution of critical incidents (P1/P2) across platforms
- Ensure timely RCA and preventive actions
- Govern change management activities (patching, upgrades, migrations)
- Maintain and improve operational runbooks and SOPs
5. Platform Management & Engineering
- VMware:
  - Cluster, host, and VM lifecycle management
  - High availability and DR (vSphere HA, vMotion, DRS; SRM fundamentals)
- Windows:
  - Server administration, patching, Active Directory coordination
- Linux:
  - OS administration, patching, performance tuning
- Collaborate with:
  - Storage, network, security, and cloud teams
6. Automation & Transformation
- Drive automation using:
  - PowerCLI, PowerShell, Python, Shell scripting
  - VMware Aria / vRealize
- Implement:
  - Self-service provisioning
  - Auto-remediation and monitoring optimization
- Reduce operational effort and improve efficiency
7. Monitoring, Reporting & Governance
- Monitor infrastructure using enterprise tools (vROps, OEM, etc.)
- Publish dashboards and reports:
  - SLA/KPI metrics
  - Capacity & utilization
  - Incident trends
- Ensure compliance with:
  - Security policies
  - Audit and regulatory controls
Required Skills & Expertise
Technical Skills
- Strong expertise in:
  - VMware vSphere (ESXi, vCenter, clusters, HA/DRS)
  - Windows Server (2012/2016/2019/2022)
  - Linux (RHEL / Ubuntu / CentOS administration)
- Knowledge of:
  - Backup & DR tools
  - Storage (SAN/NAS) and networking fundamentals
Automation & Tools
- Scripting:
  - PowerShell / PowerCLI (mandatory)
  - Python / Bash (preferred)
- ITSM tools:
  - ServiceNow (or equivalent)
- Monitoring:
  - vROps, SCOM, SolarWinds, or similar
Process & Framework
- Strong understanding of ITIL processes
- Experience in a managed services / COE delivery model
- Exposure to SLA-driven environments
Leadership & Soft Skills
- Strong communication and presentation skills
- Proven ability to manage client expectations and escalations
- Analytical problem-solving and decision-making
- Ability to work in high-pressure environments
Experience Required
- 10-15 years of IT infrastructure experience
- 5+ years in compute/platform operations leadership
- Experience managing VMware + Windows + Linux environments
- Prior experience in client-facing managed services roles (mandatory)
Education & Certifications
- Bachelor's degree in IT / Computer Science or equivalent
- Preferred certifications:
  - VMware VCP
  - Microsoft (Windows Server; Azure Administrator optional)
  - Red Hat (RHCSA or RHCE)
  - ITIL Foundation
Key KPIs
- SLA compliance (Availability, MTTR)
- Incident reduction & RCA effectiveness
- Automation adoption (reduced manual effort)
- Customer satisfaction (CSAT)
- Platform performance and utilization optimization
Nice to Have
- VMware NSX / Horizon knowledge
- Cloud exposure (Azure or AWS, including VMware-on-cloud offerings)
- Experience in transformation / migration projects