Job Title Requirements:
• Hands-on infrastructure operations engineer with strong skills in monitoring/alerting administration, incident triage, and hybrid infrastructure support (servers, virtualization, storage, networking, and cloud).
• Experience with IT service management (ITSM) processes (incident/problem/change) and operational documentation/runbooks. Comfortable working across teams and vendors in a service-oriented environment.
Typical Software Used for Engagement:
• Monitoring/observability tools (SolarWinds, Splunk, Datadog, Dynatrace, Nagios, or similar); ITSM/ticketing tools; virtualization management tools (e.g., vCenter); server/storage/network administration tools; cloud portals (Azure/AWS/Google Cloud Platform); scripting/automation tools (PowerShell, Python); and configuration management tools (Ansible/Terraform as applicable).
Mandatory Qualifications:
• 5+ years supporting enterprise IT infrastructure environments (servers, storage, networking, virtualization, cloud).
• 3+ years supporting enterprise monitoring, alerting, and operational management tools in complex environments.
• Strong understanding of Windows and Linux server operations; virtualization platforms; networking protocols; storage systems; cloud services.
• Experience supporting hybrid/cloud environments (IaaS/PaaS/SaaS) such as Azure, AWS, or Google Cloud.
• Familiarity with observability platforms (SolarWinds, Splunk, Datadog, Dynatrace, Nagios, or similar).
• Familiarity with scripting/automation and configuration management (PowerShell, Python, Ansible, Terraform).
• Strong troubleshooting, communication, and documentation skills; ability to work independently and in cross-functional teams.
• Willingness to participate in after-hours support as required.
Duties/Responsibilities:
• Monitor enterprise infrastructure environments (servers, networks, storage, virtualization, cloud) to ensure reliability, availability, and performance.
• Configure and maintain monitoring dashboards, alerts, and reports to detect anomalies and potential service disruptions.
• Respond to alerts and incidents by investigating issues and coordinating resolution with infrastructure, application, and cybersecurity teams.
• Assist in the design, implementation, and maintenance of infrastructure solutions across on-premises, hybrid, and cloud environments.
• Support deployment, configuration, and lifecycle management of servers, virtualization platforms, storage systems, networking components, and cloud services.
• Participate in upgrades, patching, maintenance, and performance optimization.
• Support public cloud and hybrid environments, including IaaS/PaaS/SaaS operational practices.
• Assist with security controls, vulnerability remediation, and compliance with cybersecurity standards.
• Support data protection (backup/replication) and disaster recovery (DR) configurations; participate in DR testing and resilience planning.
• Improve operational processes through automation, scripting, configuration management, and documentation/runbooks.
• Participate in ITSM processes (incident/problem/change) and maintain accurate documentation and system configurations.
• Collaborate with internal teams and vendors; support special projects; travel and participate in on-call rotations as needed.