Job Summary
We are seeking a dedicated, detail-oriented IT Server Operations Analyst to join our IT Server Operations team within the Network Operations Center (NOC). This role is designed for someone who thrives in a structured, fast-paced operational environment and can balance day-to-day server support responsibilities with growing involvement in cloud operations, automation, and observability.
This position remains rooted in NOC-based server operations: monitoring, alerting, incident response, shift handoffs, failovers and failbacks, DNS updates, maintenance window execution, and support ticket coordination. At the same time, the role will partner closely with the broader Server Operations team on automation initiatives, cloud onboarding support, and observability workflows across on-prem, AWS, and Azure environments.
The ideal candidate brings hands-on experience in Windows and Linux server support, NOC operations, troubleshooting, and infrastructure support, while also offering working knowledge of AWS, Azure, cloud-native monitoring tools, and observability platforms. Success in this role means keeping systems stable, improving operational efficiency, communicating clearly across teams, and helping modernize support processes through better tooling, automation, and cloud awareness.
Must Have
Clear, Actionable Communication – Demonstrated ability to communicate operational status, incident impact, and resolution steps across technical and business stakeholders, especially during shift handoffs and escalations.
Infrastructure Monitoring – Strong familiarity with tools such as SolarWinds Orion, Dynatrace, or equivalent platforms to monitor system health and suppress alert noise during maintenance windows.
Reliability and Shift Discipline – Consistent performance during assigned shifts, including punctuality, accountability, and participation in on-call rotations.
Server Troubleshooting Skills – Ability to diagnose and resolve issues across Windows and Linux environments.
Nice To Have
Basic Scripting Awareness – Familiarity with PowerShell or Bash for simple automation or log parsing tasks (not required, but helpful for efficiency).
Certifications – Industry credentials such as CompTIA Server+, Microsoft Certified: Azure Administrator Associate, or Red Hat Certified System Administrator (RHCSA).
ITSM and Ticketing – Experience with Helix Remedy or similar platforms for incident tracking and change control.
Virtualization Experience – Exposure to VMware or Hyper-V environments for server provisioning and troubleshooting.
Key Responsibilities
Monitoring, Alerting, and Incident Response
· Monitor infrastructure and platform health using tools such as SolarWinds Orion, Dynatrace, AWS CloudWatch, Azure Monitor, or similar platforms.
· Respond to alerts with urgency and precision, triage issues, escalate appropriately, and drive resolution within SLA targets.
· Log incidents thoroughly, capturing actions taken, outcomes, and escalation flow to support RCA, post-mortems, and knowledge sharing.
· Participate in a structured on-call rotation for after-hours support.
· Execute maintenance window tasks by validating application checkouts, confirming maintenance mode, and suppressing alert noise according to implementation schedules.
Server Operations, DNS, and Platform Tools
· Perform hands-on diagnostics and remediation for Windows and Linux servers in physical, virtual, and hybrid environments.
· Support routine maintenance activities including server health checks, troubleshooting, patching coordination, firmware updates, and operational validation.
· Create, manage, and update support tickets and coordinate on-site or cross-team follow-up as needed.
· Maintain accurate documentation of configurations, operational standards, and support procedures.
· Support failover and failback activities and ensure proper validation and documentation.
· Work with native cloud monitoring services such as AWS CloudWatch and Azure Monitor for baseline visibility, alerting, and reporting.
· Collaborate with observability-focused team members on deeper monitoring use cases, topology visibility, and application-aware troubleshooting.
· Create and update DNS records in Infoblox to support infrastructure changes, failovers, application status changes, and forwarding-zone needs.
Automation and Process Improvement
· Partner with the team on automation initiatives that reduce manual effort, improve alert quality, and strengthen operational consistency.
· Use or contribute to automation and scripting efforts involving tools such as PowerShell, Bash, Python, Ansible, or workflow-based tooling as appropriate.
· Help identify repetitive operational tasks that should be standardized through runbooks, playbooks, workflow cleanup, or automation.
· Support improvements in areas such as monitoring onboarding, maintenance window workflows, ticket routing, and operational documentation.
Reporting, Communication, and Engagement
· Provide clear, concise updates during shift handoffs, operational reviews, and incident communications.
· Collaborate with cross-functional teams to align on incident priorities, escalation paths, service impact, and operational ownership.
· Partner with stakeholders to understand key performance metrics and improve reporting, dashboards, and actionable alerting.
· Track operational metrics and highlight risk, recurring issues, and opportunities for service improvement.
Required Qualifications
· 5+ years of hands-on experience in server operations or infrastructure support roles.
· Strong working knowledge of Windows Server and Linux environments.
· Experience with infrastructure monitoring, alerting, and incident response workflows.
· Familiarity with SolarWinds Orion, Dynatrace, or similar monitoring and observability platforms.
· Working knowledge of AWS and/or Azure, including basic familiarity with CloudWatch and/or Azure Monitor.
· Familiarity with data center operations, hardware support, and physical/virtual infrastructure troubleshooting.
· Solid understanding of networking fundamentals (TCP/IP, DNS, DHCP).
· Excellent troubleshooting, documentation, and communication skills.
· Comfortable working shifts, participating in an on-call rotation, and operating from the NOC environment.
Preferred Qualifications
· Experience supporting hybrid environments across on-prem, AWS, and Azure.
· Familiarity with AWS CloudWatch and Azure Monitor as part of an enterprise observability model.
· Experience with PowerShell, Bash, Python, or Ansible for operational scripting or workflow automation.
· Exposure to virtualization technologies such as VMware or Hyper-V.
· Familiarity with ITSM and ticketing tools such as Helix Remedy.
· Experience with Infoblox or enterprise DNS administration.
· Experience contributing to runbooks, playbooks, automation workflows, or operational onboarding processes.
· Relevant certifications from Microsoft, Red Hat, AWS, CompTIA (for example, Server+), or similar providers.
Top Daily Tasks
· Manage and tune SolarWinds Orion and Dynatrace alerting for actionable signal quality and maintenance-window noise suppression.
· Triage incoming alerts, email notifications, and server-related issues; escalate critical events and maintain stakeholder awareness.
· Perform and validate application failovers and failbacks and document actions taken.
· Provide responsive phone and ticket support for infrastructure incidents and service requests.
· Create and update DNS entries in Infoblox.
· Review dashboards and monitor views across Orion, Dynatrace, CloudWatch, and Azure Monitor as applicable to supported workloads.
· Participate in shift handoffs, queue management, and runbook-driven operational follow-up.
· Identify repetitive operational work suitable for scripting, workflow cleanup, or automation collaboration.