Role: Senior Network Systems Operation Engineer
Location: Herndon, VA
Key Roles and Responsibilities:
Fault Management:
- Provide technical support and troubleshooting for network platforms, including alarm and KPI monitoring (proactive/reactive).
- Investigate and diagnose network issues, triage and communicate event status, coordinate root cause analysis, and implement service restoration.
- Remediate complex technology faults across multiple production environments, including vendor/IT fixes or design changes.
- Proactively monitor and maintain system configurations to achieve optimal performance and reliability using KPIs.
- Install, configure, and patch application systems (RHEL and Windows VMs in VMware vSphere).
- Support a range of applications such as MySQL, PHP, Cisco ISE, Crystal Reports, and NetBrain; ITSM service management and ticketing platforms such as Remedy and ServiceNow; and network event correlation and monitoring tools such as ScienceLogic (SL1) and Splunk.
- Develop and maintain policies (including event and SNMP trap policies) within the Event Management platform to enable effective fault detection, correlation, and escalation.
- Develop scripts and automation to proactively identify issues, support rapid remediation, and ensure uninterrupted operations.
Change Management and Automation:
- Collaborate with internal teams and external partners (vendors, Engineering, CTO, Product) to design and implement custom solutions.
- Execute change management processes for platform enhancements, configuration changes, and upgrades.
- Implement automation using scripting languages (Perl, PHP, Python, Shell) to improve supportability, operational workflows, and monitoring capabilities.
- Analyze platform and operational metrics using standard analytics and data visualization/reporting tools.
Compliance and Security:
- Perform security patching, system updates, and deploy application change requests (e.g., MRs) as required.
- Partner with security teams to ensure devices meet applicable compliance standards and regulatory requirements (e.g., FISMA).
- Maintain an accurate, up-to-date inventory of company-owned assets, including servers, routers, switches, and desktops.
- Manage and govern router and switch configuration management activities.
Subject Matter Expertise and Support:
- Serve as a subject matter expert (SME) for Linux, Oracle, and Windows-based applications and platforms.
- Support internal development teams by advising on system design, integration, and technology solution options.
- Provide end-user support, including on-site assistance for ticketing tools and troubleshooting user-impacting issues.
- Participate in a 24/7 on-call rotation to support critical IT and production environments.
Infrastructure and Reporting:
- Maintain NOC infrastructure and ensure reliable, consistent operations across multiple data centers.
- Lead performance reporting, manage end-of-life tool replacements, and produce reports for Program Management, NOC leadership, and other stakeholders.
- Administer and update asset and circuit/program databases, ensuring accurate inventory tracking and proper ticketing throughout the asset lifecycle.
Education: Bachelor's degree in Math, Science, Engineering, Computer Sciences, or Operations preferred.
Experience: Typically requires 8-10 years of relevant experience. (Technical Career Pathway role.)
Qualifications:
- Eligible for Public Trust suitability
Skills and Competencies:
- Demonstrated expert-level troubleshooting and technical support capabilities across complex network, compute, and platform technologies, with a strong ability to isolate root cause, assess impact, and drive issues to resolution.
- Hands-on experience administering and supporting RHEL, Windows Server, and virtualized environments using VMware vSphere, along with exposure to a wide range of enterprise applications and integrated services.
- Strong scripting and automation skills (Perl, PHP, Python, Bash/Shell), including building repeatable operational tools to streamline deployments, monitoring, health checks, and incident response.
- Excellent communication, collaboration, and customer service skills, with the ability to translate technical details for varied audiences, coordinate effectively with cross-functional teams, and provide clear updates to stakeholders.
- Proven ability to manage competing priorities in a 24x7 operations setting, staying organized under pressure while meeting SLAs, following change/control processes, and maintaining high service reliability.
Note: This position requires participation in an on-call rotation for 24/7 support.
--
Asher Williams
Desk: 2o1.497.1o1o X:1o5 | Direct: 551.272.o129
asher (at) pullskill dot com