Job Details
Position: Sr. Network Observability Engineer (Tools, APM, NPM Engineer)
Location: 100% Remote
About the Role:
The Senior Network Observability Engineer, Network Reliability Engineering (NRE), is the subject matter expert in designing and implementing the network monitoring strategy and platforms for next-generation operations across all networks, including the Property Networks, Datacenter/Cloud Networks, and Corporate Networks. This role works closely with a matrix team of expert network architects and engineers to drive adoption of NRE practices and operating models across all network product towers and around the globe. The Senior Network Observability Engineer focuses on proactive monitoring, leveraging telemetry data to detect anomalies, identify potential issues before they impact users, and enable faster incident response. Reporting to senior leadership, the engineer is responsible for articulating and executing a vision that addresses the increasing complexity and scale of modern IT systems and overcomes the limitations of traditional monitoring approaches. The successful candidate will be expected to bridge that gap by implementing telemetry strategies and advanced monitoring techniques to understand system behavior at a granular level. This role is accountable for establishing technical relationships and partnering with all business disciplines, other teams, outsourcing vendors, and external suppliers to define and implement the NRE network observability roadmap, and for building and enforcing the standards defined in it.
Education and Experience
Required:
Experience in collecting, processing, and monitoring telemetry data with a focus on analyzing, troubleshooting, and driving continuous improvements in mission critical networks.
Experience with network and application monitoring tools and related products
Experience in installing, configuring, and troubleshooting network and application monitoring tools (SolarWinds/Spectrum, BigPanda, NetScout, ThousandEyes, AI/ML-based network performance monitoring tools, or other similar tools)
Must possess expertise in administration of devices and policies in Network Tools.
Knowledge of and experience in NetFlow-related configuration practices
Experience in developing, documenting, and managing the requirements gathering process, and in providing detailed design and implementation plans that support the requirements throughout the project life cycle
Field experience with and knowledge of foundational data networking and IP technologies (including ARP, TCP/IP, UDP, DHCP, DNS, NAT, and others)
Technical knowledge of common routing protocols (e.g., OSPF, BGP)
Experience in Agile methodologies, including daily stand-up meetings, sprint planning sessions, and user story preparation
Experience with common routing and switching platforms (Cisco, Juniper, HP/Aruba, etc.)
Demonstrated experience in delivering written documents detailing network solutions and diagrams
Experience with one or more cloud computing platforms (e.g., Amazon AWS, Microsoft Azure, Google Compute Engine)
Preferred:
Advanced degree (e.g., MS, PhD) in Computer Science or another technical discipline, or an MBA, preferably with a focus on technology
Experience managing monitoring tools in the hospitality industry is a plus
Experience in leveraging public APIs for developing automation scripts
Team player with the ability to collaborate and work with cross-functional teams in multiple time zones
Experience in researching emerging technologies, trends, standards, and products and synthesizing them into clear technology roadmaps and strategies
Strong knowledge of emerging tools, applications, and systems for attaining best-in-class network observability across the enterprise
Excellent verbal and written communication skills for a wide range of audiences including executives, business stakeholders, and IT teams