Job Description
Nordstrom Technology seeks an exceptional Site Reliability Engineer with deep networking expertise to join our Nordstrom Operations Center (NOC) team. You'll maintain "eyes on glass" monitoring of application services and critical network infrastructure, ensuring the health and reliability of Nordstrom's retail operations. This role combines proactive monitoring, incident response, and root cause analysis with advanced network troubleshooting: diagnosing complex issues spanning the full stack and driving resolution of P1/P2 incidents that impact business operations.
This role is offered as onsite in Seattle, WA supporting Nordstrom's 24/7 NOC. Candidates must be available to work in office at the Nordstrom corporate headquarters 5 days/week with shifts starting at 6:00 AM PST, including one weekend day per week (Saturday or Sunday) as part of regular rotation.
A day in the life...
Monitoring & Incident Response:
- Maintain real-time monitoring across application services, network infrastructure, and business KPIs (site visitors, order flow, revenue-impacting metrics)
- Participate in 24/7 on-call rotations, responding to PagerDuty alerts and managing incidents through ServiceNow workflows
- Lead P1/P2 incident troubleshooting, coordinating with engineering teams and vendors to restore service rapidly
- Perform real-time network diagnostics and performance testing during active incidents
Network Operations:
- Monitor and troubleshoot routers, switches, firewalls, load balancers, wireless systems, and SD-WAN solutions
- Analyze network performance, identify bottlenecks, and recommend optimization strategies
- Investigate connectivity issues, VLAN configurations, routing problems, and security events
- Coordinate with network engineering during changes, maintenance windows, and infrastructure upgrades
- Maintain visibility into multi-vendor cloud environments (AWS, Azure) and cloud networking architectures
Root Cause Analysis & Continuous Improvement:
- Conduct deep technical investigations focusing on credential expirations, service account failures, authentication incidents, and cascading failures
- Document findings in detailed RCA reports with actionable remediation steps
- Build and refine monitoring dashboards to improve Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM)
AI-Driven Operations & Automation:
- Contribute to AI-driven incident detection and automated response initiatives, building autonomous monitoring and remediation capabilities
- Develop scripts and automation to remediate common incidents, reduce manual toil, and accelerate response workflows
- Create automated health checks and build integrations between monitoring platforms (New Relic, PagerDuty, ServiceNow, Jira)
Observability & Reliability:
- Enhance monitoring, logging, and alerting using New Relic or similar platforms
- Track operational metrics (MTTD, MTTM, incident trends) and build executive-level dashboards
- Support SLO/SLI definition and tracking for critical services and network infrastructure
- Collaborate with teams to improve fault tolerance, redundancy, and disaster recovery
Collaboration & Leadership:
- Work closely with software engineering, infrastructure, and network teams to improve operational readiness
- Communicate effectively with stakeholders at all levels during incidents and post-incident reviews
- Contribute to NOC optimization including shift scheduling and process improvements
You own this if you have...
Required Technical Skills:
Networking Expertise:
- Strong understanding of TCP/IP, OSI model, routing protocols (BGP, OSPF), and switching technologies
- Experience troubleshooting network connectivity, packet loss, latency, and performance issues
- Proficiency with network monitoring tools and packet analysis (Wireshark, tcpdump, NetFlow/sFlow)
- Knowledge of DNS, DHCP, VLANs, VPNs, firewalls, load balancers, and network security
- Hands-on experience with MIST or Aruba wireless management or similar enterprise wireless platforms
- Deep understanding of Juniper Networks routing and switching platforms
SRE & Infrastructure:
- 1-3+ years in site reliability engineering, NOC operations, or similar roles (flexible based on networking depth)
- Proficiency with New Relic or similar enterprise monitoring platforms
- Strong cloud platform experience (AWS, Azure) and cloud networking concepts
- Hands-on containerization and orchestration experience (Docker, Kubernetes/NSK)
- Familiarity with Kafka streaming platforms and CI/CD pipelines
Programming & Automation:
- Proficiency in Python, Go, Bash, or PowerShell for automation and troubleshooting
- Experience with REST APIs and system integrations
Operational Excellence:
- Proven track record managing P1/P2 incidents in 24/7 production environments
- Experience with PagerDuty, ServiceNow, and Jira
- Strong analytical skills diagnosing complex, multi-layered technical issues under pressure
- Root cause analysis experience with detailed technical documentation
Preferred Qualifications:
- Bachelor's degree in computer science, engineering, networking, or equivalent degree
- Network or Security certifications (CCNA/CCNP, PCNSA/PCNSE, or equivalent)
- Retail/e-commerce experience, including an understanding of POS systems
- Interest in AI/ML applications in operations, anomaly detection, or automated incident response
- Experience with vendor management and multi-vendor incident coordination
- Security operations and incident response knowledge
Core Competencies:
- Technical depth across application, infrastructure, and network layers
- Communication excellence for incident coordination, documentation, and stakeholder updates
- Pressure management with calm, methodical approach to high-pressure situations
- Proactive mindset identifying problems before they become incidents
- Collaboration working effectively across technical and business organizations
- Innovation driving automation and continuous improvement
We've got you covered...
Our employees are our most important asset and that's reflected in our benefits. Nordstrom is proud to offer a variety of benefits to support employees and their families, including:
- Medical/Vision, Dental, Retirement and Paid Time Away
- Life Insurance and Disability
- Merchandise Discount and EAP Resources
A few more important points...
The job posting highlights the most critical responsibilities and requirements of the job. It's not all-inclusive. There may be additional duties, responsibilities and qualifications for this job.
For Los Angeles or San Francisco applicants: Nordstrom is required to inform you that we conduct background checks after conditional offer and consider qualified applicants with criminal histories in a manner consistent with legal requirements per Los Angeles, Cal. Muni. Code 189.04 and the San Francisco Fair Chance Ordinance. For additional state and location specific notices, please refer to the Legal Notices document within the FAQ section of the Nordstrom Careers site.
Applicants with disabilities who require assistance or accommodation should contact the nearest Nordstrom location, which can be identified at
Please be mindful that there may be legal notices and requirements related to this job posting that are specific to your state. Review the Career Site FAQs for relevant information and guidelines.
© 2022 Nordstrom, Inc.
Current Nordstrom employees: To apply, log into Workday, click the Careers button and then click Find Jobs.
Applications are accepted on an ongoing basis.
Pay Range Details
The pay range(s) below has been provided in compliance with state specific laws. Pay ranges may be different for other locations.
Pay offers are dependent on the location, as well as job-related knowledge, skills, and experience.
$104,500.00 - $162,500.00 Annual
This position may be eligible for performance-based incentives/bonuses. Benefits include 401k, medical/vision/dental/life/disability insurance options, PTO accruals, Holidays, and more. Eligibility requirements may apply based on location, job level, classification, and length of employment. Learn more in the Nordstrom Benefits Overview by copying and pasting the following URL into your browser: _Overview_15_Full_Time_ES-US.pdf