Role Summary
We are seeking a Network Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of cloud and hybrid network platforms.
This role applies SRE principles to networking by shifting from manual network operations to automated, observable, and resilient network services.
The ideal candidate is a network engineer who thinks like a software engineer and SRE.
Key Responsibilities:
Network Reliability Engineering
Design networks for:
- High availability
- Fault tolerance
- Low latency
- Predictable performance
Improve network reliability while reducing operational toil.
Cloud & Hybrid Networking
Architect and operate AWS networking:
- VPCs, Subnets, Route Tables
- Transit Gateway
- NAT Gateway, Internet Gateway (IGW)
- PrivateLink, VPC Endpoints
Design hybrid connectivity:
Support multi-account and multi-region architectures.
Network Observability & Monitoring
Build deep network observability using:
- VPC Flow Logs
- CloudWatch
- Datadog
- Prometheus / Grafana
Analyze packet loss, latency, and throughput.
Implement proactive alerting based on SLOs.
Correlate network signals with application performance.
Automation & Infrastructure as Code
Automate network provisioning and changes using:
Implement CI/CD for network changes.
Reduce manual configuration and human error.
Version-control network definitions.
Incident Response & Troubleshooting
Lead network-related incident response.
Perform deep root-cause analysis for:
Participate in the on-call rotation and post-incident reviews.
Drive permanent fixes rather than workarounds.
Security & Traffic Management
Design and enforce:
Implement secure ingress/egress patterns.
Support DDoS protection (AWS Shield, AWS WAF).
Work with Security teams on audits and remediation.
Performance & Capacity Planning
Conduct traffic modeling and capacity forecasting.
Tune load balancers (ALB, NLB).
Optimize routing and failover strategies.
Validate resilience through failure testing.
Collaboration & Enablement
Partner with:
- Cloud Platform teams
- Application SREs
- Security and Infrastructure teams
Enable application teams with network best practices.
Produce architecture diagrams, runbooks, and SOPs.
Influence platform design decisions.
Required Skills & Qualifications
Must-Have:
- Strong networking fundamentals (TCP/IP, DNS, BGP, routing)
- AWS networking expertise
- SRE concepts & practices
- Network observability & monitoring
- Infrastructure as Code
- Production incident handling experience