Overview
On Site
USD 163,000.00 - 296,400.00 per year
Full Time
Skills
Accountability
Systems Architecture
ASA
Microsoft Azure
IaaS
D3.js
TCP
Statistics
NetFlow
Storage
Orchestration
OCP
OIF
Collaboration
Operational Excellence
Testing
CHAOS
Capacity Management
Documentation
Electrical Engineering
Computer Engineering
Mechanical Engineering
Data Link Layer
Network Layer
HPC
Ethernet
Remote Direct Memory Access
PFC
Border Gateway Protocol
OSPF
Load Balancing
ASIC
Scheduling
Optics
Ixia
TREX
Switches
Mentorship
Performance Management
Screening
PASS
Cloud Computing
Artificial Intelligence
Routing
Training
TSO
Management
QoS
Roadmaps
Network Design
Presentations
Shipping
Network
Python
Ansible
Terraform
Continuous Integration
Computer Hardware
Legal
Recruiting
Microsoft
SPARC
Job Details
Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft's cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy?
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Join the AI System Architecture (ASA) team within Microsoft's Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft's expanding cloud infrastructure and responsible for powering Microsoft's "Intelligent Cloud" mission.
We are looking for a Principal Network Architect to join our team!
Responsibilities:
- Own end-to-end network architecture for AI training/inference clusters: topology, routing, transport, congestion control, QoS, telemetry, reliability, and failure domains.
- Lead and grow a high-performing team (~10 engineers) across architecture, performance, and validation; set goals, mentor, and drive execution.
- Define scale-out/scale-up designs (e.g., leaf-spine, dragonfly/dragonfly+, Clos/fat-tree, 2D/3D torus variants) and network services for job schedulers and accelerator runtimes.
- Drive congestion-control strategy (ECN/PFC, DCQCN, HPCC, TIMELY, HULL, adaptive load balancing like CONGA/HULA) and transport tuning (RDMA/RoCEv2, QUIC/TCP variants) for tail-latency and throughput SLAs.
- Hands-on analysis of switch/NIC behavior using counters, traces, and telemetry (PFC/ECN stats, INT (in-band network telemetry), gNMI/gNOI, sFlow/NetFlow, eBPF); create reproducible perf tests.
- Evaluate and influence silicon & optics (ASIC feature roadmaps, queueing/scheduling, packet recirculation, shared buffer, VOQs, cut-through vs store-and-forward, 400/800G, linear vs retimed optics).
- Prototype and validate in lab and pre-prod: build testbeds, craft microbenchmarks and realistic AI workloads; automate with Python/Go/Ansible; codify SLOs and pass/fail gates.
- Partner across teams (accelerator/HBM, storage, orchestration, reliability) to co-design network-aware collective ops (all-reduce/all-to-all/mixture-of-experts) and placement policies.
- Influence standards and industry direction via active participation in IEEE 802.3/802.1, IETF, OCP, OIF, Ethernet Alliance, and vendor ecosystems; drive MSFT requirements into roadmaps.
- Operational excellence: define observability, fault isolation, failure testing (Jepsen-style chaos, link flap/black-hole, incast), capacity planning, and upgrade/rollout strategies.
- Documentation & reviews: author design docs, RFCs, and executive briefs; run design and readiness reviews.
Qualifications:
Required/minimum qualifications
- Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 9+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 11+ years technical engineering experience OR equivalent experience.
- 10+ years designing and operating large-scale L2/L3 Ethernet fabrics for HPC/AI or hyperscale services.
- 5+ years of experience with Ethernet, RDMA/RoCEv2, congestion control (ECN/PFC, DCQCN, HPCC, TIMELY), routing (BGP/ECMP, IS-IS/OSPF), and load balancing (CONGA/HULA/PLB).
- 5+ years of experience with switch/NIC architecture (ASIC pipelines, queueing/scheduling, buffers, telemetry, hash/ECMP behaviors) and optics (DR/FR/LR, PAM-4, FEC).
- 5+ years of experience with traffic generation and analysis (Ixia/Keysight, TRex, pktgen, iperf, perfetto), switch/NIC telemetry, and packet capture (INT, ERSPAN, SPAN, pcaps).
- 3+ years of experience managing engineers (hiring, mentoring, performance management, org health).
- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Experience optimizing networks for AI collectives (all-reduce, all-gather, expert routing) and distributed training systems.
- Familiarity with programmable data planes (P4, eBPF/XDP), in-network telemetry/compute, and NIC offloads (GRO/TSO/LRO, DPDK).
- Depth in buffer management and queue disciplines (DWRR, WFQ, Deficit Round Robin, QCN, VOQ) and QoS for multi-tenant clusters.
- Experience with optic/PHY roadmaps (800G/1.6T, linear pluggables, CPO/LPO, FEC trade-offs) and DC power/cooling constraints affecting network design.
- Contributions to standards bodies/consortia (drafts, presentations) and vendor co-development.
- Proven track record shipping production network designs with measurable latency/throughput improvements and reliability gains.
- Proficiency in Python/Go and automation frameworks (Ansible/Terraform) for test, measurement, and CI.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft will accept applications for the role until October 24th, 2025.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
#AHSI #SPARC
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.