Overview
On Site
USD 119,800.00 - 234,700.00 per year
Full Time
Skills
GPU
Operational Excellence
Release Engineering
Performance Tuning
Capacity Management
SAFE
Testing
Disaster Recovery
Management
Root Cause Analysis
Computer Hardware
Firmware
Predictive Analytics
Computer Science
Information Technology
Network Engineering
System Administration
Software Engineering
Programming Languages
C#
Python
IaaS
Computer Networking
Systems Design
Incident Management
DevOps
Conflict Resolution
Problem Solving
Debugging
Screening
PASS
Reliability Engineering
IC
Internal Communications
Integrated Circuit
Legal
Recruiting
Microsoft
Microsoft Azure
Cloud Computing
Job Details
The Firmware Deployment team within Microsoft's Silicon Cloud Hardware Infrastructure Engineering (SCHIE) organization is responsible for building and operating world-class software and data-driven services that support Azure's hardware infrastructure development. Our mission is to enable safe, reliable, and intelligent deployment of firmware payloads across the Azure fleet, ensuring system health and operational quality at scale.
We are seeking a Site Reliability Engineer for the Firmware Deployment team. In this role, you will be instrumental in shaping the future of the Azure fleet. Your primary responsibility will be developing and applying stable firmware releases across the GPU fleet, and potentially supporting other related environments. This work is essential to maintaining Microsoft's security and performance standards while delivering an outstanding experience for our customers.
Your efforts in deploying and managing firmware updates will ensure the reliability and efficiency of Azure's hardware infrastructure. By focusing on stability and operational excellence, you will help safeguard system health and contribute to the ongoing success and growth of Azure's global infrastructure.
Responsibilities:
- Build and bring specialized knowledge across multiple production aspects (monitoring, release engineering, testing, live site excellence, buildout, performance optimization, capacity management).
- Analyze large-scale telemetry and operational data to uncover insights and drive data-informed decisions.
- Use a proven set of principles and practices such as safe deployment, testing for reliability, elimination of single points of failure, disaster recovery, SLO-based monitoring, throttling, infrastructure management automation, post-mortem excellence, and adoption of common systems.
- Respond to alerts and incidents.
- Build and follow playbooks to drive root cause analysis and reviews.
- Partner with hardware and firmware teams to understand system behavior and identify opportunities for predictive analytics.
- Participate in an on-call rotation, be available during non-standard business hours, and contribute to service reliability and incident resolution.
Qualifications:
Required/minimum qualifications:
- Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
- 3+ years of experience in software engineering or operations for large-scale distributed systems.
- Ability to support a 24x7 data center environment, including participation in an on-call rotation and availability during non-standard business hours (evenings, nights, weekends, or holidays) as operational needs require.
- Proficiency in one or more programming languages (C#, Python, Go, or similar).
- Understanding of cloud infrastructure (Azure preferred), networking, and system design.
- Familiarity with monitoring tools, incident management frameworks, and DevOps practices.
- Problem-solving and debugging skills.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check:
This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Site Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft will accept applications for the role until October 20, 2025.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
#SCHIE #AZURE #Cloud