Principal Site Reliability Engineer

Overview

On Site
USD 139,900.00 - 274,800.00 per year
Full Time

Skills

Collaboration
Innovation
Data Engineering
Data Integration
Big Data
Messaging
Business Intelligence
SQL Azure
Cosmos-Db
PostgreSQL
Analytics
GRID
Microsoft Power BI
Microsoft Azure
Artificial Intelligence
Operations Support Systems
Real-time
Recovery
Customer Facing
Customer Experience
Information Technology
Software Engineering
Network Engineering
System Administration
Screening
PASS
Cloud Computing
Computer Science
C
C++
C#
Java
JavaScript
Python
Management
Incident Management
Root Cause Analysis
Performance Tuning
Reliability Engineering
Integrated Circuit
IC
Internal Communications
Legal
Recruiting
Microsoft
Database

Job Details

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.

Microsoft's Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.

Within Azure Data, the databases team builds and maintains Microsoft's operational database systems. We store and manage data in a structured way to enable a multitude of applications across various industries. We are on a journey to enable developer-friendly, mission-critical, AI-enabled operational databases across relational, non-relational, and OSS offerings.

We are looking for a Principal Site Reliability Engineer (SRE) who likes taking a data-driven, systems-based approach to solving service reliability problems. You will be responsible for building and optimizing solutions that analyze massive amounts of telemetry and other service health indicators in near real time, perform automated root cause analysis, and apply the mitigations needed to restore SLOs.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Responsibilities:

  • Collaborate closely with engineering teams to build and enhance tooling and automation solutions that resolve issues impacting SLOs faster and avert incidents altogether when possible.
  • Collaborate with customers to understand their pain points around supportability and SLO attainment, and formulate strategies for addressing recurring issues in a sustainable way.
  • Communicate on a deeply technical level and serve as the single point of contact for large enterprise customers, handling service escalations and driving issues to resolution.
  • Design and implement any changes to service telemetry that the automation needs to consume when that telemetry is not already available.
  • Enhance the customer-facing experience through proactive alerting based on utilization, trends, resource health, etc.
  • Analyze data and provide operational insights into customer experience to Design and Product teams, so that we can design features with supportability in mind.
  • Embody our culture and values.

Qualifications:

Required/Minimum Qualifications
  • 8+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
  • Experience with distributed systems and databases.
Other Requirements

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check:
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred/Additional Qualifications
  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR Bachelor's Degree in Computer Science or related technical field AND 10+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Hands-on experience managing live-site operations, leading incident response for distributed systems such as database systems, and driving root cause analysis through detailed post-mortems to improve system reliability and performance tuning.
Site Reliability Engineering IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft will accept applications for the role until June 16th, 2025.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

#azdat

#azuredata

#sre #performance #database #reliability
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.