Software Engineering Manager, Triage Services and Infrastructure

Cupertino, CA, US • Posted 1 day ago • Updated 40 minutes ago
Full Time
On-site

Job Details

Skills

  • Software Engineering
  • Reliability Engineering
  • OS X
  • iOS Development
  • Mentorship
  • Root Cause Analysis
  • Dashboard
  • Microsoft SSAS
  • Collaboration
  • Computer Hardware
  • Data Quality
  • Debugging
  • Workflow
  • Communication
  • Technical Direction
  • Management
  • Operating Systems
  • Computer Science
  • Electrical Engineering
  • Artificial Intelligence
  • Machine Learning (ML)

Summary

The Core OS team is seeking an exceptional engineering manager to lead the team responsible for enabling Apple's operating systems to achieve world-class reliability. This team develops and owns mission-critical tools and services that detect, analyze, and classify kernel panics and low-level crashes across all Apple platforms. You will partner with engineering teams across the Software, Hardware, and Silicon groups to deliver rock-solid OS reliability for over 2 billion currently active Apple devices and to shape the future of system reliability across Apple's entire product ecosystem.

Lead a team of engineers triaging kernel panics and critical system-level issues across all Apple platforms (macOS, iOS, watchOS, tvOS). Build intelligent automation pipelines that analyze, group, and prioritize failure signatures based on their reliability impact. Mentor engineers to design and develop advanced system diagnostics and at-scale debug services to realize the vision of zero-iteration debugging and fully automated triage and root cause analysis. Develop telemetry-based dashboards that monitor at-scale panic/crash triage and analysis services to ensure they run correctly and efficiently. Collaborate with Core OS, Hardware, Silicon, and other engineering teams to champion and advance improvements in debuggability, panic data quality, symbolication, and automation of triage and debug workflows.

  • Demonstrated track record of building and scaling high-performing engineering teams
  • Passion for solving challenging technical problems that directly impact millions of users
  • Strong communication skills with the ability to influence technical direction across organizational boundaries
  • Experience managing complex, multi-platform technical initiatives with measurable reliability improvements
  • Strong technical depth in operating system internals will be helpful
  • BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience

  • Experience applying AI/ML for automated triage and reliability services is preferred
  • Experience with large-scale telemetry systems processing millions of events daily is preferred

  • Dice Id: 90733111
  • Position Id: fcc730d8127f9a59d453797a79a8cdfd