Overview
On Site
USD 119,800.00 - 234,700.00 per year
Full Time
Skills
Instrumentation
Art
Scalability
Systems Design
Interfaces
Mentorship
Research
Computer Science
C
C++
C#
Java
JavaScript
Python
Screening
PASS
Cloud Computing
Linux Kernel
Conflict Resolution
Problem Solving
Open Source
GPU
Collaboration
FOCUS
Embedded Systems
Optimization
Computer Hardware
PCI Express
Communication
Artificial Intelligence
Data Flow
Stacks Blockchain
Debugging
Software Engineering
Integrated Circuit
Internal Communications
IC
SAP BASIS
Legal
Recruiting
Microsoft
Job Details
The MAIA System Infrastructure team is pioneering the next generation of the developer ecosystem for AI accelerators and we are looking to hire a Senior Software Engineer - MAIA - AI Accelerator Observability and Infrastructure. We are building the core infrastructure that enables deep observability into our proprietary MAIA chips, empowering developers to fully harness the capabilities of our custom AI hardware. Our mission is to create a transparent, performant, and developer-centric ecosystem that surpasses traditional GPU observability by offering unparalleled insight into low-level operations, performance characteristics, and system-wide behavior.
We operate at the intersection of advanced AI hardware, system software, and developer tooling, continually pushing the boundaries of what is possible. Our scope extends beyond on-chip instrumentation; we also play a critical role in optimizing the end-to-end data flow infrastructure, including PCIe and frontend networks, to ensure low-latency, high-throughput movement between host systems and accelerators. By decomposing and re-architecting data pathways into state-of-the-art designs, we are unlocking new levels of scalability and performance for AI workloads. Our work involves close collaboration with hardware architects, systems engineers, and AI researchers to build a cohesive observability and runtime foundation that defines the next era of AI system design.
Why Join Us?
This is an opportunity to work on the cutting edge of AI hardware acceleration, directly contributing to the infrastructure that makes deep observability and optimization possible. As a Senior Software Engineer - MAIA - AI Accelerator Observability and Infrastructure on the MAIA System Infrastructure team, you will have the chance to work on challenging, high-impact projects that require a blend of low-level programming, data flow optimization, and system design.
You'll be part of a team of highly talented engineers who are passionate about building the next generation of AI tooling infrastructure, and you'll have the opportunity to make a significant impact on how AI workloads are understood and optimized.
This role is ideal for core engineers who are looking to make their mark in an innovative environment, where their contributions will directly influence the performance and capabilities of cutting-edge AI systems.
Responsibilities:
As a Senior Software Engineer - MAIA - AI Accelerator Observability and Infrastructure on the MAIA System Infrastructure team, you will help advance the next generation of observability and runtime infrastructure for MAIA AI accelerators. You'll focus on enhancing system intelligence and execution reliability at scale, designing runtimes that adapt dynamically to complex workload demands while maintaining performance and predictability. Your work will elevate how developers interact with MAIA hardware, enabling streamlined, high-confidence execution across multi-accelerator and multi-node environments. This is a unique opportunity to shape foundational infrastructure at the frontier of AI hardware and distributed systems.
This role requires a deep technical background and a hands-on approach, as you will design and implement software that interfaces with both the MAIA chips and the data flow infrastructure.
You will:
- Lead by example in creating an inclusive culture that embraces diversity. Mentor and empower teammates, fostering an environment where all voices are heard and valued.
- Cultivate a team dynamic that drives high performance through mutual support and respect.
- Design, develop, and maintain the observability infrastructure for the MAIA AI accelerators, enabling developers to gather the data necessary to debug, profile, analyze, and optimize AI models with unprecedented depth.
- Optimize the data flow infrastructure over PCIe, ensuring efficient and high-throughput communication between the host and MAIA chips.
- Collaborate with hardware architects and system engineers to integrate the observability stack with the broader system, capturing detailed metrics and insights into data movement.
- Develop tools and libraries that provide a holistic view of data flow, execution, and performance, extending beyond traditional GPU observability to meet the unique needs of our accelerators.
- Engage with the AI research and developer community to understand their needs and incorporate feedback into the observability tools and data flow optimizations.
- Ensure that the observability and data flow infrastructure meet the highest standards of performance, security, and reliability.
Qualifications:
Required/Minimum Qualifications
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- 3+ years of experience in system-level programming
- 2+ years of experience optimizing data movement and communication with extremely low-latency requirements
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications
- Experience with Linux Kernel internals or Kernel Driver development
- Problem-solving skills and a track record of innovating solutions for complex system challenges in AI hardware and data infrastructure
- Experience with open-source development and contributions is a plus
- Experience developing within existing GPU ecosystems is a plus.
- Collaboration and communication skills, with the ability to work across multidisciplinary teams and engage with the developer community
- Experience focused on AI accelerators or advanced embedded systems and low-latency data flow optimization
- Expertise in developing observability, profiling, or debugging tools for complex hardware systems, including deep knowledge of PCIe communication
- Ability to design and implement software that captures and analyzes low-level operations of AI accelerators and data flow across multiple abstractions and software stacks
- In-depth experience with eBPF and related tools (e.g., BCC, bpftrace), with an understanding of how to leverage eBPF for advanced monitoring, tracing, and debugging in complex systems
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft posts positions for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
#aifx
#CoreAI