Overview
On Site
USD 95,000.00 - 113,000.00 per year
Full Time
Skills
Storage
Computer Networking
Cloud Computing
Apache Hadoop
Big Data
HPC
IoT
Embedded Systems
Debugging
Test Plans
ROOT
Hardware Development
Regulatory Compliance
Management
Root Cause Analysis
Process Improvement
Workflow
Customer Engagement
Computer Hardware
FOCUS
Reliability Engineering
Server Hardware
Sockets
IPMI
Sensors
BIOS
Firmware
Stress Testing
Effective Communication
Reporting
Automated Testing
Scripting
Python
Bash
Windows PowerShell
CPU
ARM
Dashboard
Training
Forms
Job Details
Job Req ID: 26899
About Supermicro:
Supermicro is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest-growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.
Job Summary:
Supermicro Computer is seeking an experienced Reliability Engineer to execute reliability validation for high-performance server platforms, with a specific focus on CPU validation and environmental stress testing. This role is critical in ensuring system-level robustness, long-term stability, and thermal reliability of products.
The engineer will design, execute, and analyze reliability and stress test plans, including thermal cycling, high-temperature operating tests, and power cycling, while coordinating closely with cross-functional engineering teams. The ideal candidate has strong system-level debug experience, deep hardware knowledge (especially of CPU platforms), and a proven track record in reliability validation and root cause analysis.
Essential Duties and Responsibilities:
Includes the following essential duties and responsibilities (other duties may also be assigned):
- Develop and execute reliability test plans, including thermal, voltage, and long-duration stress testing.
- Monitor system health (e.g., error logs, temperature sensors) and analyze failures to determine root cause.
- Conduct CPU validation on a variety of motherboard and system configurations.
- Maintain and calibrate thermal chambers, power cycling equipment, and automated stress platforms to ensure consistent test results.
- Coordinate closely with platform engineering, BIOS, hardware design, and quality teams to align on test coverage and resolve cross-functional issues.
- Document and maintain SOPs for test setups, execution, and reporting; ensure compliance with internal and industry test standards.
- Manage test schedules and resources (e.g., CPU samples, chambers, power equipment) to ensure validation milestones are met.
- Provide clear and detailed validation reports summarizing methodology, results, and root cause analysis for failures.
- Drive process improvements in the validation workflow, data tracking, and issue traceability.
Qualifications:
Required:
- Bachelor's or Master's degree in EE, CE, or a related technical field.
- 5+ years of experience in hardware validation, with a focus on CPU, system reliability, or stress testing.
- Strong hands-on experience with server hardware (e.g., CPU sockets, heatsinks, VRMs, DIMMs) and system-level validation.
- Proficiency with thermal chambers, power cycling tools, and monitoring utilities (IPMI, sensors, thermal cameras, etc.).
- Familiarity with industry-standard reliability methodologies.
- Experience with BIOS configuration, firmware tools, and OS-based stress testing (e.g., Prime95, BurnInTest, LINPACK).
- Effective communication skills for reporting results and collaborating across teams.
Preferred:
- Experience with automated test environments and scripting (Python, Bash, or PowerShell).
- Background in validation of high-end server CPU platforms (Intel, AMD, or ARM-based).
- Prior experience maintaining or creating reliability SOPs and validation dashboards.
Salary Range
$95,000 - $113,000
The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.
EEO Statement
Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.