Embedded System Test Engineer with PCIe, BMC

Aziro Technologies LLC
Job Details
Skills
- PCIe
- CXL
- BMC
Summary
Embedded System Testing
Location: Remote - San Jose, CA, USA
Department: Verification
Job Description
We are hiring a Senior QA Engineer – Performance & Reliability to lead the performance characterization and reliability validation of our Secure TCU System, ensuring it meets rigorous data center standards.
In this role, you will own the test design, execution, and deep-dive analysis for performance and reliability, working closely with development teams to identify bottlenecks and resolve complex system-level issues.
Key Responsibilities
- Performance & Reliability Strategy
  - Test Design & Execution: Design and execute comprehensive test plans for performance benchmarking, stress testing, longevity/endurance testing, and thermal/power characterization of TCU/BMC systems.
  - Workload Analysis: Analyze system behavior under a range of heavy workloads to identify performance bottlenecks in throughput, latency, and resource utilization (CPU, memory, PCIe).
  - Reliability Validation: Conduct Mean Time Between Failures (MTBF) prediction, long-duration stability tests, and error-injection campaigns to validate system robustness.
- Deep Dive & Issue Resolution
  - Root Cause Analysis: Lead deep-dive investigations of performance degradation and reliability failures, using advanced debugging tools (oscilloscopes, logic analyzers, firmware traces) to isolate issues.
  - Developer Collaboration: Work directly with firmware and hardware engineers to reproduce complex bugs, analyze crash dumps, and verify fixes.
  - Infrastructure Enhancement: Develop and maintain automated performance-testing frameworks and reporting dashboards to track regressions and trends over time.
- Reporting & Leadership
  - Reporting: Generate detailed performance assessment reports and reliability analysis metrics for stakeholders.
  - Mentorship: Mentor junior engineers on performance testing methodologies and system debugging techniques.
Qualifications
- Experience: 5+ years of experience in embedded system testing, with a strong focus on performance verification and reliability engineering.
- System Knowledge: Deep understanding of TCU, BMC, HMC, RoT (Root of Trust), Secure Boot, TPM, HSM, PCIe (Gen4/5), DDR memory, and networking protocols.
- Performance Tools: Proficiency with performance profiling tools, traffic generators, and standard benchmarks (e.g., SPEC, IOzone, iperf). Experience with thermal and power measurement tools.
- Programming: Strong scripting skills in Python for test automation and data analysis; familiarity with C/C++ for code analysis is a plus.
- Operating Systems: Strong Linux/Unix skills, including kernel tuning, system monitoring, and log analysis.
- Tools: Experience with CI/CD pipelines (Jenkins, GitLab CI) and version control (Git).
- Education: BS/MS degree in Computer Science, Electrical Engineering, or a related field.
Ways to Stand Out
- Experience with AI-driven log analysis or anomaly detection tools to predict reliability issues.
- Background in validation of high-speed interfaces (PCIe, CXL) and memory subsystems (DDR5/LPDDR5).
- Experience with data center server architecture and thermal management.
- Knowledge of industry reliability standards (e.g., Telcordia, JEDEC).

Dice Id: 10426227
Position Id: 8862157
Posted 30+ days ago
Company Info
About Aziro Technologies LLC
Aziro (pronounced "Ah-zee-roh"; formerly MSys Technologies) is an AI-native product engineering company driving innovation-led transformation for global enterprises, high-growth ISVs, and AI-first pioneers.