PCIe - Performance and Reliability Test Engineer

San Jose, CA, US • Posted 30+ days ago • Updated 8 days ago
Contract Independent
Contract W2
Contract Corp To Corp
On-site
Depends on Experience

Job Details

Skills

  • TCU
  • BMC
  • HMC
  • RoT
  • C++
  • PCIe
  • CXL
  • CI/CD

Summary

Senior QA Engineer, Performance & Reliability

Location: San Jose, CA, USA

Department: Verification

Job Description

We are hiring a Senior QA Engineer, Performance & Reliability, to lead the performance characterization and reliability validation of our Secure TCU System, ensuring it meets rigorous data center standards.

In this role, you will own the test design, execution, and deep-dive analysis for performance and reliability, working closely with development teams to identify bottlenecks and resolve complex system-level issues.

Key Responsibilities

Performance & Reliability Strategy

  • Test Design & Execution: Design and execute comprehensive test plans for performance benchmarking, stress testing, longevity/endurance testing, and thermal/power characterization of TCU/BMC systems.
  • Workload Analysis: Analyze system behavior under various heavy workloads to identify performance bottlenecks in throughput, latency, and resource utilization (CPU, Memory, PCIe).
  • Reliability Validation: Conduct Mean Time Between Failures (MTBF) prediction, long-duration stability tests, and error injection campaigns to validate system robustness.

Deep Dive & Issue Resolution

  • Root Cause Analysis: Lead the deep-dive investigation of performance degradation and reliability failures. Use advanced debugging tools (oscilloscopes, logic analyzers, firmware traces) to isolate issues.
  • Developer Collaboration: Work directly with firmware and hardware engineers to reproduce complex bugs, analyze crash dumps, and verify fixes.
  • Infrastructure Enhancement: Develop and maintain automated performance testing frameworks and reporting dashboards to track regression and trends over time.

Reporting & Leadership

  • Reporting: Generate detailed performance assessment reports and reliability analysis metrics for stakeholders.
  • Mentorship: Mentor junior engineers on performance testing methodologies and system debugging techniques.

Qualifications

  • Experience: 5+ years of experience in embedded system testing, with a strong focus on performance verification and reliability engineering.
  • System Knowledge: Deep understanding of TCU, BMC, HMC, RoT (Root of Trust), Secure Boot, TPM, HSM, PCIe (Gen4/5), DDR memory, and networking protocols.
  • Performance Tools: Proficiency with performance profiling tools, traffic generators, and standard benchmarks (e.g., SPEC, IOzone, iperf). Experience with thermal and power measurement tools.
  • Programming: Strong scripting skills in Python for test automation and data analysis; familiarity with C/C++ for code analysis is a plus.
  • Operating Systems: Strong Linux/Unix skills, including kernel tuning, system monitoring, and log analysis.
  • Tools: Experience with CI/CD pipelines (Jenkins, GitLab CI) and version control (Git).
  • Education: BS/MS degree in Computer Science, Electrical Engineering, or a related field.

Ways to Stand Out

  • Experience with AI-driven log analysis or anomaly detection tools to predict reliability issues.
  • Background in validation of high-speed interfaces (PCIe, CXL) and memory subsystems (DDR5/LPDDR5).
  • Experience with data center server architecture and thermal management.
  • Knowledge of industry reliability standards (e.g., Telcordia, JEDEC).

  • Dice Id: 10426227
  • Position Id: 8861975

Company Info

About Aziro Technologies LLC

Aziro (formerly MSys Technologies; pronounced "Ah-zee-roh") is an AI-native product engineering company driving innovation-led transformation for global enterprises, high-growth ISVs, and AI-first pioneers.
