Overview
$DOE
Accepts corp to corp applications
Contract - Independent
Contract - W2
Skills
BMC
PCI Express
CPU
Linux
Network
Interfaces
Configuration Management
Orchestration
Progress Chef
Ansible
Puppet
Python
Communication
Writing
Documentation
Pick
Knowledge Sharing
High Performance Computing
HPC
FOCUS
GPU
Hypervisor
Storage
Server Hardware
Computer Hardware
System Integration
Onboarding
Performance Testing
Firmware
BIOS
Testing
Job Details
Title: Hardware Qualification Engineer
Location: Santa Clara, CA & Clifton, NJ (Day 1 Onsite, No Hybrid/Remote)
Job Description:
- Deep understanding of hardware designs and subsystems (BMC, PCIe, CPU, GPU, etc.)
- Proven experience with qualification of hardware designs for production release (SKU Qual)
- Experience with testing component subsystems for use in existing SKUs (Component Qual)
- Deep Linux systems experience, including troubleshooting network interfaces, developing and applying configuration management, security best practices, and monitoring and alerting.
- Experience with firmware testing and deployment (Firmware Qual)
- Strong automation mindset. Expert knowledge in one or more orchestration tools such as Salt, Chef, Ansible or Puppet, and strong Python skills.
- Strong communication skills. Your job will involve writing detailed documentation for others to pick up or leading knowledge sharing sessions with operations teams.
Bonus skills include:
- Hands-on experience in High Performance Computing (HPC) clustered environments from NVIDIA or AMD. Experience performing automated, wide-scale testing with NCCL or other frameworks.
- Hands-on experience in qualification automation, with a specific focus on developing tests within an automation framework for hands-free qualification.
What You'll Be Working On:
- Onsite support of our hardware qualification efforts in NYC3 and SFO2
- Hardware qualification of new server SKUs for Compute and GPU Hypervisor, Storage and Infrastructure server hardware
- Hardware validation against design targets (functional and performance related)
- Hardware reconfiguration to support different testing efforts (changes to server components)
- Troubleshooting hardware integration with the platform operational tooling (onboarding)
- Firmware validation and qualification
- Performance testing, analysis and monitoring
- Firmware, BIOS, and kernel upgrades and testing