Sr. System Engineer

Overview

On Site
USD 140,000.00 - 158,000.00 per year
Full Time

Skills

Storage
Cloud Computing
Apache Hadoop
Big Data
IoT
Embedded Systems
Server Hardware
Performance Tuning
Debugging
Acceptance Testing
High Availability
Computer Hardware
Documentation
Reporting
Scripting
Collaboration
Technical Training
Knowledge Sharing
Customer Support
Testing
Benchmarking
BIOS
Network
Technical Writing
Product Management
Science
Deep Learning
Computational Science
Code Optimization
Linux
Shell Scripting
Shell
Python
CUDA
Parallel Computing
LS-DYNA
Ansys
TensorFlow
PyTorch
Apache MXNet
Keras
Computer Networking
Data Storage
Scheduling
Artificial Intelligence
HPC
Conflict Resolution
Problem Solving
Multitasking
Effective Communication
Customer Facing
Training
Forms

Job Details

Job Req ID: 26371

About Supermicro:

Supermicro is a top-tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

As a Sr. System Engineer, you will work with the team to port, optimize, and benchmark AI/HPC applications on Supermicro server hardware platforms to enhance performance and efficiency. You will work closely with engineers from different teams, customers, and partners to support application deployment, troubleshooting, and performance tuning.

Your role will involve building, configuring, and maintaining AI/HPC clusters, both in-house and at customer sites, ensuring smooth operation and optimal resource utilization. You will contribute to technical documentation, assist with debugging, and provide solutions to complex system issues.

As part of a skilled engineering team, you will play a key role in on-site AI/HPC deployments, acceptance testing, and customer support, ensuring high availability, security, and performance of mission-critical infrastructure.

Essential Duties and Responsibilities:

Essential duties and responsibilities include the following (other duties may also be assigned):
Optimize AI/HPC hardware platforms with the team
Set up and configure test software or applications by following provided instructions and documentation
Identify and report gaps in test setups while assisting in implementing solutions for successful execution
Troubleshoot installation and configuration issues, escalating complex problems as needed
Write and deploy basic scripts to support specific tasks during onsite visits
Develop technical relationships with customers and partners to support AI/HPC performance improvements
Gain a strong understanding of AI/HPC domains and collaborate with customers and partners on solutions
Provide technical training and knowledge sharing sessions on AI/HPC applications
Support the team in resolving customer support issues related to AI/HPC systems
Assist in building processes and procedures for AI/HPC solutions
Contribute to proof-of-concept testing and benchmarking for AI/HPC applications
Assist in BIOS, OS, and network tuning for optimized system performance
Support on-site deployment services and customer acceptance verification
Draft and maintain technical documentation, including notes, diagrams, and reports
Work closely with Product Management and Engineering teams to relay customer feedback for future product improvements

Qualifications:
BS or higher in a computationally intensive science or engineering field
6+ years of experience in AI/Deep Learning, HPC, scientific computing, or related areas involving application optimization, compilers, digital signal processors, or GPUs
Proficiency in Linux OS, shell scripting, and system internals
Experience with shell/Python scripting, containers, and OpenMPI, plus familiarity with CUDA or other parallel programming models
Hands-on experience with AI/HPC application benchmarks such as any of the following is a plus: LS-DYNA, OpenFOAM, PowerFLOW, STAR-CCM+, Ansys, WRF, NAMD, Amber, LAMMPS, TensorFlow, PyTorch, MXNet, Keras, MLPerf
Understanding of networking, storage systems, and batch scheduling in AI/HPC environments
Strong problem-solving skills, ability to multitask, and a proactive mindset
Effective communication skills, with the ability to work both independently and as part of a team
Willingness to work in customer-facing roles, providing on-site support as needed
10%-15% travel required; occasional work outside of regular business hours may be necessary

Salary Range

$140,000 - $158,000

The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.