Senior Storage and Networking Product Engineer

Overview

On Site
USD 168,000.00 per year
Full Time

Skills

Microsoft Excel
Scalability
FOCUS
GPU
Distributed File System
Data Security
Regulatory Compliance
Encryption
Access Control
Auditing
Computer Science
Electrical Engineering
TCP/IP
Ethernet
InfiniBand
Fibre Channel
IBM GPFS
Ceph
Linux
Systems Engineering
Performance Tuning
Debugging
Scripting
Python
Bash
C
C++
Configuration Management
Orchestration
Ansible
Terraform
Puppet
Progress Chef
Stacks Blockchain
Grafana
Remote Direct Memory Access
HPC
Switches
Routers
Network
NetFlow
Wireshark
Capacity Management
Data Storage
Artificial Intelligence
Machine Learning (ML)
Training
Workflow
Cloud Storage
Kubernetes
Cloud Computing
Open Source
Computer Networking
Storage
Recruiting
Promotions
SAP BASIS
Law

Job Details

At NVIDIA, we are pioneers in making the impossible achievable, particularly within AI, ML, and HPC. Joining our team as a Storage & Networking Product Engineer involves being part of a group that fosters the development of highly available, high-performance infrastructure.

This role is vital for the flawless operation of NVIDIA's innovative compute platforms, integrating storage systems and advanced networking technologies. If you excel in ensuring low latency, high efficiency, and scalability, this is the opportunity to redefine data movement, system resilience, and automation!

What you'll be doing:
  • Architect, deploy, and maintain distributed storage clusters with a focus on scalable performance and data durability.
  • Develop and improve high-performance networking architectures for storage environments, ensuring low-latency data paths for AI/ML and HPC workloads.
  • Configure and tune RDMA, NVMe-over-Fabrics, RoCE, InfiniBand, and Ethernet-based fabrics for maximum performance.
  • Partner with GPU, networking, and systems teams to ensure seamless end-to-end performance across the full stack.
  • Develop automated monitoring, logging, and alerting systems for storage and networking.
  • Build and maintain capacity planning models for network efficiency and storage growth.
  • Troubleshoot complex network-storage interactions, including bottlenecks in distributed filesystems, parallel storage, and interconnects.
  • Implement data protection and compliance controls such as encryption in transit, access control, and auditing.
  • Foster automation in storage and networking operations through infrastructure-as-code and AI/ML-guided orchestration.

What we need to see:
  • BS/MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • 12+ years of experience in storage systems engineering, production infrastructure, or large-scale data center operations.
  • Deep knowledge of networking protocols and technologies: TCP/IP, Ethernet, InfiniBand, RDMA, RoCE, NVMe-oF, Fibre Channel.
  • Hands-on experience with high-performance storage systems: Lustre, GPFS, Ceph, distributed object storage, enterprise SAN/NAS.
  • Expertise in Linux systems engineering, including tuning, performance analysis, and debugging.
  • Skilled in coding/scripting using Python, Bash, Go, or C/C++ to automate, monitor, and optimize performance.
  • Experience with configuration management/orchestration tools (Ansible, Terraform, Puppet, Chef, Kubernetes).
  • Familiarity with observability stacks (Prometheus, Grafana, Elastic, InfluxDB) to monitor and optimize storage and network performance.
  • Proficient in recognizing and resolving complex system bottlenecks within storage and networking layers.

Ways to Stand Out from the Crowd:
  • Experience crafting and operating RDMA-accelerated HPC/AI clusters at scale, with hands-on expertise in network topologies and large-scale switch/router deployments.
  • Familiarity with network telemetry and packet capture tools (sFlow, NetFlow, Wireshark), and a proven history of capacity planning and performance optimization for distributed storage systems over high-speed networks.
  • Background in jointly developing storage networks for AI/ML training pipelines, large-scale inference, and RAG workflows.
  • Proficiency in hybrid cloud storage and networking solutions (like Kubernetes CSI, cloud-native fabrics, and hybrid on-prem/cloud setups).
  • Contributions to open-source networking or storage projects.

NVIDIA is widely considered to be one of the technology world's most desirable employers! We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 264,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until September 29, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.