Network Support Engineer

Overview

Hybrid
Depends on Experience
Contract - W2

Skills

Network
Data Centers
HPC
GPU Computing
GPU
Artificial Intelligence

Job Details

Role: Network Support Engineer
Location: Santa Clara, CA 95051 - Hybrid/Remote
 
10+ years of experience is a must
 
Top Skills:

Data Center & AI Cluster Networking
High-performance interconnects for GPU, HPC, and AI clusters
InfiniBand, Ultra Ethernet, RoCEv2, DCQCN
Dark Fiber / Carrier Interconnect Optimization
Hybrid DC Network Architecture & Fabric Design
Job Description/Responsibilities:
This is a hands-on network engineering position focused on the architecture, design, development, and deployment of ultra-high-speed, resilient, and scalable DC AI clusters and interconnects for GPU-accelerated data centers and compute clusters. Outstanding problem-solving abilities, a comprehensive understanding of network security protocols and standards, routing, switching, and automation, and a deep understanding of fundamental network theory are also critical to your success.
Key Responsibilities
Lead the architecture, design, and deployment of global-scale DC interconnects and fabrics for HPC, AI, and GPU computing clusters.
Develop high-performance data center fabrics using InfiniBand, Ultra Ethernet, and related technologies.
Optimize carrier interconnects, intra- and inter-DC routing, and dark fiber deployments to ensure low latency and high reliability.
Partner with system, OS, GPU, and HPC teams to deliver scalable, highly available networks for extreme-performance workloads.
Implement network monitoring, telemetry, problem-solving, and continuous performance improvement processes.
Drive technology selection, vendor engagement, and lifecycle management for Data Center hardware and software.
What We're Looking For:
Minimum of 6-8+ years of experience building, managing, and supporting large-scale hybrid networks and developing automation pipelines with Python, Ruby, Go, or other languages used in infrastructure automation.
SME in networking technologies: InfiniBand, Ultra Ethernet, RoCEv2, DCQCN, TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, MPLS.
Experience automating network infrastructure.
Experience using an automated configuration management system (Python, Ansible, Salt, etc.).


About K-Tek Resourcing LLC