SRE Engineer - Air Platform Team - NVIDIA Corporation

Overview

On Site

USD 120,000.00 per year

Full Time

Skills

Computer Graphics

Innovation

Web Applications

NVIDIA AIR

High Availability

Terraform

Workflow

Operating Systems

Orchestration

Continuous Integration

Continuous Delivery

Git

Jenkins

Access Control

Operational Efficiency

Computer Science

Software Engineering

DevOps

Systems Engineering

Scripting

Ansible

Python

Shell Scripting

IaaS

Servers

Management

Grafana

Linux

NAT

DNS

DHCP

Routing

Firewall

iptables

Cloud Computing

Kubernetes

Docker

QEMU

Amazon Web Services

Debugging

Network

Performance Tuning

Benchmarking

Storage

Computer Networking

Regulatory Compliance

FedRAMP

HIPAA

SOC 2

Job Details

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It's an outstanding legacy of innovation driven by extraordinary technology and amazing people. NVIDIA is looking for a highly motivated SRE Engineer to join the NVIDIA AIR team - the Digital Twin for Data Center Simulation web application. NVIDIA AIR enables cloud-scale efficiency by creating identical replicas of real-world data center infrastructure deployments. To learn more, visit NVIDIA AIR.

What you'll be doing:

Design, deploy, and manage IaaS platforms with a focus on high availability and performance.
Automate infrastructure operations using tools like Terraform, Ansible, and Python.
Focus on efficiency by automating repetitive workflows.
Develop monitoring and observability tooling to detect and prevent outages using Prometheus, Grafana, ELK, etc.
Deploy and troubleshoot non-disruptive cloud operations with an emphasis on secure production infrastructure.
Manage deployment/upgrades for Operating Systems, Kubernetes (k8s) clusters, and other orchestration tools.
Provide day-to-day support for engineering activities with CI/CD tools like Git and Jenkins.
Implement and enforce best practices around infrastructure security, access control, and operational efficiency.

What we need to see:

BS degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
3-5+ years of experience in a Site Reliability, DevOps, or Systems Engineering role.
Strong automation and scripting skills in Ansible, Python, and Shell Scripting.
Experience in IaaS environments, including deploying, configuring, and administering Linux-based bare metal servers.
Deep experience in infrastructure engineering, focused on managing and monitoring a highly available production infrastructure.
Skilled in observability practices, using Prometheus, Grafana, ELK/EFK, and integrated alerting systems.
Solid grasp of Linux internals and core networking concepts including NAT, DNS, DHCP, routing, and firewall configuration with iptables or nftables.
Experience with modern deployment architecture for non-disruptive cloud operations, including blue-green and canary rollouts.
Proficiency in Kubernetes, Docker, QEMU, and Libvirt.

Ways to stand out from the crowd:

Hands-on expertise with AWS, including deploying complex, load-balanced, and highly available workloads.
Proficiency in debugging network issues in both infrastructure and SDN.
Experience with performance tuning and benchmarking across storage, compute, or networking.
Experience implementing robust metrics collection and alerting infrastructure.
Familiarity with compliance standards such as FedRAMP, HIPAA, and SOC 2.

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our best-in-class engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 120,000 USD - 189,750 USD for Level 2, and 148,000 USD - 235,750 USD for Level 3.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 22, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
