Senior Site Reliability Engineer

Overview

On Site

340k - 440k

Full Time

Skills

Data Analysis

Global Operations

Big Data

Optimization

ISP

Tier 3

FOCUS

Computer Science

Electrical Engineering

Computer Engineering

TCP/IP

Border Gateway Protocol

DNS

TLS

HTTP

Caching

Proxies

Python

Management

Unix

Linux

Computer Networking

Storage

Data Processing

Apache Hive

Apache Spark

SQL

Statistics

Orchestration

Docker

Kubernetes

Job Details

One of our clients in the entertainment platform space is looking for a Level 5 Reliability Engineer with a deep background in *nix systems, networking, data analysis, and operating large-scale platforms to help build, scale, automate, and maintain their globally distributed infrastructure.

Key Responsibilities

Lead efforts to enhance system resiliency, observability, monitoring, and automation, ensuring global operations remain scalable and reliable.
Collect, evaluate, and interpret significant volumes of server and application performance data using the Netflix Big Data platform to uncover optimization opportunities and identify trends or anomalies needing deeper analysis.
Support ISP partners with technical guidance on integrating Open Connect Appliances.
Act as a Tier 3 escalation point and take part in the on-call rotation to address platform incidents.

Required Qualifications

At least 5 years of experience in site reliability or operational engineering roles supporting large-scale, high-performance systems with a focus on uptime and efficiency.
Bachelor's degree in Computer Science, Electrical Engineering, or Computer Engineering preferred, or equivalent experience.
In-depth understanding of networking and protocols including TCP/IP, BGP, DNS, TLS, and HTTP/S; familiarity with CDN and HTTP caching/proxy technologies.
Proficient in developing and maintaining automation using languages like Python.
Advanced expertise in managing and troubleshooting Unix/Linux environments at scale, covering networking, storage, and OS fundamentals.
Hands-on experience with distributed data processing tools such as Hive, Presto/Trino, or Spark SQL.
Strong applied statistics knowledge with the ability to write code that detects anomalous system behavior.
Some familiarity with containerization and orchestration technologies like Docker and Kubernetes.
Effective communicator and collaborator, comfortable working with internal teams and external partners alike.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Motion Recruitment Partners, LLC