Site Reliability Engineer

Overview

On Site

175k - 225k

Full Time

Skills

Scalability

Software Design

Performance Metrics

Service Level

Root Cause Analysis

Innovation

Emerging Technologies

Mentorship

Software Engineering

DevOps

Reliability Engineering

JavaScript

TypeScript

Python

Database Performance Tuning

Optimization

Systems Design

Continuous Integration

Continuous Delivery

Configuration Management

Grafana

Orchestration

Kubernetes

Cloud Computing

Linux

Virtualization

Computer Networking

Firewall

Amazon Web Services

Provisioning

Command-line Interface

Computer Science

Job Details

Site Reliability Engineer

As a Senior or Staff SRE on the Platform Engineering team, you'll join at a foundational stage and play a key role in building and shaping a secure, resilient, and high-performance platform that powers engineering capabilities.

The company is based in New York, and the position will remain 100% remote.

What You Will Be Doing:

Drive Platform Excellence: Continuously improve the platform's reliability, scalability, and deployment efficiency through innovative solutions and resilient system design.
Build Advanced Observability Solutions: Design, implement, and maintain comprehensive observability and monitoring frameworks to ensure system health, availability, and reliability.
Establish and Track Key Performance Metrics: Develop and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to define and measure system performance benchmarks.
Resolve Complex Issues and Perform Root Cause Analysis: Respond swiftly to critical incidents, troubleshoot sophisticated system and application problems, and conduct detailed root cause analyses to implement long-term solutions.
Lead with Innovation: Stay current with industry trends and emerging technologies. Evolve best practices to boost development quality and delivery speed.
Architect Scalable Systems: Take ownership of designing scalable, fault-tolerant, and distributed systems that meet high standards for performance and reliability.
Mentor and Advocate: Promote the use of modern technologies and best practices, foster adoption of sound architectural patterns, and provide mentorship to engineering peers across the organization.

Required Skills & Experience:

10-12+ years of experience in software engineering, DevOps, or Site Reliability Engineering (SRE)
Proficiency in at least two of the following languages: JavaScript, TypeScript, Python, Go
Strong expertise in diagnosing and resolving issues in complex distributed systems
Deep understanding of database performance tuning and optimization best practices
Proven ability to innovate and drive the adoption of new tools, processes, and standards
Strong skills in system design and cloud-native architecture
Expertise in CI/CD pipelines, configuration management, automation, and monitoring
Advanced understanding of observability practices and tools such as ELK, Datadog, OpenTelemetry, Prometheus, and Grafana
Experience with deployment and orchestration tools such as AWS ECS, Kubernetes, and Cloud Run
Solid knowledge of Linux systems, virtualization, networking, VPCs, firewalls, and security configurations
Hands-on experience with AWS services and infrastructure provisioning through CLI, APIs, or Infrastructure as Code (IaC)
Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience

Applicants must be currently authorized to work in the United States on a full-time basis now and in the future.
This position doesn't provide sponsorship.

About Motion Recruitment Partners, LLC
