Senior SRE / AWS / Observability

Overview

On Site

150k - 180k

Full Time

Skills

Computer Networking

Dependability

Performance Tuning

Scalability

Regulatory Compliance

Computer Science

Information Systems

Reliability Engineering

Java

Windows PowerShell

DevOps

Management

Microservices

Cloud Computing

Kubernetes

Chaos Testing

Amazon Web Services

Microsoft Azure

Google Cloud Platform

Google Cloud

Oracle Cloud

Disaster Recovery

CPU

Scripting

Python

Bash

Operational Efficiency

Service Level

Collaboration

Partnership

Insurance

SAP BASIS

Audiovisual

Job Details

This company is internationally recognized for delivering high-quality networking solutions and smart home innovations. With a global presence spanning more than 170 countries, it is dedicated to enhancing everyday life through faster, more dependable connectivity. Known for its customer-first approach and commitment to excellence, it continues to grow its influence in both residential and commercial markets.
The company is seeking a Senior Site Reliability Engineer to join its team on-site at its Irvine location. The role involves working on mission-critical cloud and microservices infrastructure, with a focus on system reliability, automation, and performance optimization. You will drive observability, improve scalability, ensure compliance, and support global product deployments in a collaborative technical environment.

Required Skills & Experience

Bachelor's degree in Computer Science, Information Systems, or a similar technical discipline.
A minimum of five years' experience working in Site Reliability Engineering or a closely related field.
Strong coding and scripting abilities using languages such as Java, Python, Bash, or PowerShell.
Proven experience in SRE, DevOps practices, cloud platform management, and security implementation.

What You Will Be Doing

Act as a technical authority in deploying and maintaining microservices within cloud-native Kubernetes environments.
Conduct performance and resiliency testing (e.g., load and chaos testing) to validate system robustness under various conditions.
Implement end-to-end observability across distributed services hosted on platforms such as AWS, Azure, Google Cloud, and Oracle Cloud.
Coordinate disaster recovery strategies, ensuring readiness through close collaboration with infrastructure and application teams.
Diagnose and mitigate operational issues stemming from system resource limitations, such as CPU/memory constraints or inefficient auto-scaling configurations.
Develop automation tools and scripts using languages such as Python, Go, or Bash to enhance operational efficiency.
Define service-level indicators, objectives, and agreements (SLIs, SLOs, SLAs) in partnership with development teams to align technical performance with business expectations.

The Offer
You will receive the following benefits:

Medical, Dental, and Vision Insurance
401K Retirement Savings Plan
Free Snacks and Drinks, and Catered Lunch
Free Gym Membership

Applicants must be authorized to work in the US on a full-time basis, both now and in the future.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Motion Recruitment Partners, LLC