Senior SRE / AWS / Observability

Overview

On Site
150k - 180k
Full Time

Skills

Computer Networking
Dependability
Performance Tuning
Scalability
Regulatory Compliance
Computer Science
Information Systems
Reliability Engineering
Java
Windows PowerShell
DevOps
Management
Microservices
Cloud Computing
Kubernetes
Chaos Testing
Amazon Web Services
Microsoft Azure
Google Cloud Platform
Oracle Cloud
Disaster Recovery
CPU
Scripting
Python
Bash
Operational Efficiency
Service Level
Collaboration
Partnership

Job Details

This company is internationally recognized for delivering high-quality networking solutions and smart home innovations. With a presence in more than 170 countries, it is dedicated to enhancing everyday life through faster, more dependable connectivity. Known for its customer-first approach and commitment to excellence, the company continues to grow its influence in both residential and commercial markets.
The company is currently seeking a Senior Site Reliability Engineer to join its team on site at its Irvine location. This role offers the opportunity to work on mission-critical cloud and microservices infrastructure, with a focus on system reliability, automation, and performance optimization. You will play a key role in driving observability, improving scalability, ensuring compliance, and supporting global product deployments in a dynamic, collaborative technical environment.

Required Skills & Experience
  • Bachelor's degree in Computer Science, Information Systems, or a similar technical discipline.
  • A minimum of five years' experience working in Site Reliability Engineering or a closely related field.
  • Strong coding and scripting abilities using languages such as Java, Python, Bash, or PowerShell.
  • Proven experience in SRE, DevOps practices, cloud platform management, and security implementation.

What You Will Be Doing
  • Act as a technical authority in deploying and maintaining microservices within cloud-native Kubernetes environments.
  • Conduct performance and resiliency testing (e.g., load and chaos testing) to validate system robustness under various conditions.
  • Implement end-to-end observability across distributed services hosted on platforms such as AWS, Azure, Google Cloud, and Oracle Cloud.
  • Coordinate disaster recovery strategies, ensuring readiness through close collaboration with infrastructure and application teams.
  • Diagnose and mitigate operational issues stemming from system resource limitations, such as CPU/memory constraints or inefficient auto-scaling configurations.
  • Develop automation tools and scripts using languages such as Python, Go, or Bash to enhance operational efficiency.
  • Define service-level agreements, objectives, and indicators (SLAs, SLOs, SLIs) in partnership with development teams to align technical performance with business expectations.

The Offer
You will receive the following benefits:
  • Medical, Dental, and Vision Insurance
  • 401K Retirement Savings Plan
  • Free Snacks, Drinks, and Catered Lunch
  • Free Gym Membership

Applicants must be authorized to work in the US on a full-time basis, both now and in the future.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Motion Recruitment Partners, LLC