Senior Site Reliability Engineer

Remote • Posted 1 hour ago • Updated 1 hour ago
Full Time

Job Details

Skills

  • Operational Excellence
  • Generative Artificial Intelligence (AI)
  • Organized
  • SAP BASIS
  • Incident Management
  • Service Level
  • FOCUS
  • Collaboration
  • Amazon EC2
  • Amazon Web Services
  • Management
  • Terraform
  • Orchestration
  • Kubernetes
  • Microservices
  • Root Cause Analysis
  • Problem Management
  • Reliability Engineering
  • English
  • Ansible
  • Configuration Management

Summary

We are seeking a Senior Site Reliability Engineer to ensure the operational excellence and reliability of our production services. This role combines core SRE responsibilities with a specialization in generative AI technologies, focusing on AWS infrastructure, Kubernetes orchestration and observability platforms to support mission-critical systems.

Participation in the on-call support rotation is required for this role. The schedule is organized on a rotating basis, with each engineer covering one calendar week approximately once per month.

Responsibilities

  • Provide operational support for production services, including on-call rotation and major incident handling
  • Define, monitor and maintain Service Level Objectives (SLOs) and Indicators (SLIs) to ensure reliability
  • Manage and operate AWS infrastructure, particularly Kubernetes clusters, using Infrastructure as Code
  • Ensure the reliability and performance of microservices and event-driven architectures
  • Manage, tune and optimize search and observability platforms, with a specific focus on OpenSearch performance
  • Conduct root cause analysis (RCA) and drive problem management to prevent recurring issues
  • Take ownership of production environments and reliability outcomes
  • Collaborate with engineering teams to embed a reliability mindset across the organization

Requirements

  • 3+ years of experience in Site Reliability Engineering or related operational roles
  • Expertise in AWS services including EC2, EKS and ECS
  • Proficiency in AWS Bedrock and OpenSearch
  • Knowledge of IAM and AWS infrastructure management
  • Skills in Infrastructure as Code using Terraform
  • Background in container orchestration with Kubernetes
  • Familiarity with observability tools such as Instana, CloudWatch and ELK
  • Understanding of microservices, APIs and event-driven processing
  • Capability to perform strong RCA and problem management
  • Competency in SLO/SLI definition and reliability engineering practices
  • Upper-Intermediate English language proficiency (B2)

Nice to have

  • Familiarity with Ansible for configuration management

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10330481
  • Position Id: 66b6991f4483409862de14f61e87d53d