Site Reliability Engineer

Overview

On Site

$100,000 - $120,000

Full Time

Skills

A+

Apache Kafka

Apache Spark

Aviation

Bash

Build Automation

Collaboration

Computer Networking

Continuous Delivery

Continuous Integration

Docker

Functional Programming

GitLab

Google Cloud

Google Cloud Platform

Haskell

High Availability

IaaS

Java

Jenkins

Kubernetes

Linux

Management

Microservices

Nginx

Objective Caml

Operating Systems

Performance Tuning

Production Support

Prolog

Provisioning

Python

Recovery

Scalability

Scripting

Splunk

Storage

Streaming

Terraform

Unix

VMware

Virtualization

Job Details

Overview:

We are seeking an experienced, results-driven Senior Site Reliability Engineer (SRE) to join a high-impact aviation technology project. This role requires a strong background in Java development, cloud infrastructure, and site reliability best practices. The ideal candidate will bring a deep understanding of system scalability, fault tolerance, and observability, along with hands-on production support experience in Kubernetes-based environments running on Google Cloud Platform (GCP).

Core Responsibilities:

Design, implement, and maintain Java-based microservices ensuring high availability, scalability, and performance.
Collaborate with development and infrastructure teams to support and optimize production systems using SRE principles.
Manage and maintain Kubernetes clusters, including deployments, scaling, networking, and storage.
Develop and maintain robust CI/CD pipelines using tools like GitLab CI/CD and Jenkins.
Build automation for system health monitoring, alerting, log aggregation, and recovery using tools such as Prometheus, Datadog, Splunk, and Kiali.
Integrate and operate event-driven systems leveraging Kafka, ksqlDB, Spark Streams, and cluster federation.
Deploy and manage service mesh technologies such as Istio and Anthos Service Mesh.
Utilize eBPF for advanced observability and system tracing.
Support containerized applications using Docker, and provision infrastructure with Terraform.
Administer storage solutions in Kubernetes environments using Portworx.

Required Qualifications:

10+ years of experience in site reliability engineering.
Strong proficiency in Java is mandatory.
Solid experience scripting in languages such as Python, Go, and Bash.
Deep understanding of Linux/Unix operating systems and system-level troubleshooting.
Proven experience with Kubernetes, Docker, and infrastructure-as-code tools like Terraform.
Strong background in CI/CD, monitoring, alerting, and performance tuning.
Hands-on experience with virtualization platforms, including VMware.
Familiarity with tools like Nginx Controller, Seesaw, and service mesh technologies.
Proficiency in handling large-scale systems and automating repetitive operational tasks.
Experience with functional programming languages such as Prolog, Haskell, or OCaml is a plus.
Certification in Kubernetes is required.
Hands-on experience working in Google Cloud Platform environments is required.


About Vipany Global
