Site Reliability Engineer

Overview

On Site
$100,000 - $120,000
Full Time

Skills

A+
Apache Kafka
Apache Spark
Aviation
Bash
Build Automation
Collaboration
Computer Networking
Continuous Delivery
Continuous Integration
Docker
Functional Programming
GitLab
Google Cloud Platform
Haskell
High Availability
IaaS
Java
Jenkins
Kubernetes
Linux
Management
Microservices
Nginx
Objective Caml
Operating Systems
Performance Tuning
Production Support
Prolog
Provisioning
Python
Recovery
Scalability
Scripting
Splunk
Storage
Streaming
Terraform
Unix
VMware
Virtualization

Job Details

Overview:

We are seeking an experienced and results-driven Senior Site Reliability Engineer (SRE) to join a high-impact aviation technology project. This role requires a strong background in Java development, cloud infrastructure, and site reliability best practices. The ideal candidate will bring a deep understanding of system scalability, fault tolerance, and observability, along with hands-on production support experience in Kubernetes-based environments running on Google Cloud Platform (GCP).

Core Responsibilities:

Design, implement, and maintain Java-based microservices, ensuring high availability, scalability, and performance.
Collaborate with development and infrastructure teams to support and optimize production systems using SRE principles.
Manage and maintain Kubernetes clusters, including deployments, scaling, networking, and storage.
Develop and maintain robust CI/CD pipelines using tools like GitLab CI/CD and Jenkins.
Build automation for system health monitoring, alerting, log aggregation, and recovery using tools such as Prometheus, Datadog, Splunk, and Kiali.
Integrate and operate event-driven systems leveraging Kafka, ksqlDB, Spark Streaming, and cluster federation.
Deploy and manage service mesh technologies such as Istio and Anthos Service Mesh.
Utilize eBPF for advanced observability and system tracing.
Support containerized applications using Docker, and provision infrastructure with Terraform.
Administer storage solutions in Kubernetes environments using Portworx.

Required Qualifications:

10+ years of experience in SRE.
Strong proficiency in Java is mandatory.
Solid experience with scripting and programming languages such as Python, Go, and Bash.
Deep understanding of Linux/Unix operating systems and system-level troubleshooting.
Proven experience with Kubernetes, Docker, and infrastructure-as-code tools like Terraform.
Strong background in CI/CD, monitoring, alerting, and performance tuning.
Hands-on experience with virtualization platforms, including VMware.
Familiarity with tools like Nginx Controller, Seesaw, and service mesh technologies.
Proficiency in operating large-scale systems and automating repetitive operational tasks.
Experience with functional programming languages such as Prolog, Haskell, or OCaml is a plus.
Kubernetes certification is required.
Hands-on experience working in Google Cloud Platform environments is required.


About Vipany Global