Site Reliability Engineer (SRE) Apache Flink & Kubernetes

Overview

Full Time

Part Time

Accepts corp to corp applications

Contract - Independent

Contract - W2

Skills

High Availability

Scalability

Collaboration

Management

Root Cause Analysis

Apache Flink

Kubernetes

Scripting

Python

Bash

Grafana

Cloud Computing

Amazon Web Services

Google Cloud

Google Cloud Platform

Microsoft Azure

Network Security

Orchestration

Continuous Integration

Continuous Delivery

DevOps

Problem Solving

Conflict Resolution

Debugging

Communication

Job Details

Local candidates preferred.

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Apache Flink, Kubernetes, and automation. The ideal candidate will be responsible for designing, deploying, and maintaining scalable, resilient systems, while ensuring high availability and performance in production environments. This role requires a solid background in distributed systems, container orchestration, and DevOps practices.

Key Responsibilities

Design, implement, and maintain scalable Apache Flink deployments on Kubernetes.
Develop automation tools and scripts to streamline deployment, monitoring, and maintenance of Flink jobs and infrastructure.
Ensure high availability, scalability, and reliability of production systems.
Collaborate with development and infrastructure teams to optimize application performance.
Build and manage monitoring/alerting systems using Prometheus, Grafana, ELK stack, or similar tools.
Work with cloud platforms (AWS, Google Cloud Platform, Azure) to design and manage infrastructure.
Apply best practices for networking, security, and container orchestration.
Troubleshoot complex production issues and drive root cause analysis.
Contribute to CI/CD pipelines for deployment automation.
Participate in on-call rotations to ensure uptime and reliability.

Required Skills & Qualifications

Strong hands-on experience with Apache Flink in production environments.
Expertise in Kubernetes (Helm, Operators, CRDs).
Proficiency in scripting languages (Python, Bash, Go).
Experience with monitoring & observability tools (Prometheus, Grafana, ELK, etc.).
Solid understanding of cloud platforms (AWS, Google Cloud Platform, Azure).
Strong knowledge of networking, security, and container orchestration.
Familiarity with CI/CD pipelines and DevOps practices.
Excellent problem-solving, debugging, and communication skills.



About Purple Drive Technologies LLC