Site Reliability Engineer (SRE): Apache Flink & Kubernetes

Overview

Full Time
Part Time
Accepts corp-to-corp applications
Contract - Independent
Contract - W2

Skills

High Availability
Scalability
Collaboration
Management
Root Cause Analysis
Apache Flink
Kubernetes
Scripting
Python
Bash
Grafana
Cloud Computing
Amazon Web Services
Google Cloud
Google Cloud Platform
Microsoft Azure
Network Security
Orchestration
Continuous Integration
Continuous Delivery
DevOps
Problem Solving
Conflict Resolution
Debugging
Communication

Job Details

Local candidates preferred.

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Apache Flink, Kubernetes, and automation. The ideal candidate will be responsible for designing, deploying, and maintaining scalable, resilient systems, while ensuring high availability and performance in production environments. This role requires a solid background in distributed systems, container orchestration, and DevOps practices.



Key Responsibilities

  • Design, implement, and maintain scalable Apache Flink deployments on Kubernetes.

  • Develop automation tools and scripts to streamline deployment, monitoring, and maintenance of Flink jobs and infrastructure.

  • Ensure high availability, scalability, and reliability of production systems.

  • Collaborate with development and infrastructure teams to optimize application performance.

  • Build and manage monitoring/alerting systems using Prometheus, Grafana, ELK stack, or similar tools.

  • Work with cloud platforms (AWS, Google Cloud Platform, Azure) to design and manage infrastructure.

  • Apply best practices for networking, security, and container orchestration.

  • Troubleshoot complex production issues and drive root cause analysis.

  • Contribute to CI/CD pipelines for deployment automation.

  • Participate in on-call rotations to ensure uptime and reliability.




Required Skills & Qualifications

  • Strong hands-on experience with Apache Flink in production environments.

  • Expertise in Kubernetes (Helm, Operators, CRDs).

  • Proficiency in scripting and programming languages (Python, Bash, Go).

  • Experience with monitoring & observability tools (Prometheus, Grafana, ELK, etc.).

  • Solid understanding of cloud platforms (AWS, Google Cloud Platform, Azure).

  • Strong knowledge of networking, security, and container orchestration.

  • Familiarity with CI/CD pipelines and DevOps practices.

  • Excellent problem-solving, debugging, and communication skills.



About Purple Drive Technologies LLC