Overview:
We are seeking a highly skilled Senior Flink Architect/Engineer with extensive experience in stream processing, cloud-native deployments, and platform support. This role demands deep expertise in Apache Flink's DataStream API, particularly in production environments, along with end-to-end delivery capability on Azure Kubernetes Service (AKS). The ideal candidate is a strong technical leader with a hands-on background in both stream-processing logic and infrastructure automation.
Mandatory Requirements:
3+ years of hands-on experience with Apache Flink, specifically the DataStream API
Proven track record of production-grade Flink deployments, with case studies or documentation
Currently supporting at least one active client using the Flink DataStream API
Strong knowledge of state management using checkpoints and savepoints (local storage & ADLS)
Experience configuring Flink connectors such as Azure Event Hubs, Kafka, and MongoDB
Expertise in Flink aggregators, watermarks, and handling out-of-order events (illustrated in the sketch following this list)
Experience building and deploying private Flink clusters in AKS, including session-mode and application-mode deployments
Hands-on experience managing JobManagers, TaskManagers, and cluster resources
Experience configuring RocksDB, heap memory, state recovery, and Autopilot
Experience integrating Flink with external tools: ArgoCD (for deployments), Dynatrace, and LTM logging agents
Familiarity with Flink Dashboard, High Availability (HA), and Disaster Recovery (DR) setups
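To make the stack above concrete, here is a minimal, illustrative Java sketch (not a requirement of the role) of a DataStream job that wires together a Kafka source, bounded-out-of-orderness watermarks, and checkpointing; the topic name, broker address, and abfss:// path are placeholders, and a real job would point checkpoint storage at an ADLS Gen2 container.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OrdersStreamingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 s; the storage path below is a placeholder for an ADLS Gen2 container.
        env.enableCheckpointing(60_000L);
        env.getCheckpointConfig()
           .setCheckpointStorage("abfss://checkpoints@examplestorage.dfs.core.windows.net/flink");

        // Placeholder Kafka source; Event Hubs exposes a Kafka-compatible endpoint as well.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("orders")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Tolerate events arriving up to 30 seconds out of order.
        env.fromSource(source,
                        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(30)),
                        "orders-source")
           .print();

        env.execute("orders-streaming-job");
    }
}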
Core Responsibilities:
Functional:
Build and maintain Flink applications using the DataStream API
Implement Flink process functions, aggregators, and watermarking strategies (a brief process-function sketch follows this list)
Manage stateful streaming applications using RocksDB and Azure Data Lake Storage (ADLS)
Integrate Flink jobs with Kafka, Event Hubs, and MongoDB
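As a rough illustration of the process-function and state-management work described above, the following is a minimal keyed process function that emits a running count per key; the class name and types are placeholders, and the keyed state lives in whichever backend the job is configured with (for example RocksDB checkpointed to ADLS).

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits a running count of events per key; keyed state is managed by the
// configured state backend and restored from checkpoints/savepoints on recovery.
public class CountPerKey extends KeyedProcessFunction<String, String, Long> {

    private transient ValueState<Long> countState;

    @Override
    public void open(Configuration parameters) {
        countState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = countState.value();
        long updated = (current == null ? 0L : current) + 1L;
        countState.update(updated);
        out.collect(updated);
    }
}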
Infrastructure & Platform:
Architect and manage Flink clusters in AKS with Kubernetes-based deployment models
Configure application and session deployment modes, JobManager/TaskManager resources, and memory tuning (a brief configuration sketch follows this list)
Set up HA/DR, observability, and Autopilot for self-healing infrastructure
Implement deployment pipelines with ArgoCD and integrate logging and monitoring agents
Provide visibility and access through the Flink Dashboard and monitoring platforms such as Dynatrace
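For the platform side, a brief, hedged sketch of the kind of programmatic configuration involved: enabling the embedded RocksDB state backend with incremental checkpoints and pointing checkpoint storage at an ADLS path. In an AKS deployment these settings would more typically live in flink-conf.yaml or Kubernetes manifests managed through ArgoCD; the storage account and pipeline below are placeholders.

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB keeps keyed state off-heap on local disk; incremental checkpoints
        // ship only changed files to the configured checkpoint storage.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
        env.enableCheckpointing(60_000L);
        env.getCheckpointConfig().setCheckpointStorage(
                "abfss://checkpoints@examplestorage.dfs.core.windows.net/flink");

        // Placeholder pipeline so the sketch is self-contained and runnable.
        env.fromElements("a", "b", "c").print();
        env.execute("state-backend-setup");
    }
}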
Please share resumes to