Job Details
NAVA Software Solutions is looking for a Platform Support Lead / Flink Engineer.
Details:
Platform Support Lead / Flink Engineer
Location: Dallas, TX (Hybrid; non-local candidates will receive travel expenses)
Duration: 6-12 months
Overview:
We are seeking a highly skilled Senior Flink Architect/Engineer with extensive experience in stream processing, cloud-native deployments, and platform support. This role demands deep expertise in Apache Flink's DataStream API, particularly in production environments, and the ability to deliver end to end on Azure Kubernetes Service (AKS). The ideal candidate is a strong technical leader with a hands-on background in both stream processing logic and infrastructure automation.
Mandatory Requirements:
- 3+ years of hands-on experience with Apache Flink, specifically the DataStream API
- Proven track record of production-grade Flink deployments, with case studies or documentation
- Currently supporting at least one active client using the Flink DataStream API
- Strong knowledge of state management using checkpoints and savepoints (local storage & ADLS)
- Experience configuring Flink connectors such as Azure Event Hubs, Kafka, and MongoDB
- Expertise in Flink aggregators, watermarks, and handling out-of-order events
- Built and deployed private Flink clusters in AKS, including both session-mode and application-mode deployments
- Hands-on experience managing JobManagers, TaskManagers, and cluster resources
- Experience configuring RocksDB, heap memory, state recovery, and Autopilot
- Integrated Flink with external tools: ArgoCD (for deployments), Dynatrace, and LTM logging agents
- Familiarity with Flink Dashboard, High Availability (HA), and Disaster Recovery (DR) setups
Core Responsibilities:
Functional:
- Build and maintain Flink applications using the DataStream API
- Implement Flink process functions, aggregators, and watermarking strategies
- Manage stateful streaming applications using RocksDB and Azure Data Lake (ADLS)
- Integrate Flink jobs with Kafka, Azure Event Hubs, and MongoDB
Infrastructure & Platform:
- Architect and manage Flink clusters in AKS with Kubernetes-based deployment models
- Configure application-mode and session-mode deployments, JobManagers and TaskManagers, and memory optimization
- Set up HA/DR, observability, and Autopilot for self-healing infrastructure
- Implement deployment pipelines using ArgoCD and integrate logging and monitoring agents
- Provide visibility and access through Flink Dashboard and monitoring platforms like Dynatrace
Qualifications:
- 7+ years of experience in backend/distributed systems engineering
- 3+ years of hands-on experience with Apache Flink (DataStream API)
- 3-5 years of experience with Kafka, Azure Event Hubs, or similar platforms
- 2+ years managing cloud-native applications on AKS or Kubernetes
- Strong background in CI/CD, infrastructure as code (IaC), and cloud monitoring
- Excellent communication, technical leadership, and documentation skills
Key Deliverables:
- Fully deployed, production-grade Flink applications with logging and monitoring
- Scalable and highly available Flink infrastructure with HA/DR configurations
- Automated deployment processes via ArgoCD, integrated with Dynatrace and LTM
- Clear documentation and ongoing platform support