Senior Kafka Platform Engineer (Automation & Kubernetes)

• Posted 30+ days ago • Updated 8 hours ago
Full Time

Job Details

Skills

  • SAFE
  • Capacity Management
  • Provisioning
  • Budget
  • Incident Management
  • Dashboard
  • Network
  • Documentation
  • Mentorship
  • Communication
  • Partnership
  • Collaboration
  • ISR
  • Storage
  • Replication
  • Recovery
  • Terraform
  • Continuous Integration
  • Continuous Delivery
  • GitHub
  • Jenkins
  • Python
  • Java
  • Bash
  • Linux
  • File Systems
  • Reliability Engineering
  • Modeling
  • Performance Tuning
  • TLS
  • OAuth
  • ACL
  • RBAC
  • Management
  • Auditing
  • Regulatory Compliance
  • CruiseControl
  • Microsoft Azure
  • Google Cloud Platform
  • Google Cloud
  • Computer Networking
  • Cloud Computing
  • Amazon Web Services
  • Roadmaps
  • Data Processing
  • Apache Kafka
  • Apache Flink
  • Apache Spark
  • Streaming
  • Semantics
  • Kubernetes
  • Change Data Capture
  • Database
  • Disaster Recovery

Summary

We're seeking a seasoned Kafka engineer to design, operate, and scale our event streaming platform. You'll own the Kafka core (brokers, storage, security, observability) and the automation that powers it: building infrastructure as code, operators/Helm charts, and CI/CD pipelines to enable safe, self-service provisioning. You'll run Kafka on Kubernetes and/or cloud-managed offerings, ensure reliability and performance, and partner with application teams on best practices.

What you'll do
  • Architect, deploy, and operate production-grade Kafka clusters (self-managed and/or Confluent/MSK), including upgrades, capacity planning, multi-AZ/region DR, and performance tuning.
  • Operate Kafka on Kubernetes using Operators, Helm, and GitOps, and build IaC-driven automation with guardrails for repeatable, compliant, zero-downtime provisioning and deployments.
  • Implement and manage Kafka Connect, Schema Registry, and MirrorMaker 2/Cluster Linking; standardize connectors (e.g., Debezium) and build self-service patterns.
  • Drive reliability: define SLOs/error budgets, on-call rotations, incident response, postmortems, runbooks, and automated remediation.
  • Implement observability: metrics, logs, traces, lag monitoring, and capacity dashboards (e.g., Prometheus/Grafana, Burrow, Cruise Control, OpenTelemetry).
  • Secure the platform: TLS/mTLS, SASL (OAuth/SCRAM), RBAC/ACLs, secrets management, network policies, audit, and compliance automation.
  • Guide event-streaming best practices: topic design, partitioning, compaction/retention, idempotency, ordering, schema evolution/compatibility, DLQs, EOS semantics.
  • Partner with app, data, and SRE teams; provide enablement, documentation, and internal tooling for a great developer experience.
  • Lead/mentor engineers and contribute to roadmap, standards, and platform strategy.
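The lag-monitoring duty above boils down to simple offset arithmetic: a consumer group's lag on a partition is the log-end offset minus its committed offset. A minimal sketch in Python (the offsets here are hypothetical, hard-coded stand-ins for what the Kafka Admin API or Burrow would report):

```python
# Hypothetical sketch: computing consumer-group lag from offset snapshots.
# Real tooling would fetch these via the Kafka Admin API; here they are
# plain dictionaries keyed by partition number.

def total_lag(log_end_offsets: dict[int, int], committed_offsets: dict[int, int]) -> int:
    """Sum of (log-end offset - committed offset) across partitions.

    A partition with no committed offset counts its entire log as lag.
    """
    lag = 0
    for partition, end in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        lag += max(0, end - committed)
    return lag

# Example: three partitions, consumer behind on partitions 1 and 2.
end = {0: 1_000, 1: 1_000, 2: 1_000}
committed = {0: 1_000, 1: 990, 2: 850}
print(total_lag(end, committed))  # 160
```

Alerting on this total (or the per-partition maximum) against an SLO threshold is the usual starting point for the dashboards the role describes.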

Required qualifications
  • Excellent communication and partnership skills with platform and application teams.
  • Deep hands-on experience operating Kafka in production at scale (brokers, controllers, partitions, ISR, tiered storage/retention, rebalancing, replication, recovery).
  • Strong Kubernetes expertise running stateful systems.
  • Automation first: Infrastructure as Code (Terraform), Helm, Operators, GitOps (Argo CD/Flux), and CI/CD (e.g., GitHub Actions/Jenkins) for platform lifecycle.
  • Proficiency with one or more languages for tooling/automation: Python, Go, or Java; plus Bash and solid Linux fundamentals (networking, filesystems, JVM tuning basics).
  • Observability and reliability engineering for Kafka: Prometheus/Grafana, logging, alerting, lag monitoring, capacity/throughput modeling, performance tuning.
  • Security for data in motion: TLS/mTLS, SASL/OAuth, ACL/RBAC, secrets management (e.g., Vault), and audit/compliance practices.
  • Experience with Kafka ecosystem components: Kafka Connect, Schema Registry, MirrorMaker 2/Cluster Linking; familiarity with Cruise Control.
  • Cloud experience (AWS/Azure/Google Cloud Platform) with networking, IAM, and one or more managed offerings (e.g., Confluent Cloud or AWS MSK).
  • Proven track record designing runbooks, leading incidents/postmortems, and driving platform roadmaps.
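The capacity/throughput modeling mentioned above often starts from first-order arithmetic: steady-state disk footprint is roughly ingress rate × retention window × replication factor. A hedged sketch (all figures hypothetical, and the result is a lower bound since it ignores compression, indexes, and segment-roll slack):

```python
def retained_bytes_gb(ingress_mb_per_s: float, retention_hours: float, replication_factor: int) -> float:
    """Approximate steady-state cluster disk usage in GB for time-based retention.

    Ignores compression, index overhead, and segment-roll slack, so treat
    the result as a floor when provisioning storage.
    """
    seconds = retention_hours * 3600
    total_mb = ingress_mb_per_s * seconds * replication_factor
    return total_mb / 1024  # MB -> GB

# Example: 50 MB/s ingress, 72 h retention, replication factor 3.
print(round(retained_bytes_gb(50, 72, 3), 2))  # 37968.75 (about 38 TB)
```

Models like this, plus headroom for rebalancing and growth, are what typically feed the capacity dashboards and budget conversations the posting mentions.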

Nice to have
  • Data processing frameworks (Kafka Streams, Flink, Spark Structured Streaming) and EOS semantics.
  • Experience with Strimzi or Confluent for Kubernetes in production.
  • Knowledge of CDC patterns and tools (e.g., Debezium) and database connectors at scale.
  • Multi-region architectures, cluster linking strategies, and disaster recovery drills.
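One durability rule of thumb implied by the ISR and replication topics above: with acks=all, a topic stays writable through at most replication.factor − min.insync.replicas broker failures. A hypothetical sketch of that relationship:

```python
def tolerated_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Broker failures an acks=all topic can absorb while remaining writable.

    With replication.factor=3 and min.insync.replicas=2 (a common
    production default), one broker can fail without blocking producers.
    """
    if min_insync_replicas > replication_factor:
        raise ValueError("min.insync.replicas cannot exceed replication.factor")
    return replication_factor - min_insync_replicas

print(tolerated_failures(3, 2))  # 1
```

DR drills for a role like this typically exercise exactly this boundary: fail brokers until the ISR shrinks to the minimum and verify producer behavior.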

Employers have access to artificial intelligence language tools ("AI") that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10125634
  • Position Id: 6e82cfa782984f0f1fada790e46c572b