Job Title : Kafka/Confluent Administrator
Location : St Louis, MO 63131 (Onsite)
Duration : 12+ Months
Must Have Skills:
- Must have deep, hands-on experience running Kafka in large-scale production environments, including cluster operations, upgrades, patches, and migrations.
- Should understand Kafka internals such as partitions, replication, retention/compaction, and rebalance strategies.
o Kafka Administration
o Platform / SRE / DevOps Experience
o Kafka Ecosystem Tools
o Linux + Networking
o Automation / Scripting
o Monitoring / Observability
o Disaster Recovery
Nice to Have Skills:
- AWS MSK / Apache Kafka Cloud: Experience with MSK operations and cloud-aligned Kafka environments.
- Helpful for cross-environment consistency between on-prem and cloud.
- Hardware Refresh Experience: Prior work leading Kafka hardware refreshes or cluster rebuilds.
Detailed Job Description:
Minimum Qualifications - Education & Prior Job Experience:
We're seeking a senior contract Kafka/Confluent administrator to own and evolve our on-prem event streaming platform, with a primary focus on Confluent Platform. You will lead planning and execution of a hardware refresh for our on-prem clusters, drive reliability and performance, and embed DevOps/automation across provisioning, deployment, observability, and incident response. Experience with Apache Kafka and AWS MSK is desired for secondary support and cross-environment alignment. Comprehensive documentation and runbooks are required deliverables.
Kafka Platform Support Key Responsibilities
- Design, deploy, and operate highly available Kafka clusters (on-prem, cloud, and/or managed services such as Confluent Cloud or AWS MSK).
- Manage topics, partitions, quotas, retention policies, and consumer group strategies for performance and cost.
- Own upgrades, patches, and migrations.
- Implement and manage Kafka components: Kafka Connect, Schema Registry, MirrorMaker/Confluent Replicator, REST Proxy; familiarity with Kafka Streams and ksqlDB is a plus.
- Performance tuning (producers/consumers, batching, compression, acks, ISR, controller health), throughput testing, and benchmarking.
- Capacity planning, partitioning strategy, and cluster right-sizing.
Contract Deliverables
- Hardware refresh plan: capacity model, sizing, architecture diagrams, migration/cutover strategy, risk register.
- Implemented and validated on-prem clusters on refreshed hardware, with performance benchmarks.
- Operational documentation: standards, runbooks, monitoring/alerts configuration, backup/restore and DR playbooks.
- Knowledge transfer sessions and documentation handoff at milestones and project close.
Minimum Qualifications
- 5+ years in systems/platform engineering, SRE, or DevOps; 4+ years operating Kafka in production at scale.
- Deep knowledge of Kafka internals: partitions, replication, retention/compaction, rebalance strategies.
- Hands-on with Kafka Connect, Schema Registry, MirrorMaker/Confluent Replicator.
- Strong Linux fundamentals, networking (TCP, DNS, load balancing), and performance analysis.
- Proficiency in automation/scripting.
- Monitoring/observability: Datadog, Grafana, JMX exporters, and log aggregation.
- Experience with DR, multi-region design, and incident management.
- Proven ability to produce clear, comprehensive documentation.
Preferred Qualifications
- Experience with Apache Kafka and AWS MSK operations and integration.
- Experience executing hardware refreshes or major cluster rebuilds/migrations with minimal downtime.