Job Details
Job Title: Kafka Administrator
Location: New York, NY (Hybrid, 3 days onsite)
Job Summary:
We are seeking a skilled Kafka Administrator to manage and support our enterprise Kafka infrastructure. The ideal candidate will be responsible for deploying, configuring, securing, and optimizing Apache Kafka clusters to ensure performance, availability, and scalability. This is a hybrid position based in New York, requiring three days onsite per week.
Key Responsibilities:
Kafka Cluster Management: Install, configure, and maintain Kafka clusters, including setup of high availability (HA) and disaster recovery (DR) strategies.
Performance Optimization: Monitor and tune Kafka clusters for optimal throughput and low latency.
Security Administration: Implement access controls, authentication mechanisms (e.g., Kerberos, SSL/TLS), and encryption policies.
Monitoring & Alerting: Establish robust monitoring and alerting systems using tools like Prometheus, Grafana, or similar.
Troubleshooting: Diagnose and resolve Kafka-related issues such as message delivery failures, latency spikes, and broker unavailability.
Integration Support: Work closely with development, DevOps, and data engineering teams to integrate Kafka with microservices, data pipelines, and real-time streaming applications.
Automation: Develop and maintain scripts/tools for cluster automation and management tasks.
Documentation: Maintain comprehensive documentation for Kafka infrastructure, processes, and procedures.
Capacity Planning: Analyze system resource usage and plan for future scaling needs.
Required Skills and Experience:
5+ years of experience in IT infrastructure or systems administration.
Strong hands-on experience with Apache Kafka administration in production environments.
Experience with Kubernetes and container orchestration in cloud or hybrid environments.
Proficiency with Linux systems, networking, and virtualization.
Knowledge of Kafka internals such as brokers, ZooKeeper coordination, topics, partitions, consumers, and producers.
Familiarity with CI/CD pipelines and infrastructure-as-code tools (e.g., Terraform, Ansible) is a plus.
Experience with monitoring tools (e.g., Grafana, Prometheus, Splunk, or the ELK stack).
Strong scripting skills (e.g., Bash, Python).
Excellent troubleshooting, documentation, and communication skills.