About Infarsight
Infarsight empowers enterprises to accelerate strategic transformation through automation, AI, and product innovation. Our platforms, Task Sight, Trip Sight, and Fleet Sight, simplify complex operations across Finance, HR, Supply Chain, and beyond. With scalable engineering and real-time analytics, we help organizations move faster, smarter, and more efficiently.
The Opportunity
We are seeking an experienced and highly skilled Kafka Platform Engineer to manage, maintain, and scale our critical data streaming infrastructure. This role is essential for ensuring the performance, security, and reliability of our Confluent Kafka clusters, which support Infarsight's next generation of AI and automation platforms.
Key Responsibilities
1. Kafka Cluster Management & Operations
Install, configure, and maintain robust, high-performance Confluent Kafka clusters in an on-premises environment.
Manage core Kafka components, including Kafka brokers, ZooKeeper, Kafka Connect, and Kafka Streams.
Proactively monitor and tune Kafka cluster performance to meet stringent operational service-level agreements (SLAs).
2. Security & Compliance
Implement and manage comprehensive Kafka security measures, including SSL/TLS encryption, ACLs (Access Control Lists), and role-based access control (RBAC).
Ensure the Kafka infrastructure adheres strictly to the organization's security policies and industry best practices.
3. Monitoring, Alerting, and Troubleshooting
Design and deploy effective monitoring and alerting solutions using tools such as Prometheus, Grafana, or Confluent Control Center.
Perform real-time log and metric analysis to proactively detect and resolve potential system issues.
Provide timely operational support and incident management for all Kafka-related issues, collaborating with development and data engineering teams to resolve application-level problems.
4. Maintenance, Upgrades, and Disaster Recovery
Regularly execute maintenance, patching, and upgrades for all Kafka components.
Implement and manage robust Kafka backup, disaster recovery, and failover procedures.
Experience with Kafka Cluster Linking and cross-cluster replication is required.
5. Documentation & Knowledge Sharing
Maintain accurate and detailed documentation of the Kafka architecture, configuration, and operational standards.
Create comprehensive runbooks for various administrative and support tasks.
6. Automation & Infrastructure-as-Code (Good to Have)
Develop and implement automation scripts and frameworks that streamline Kafka operations (provisioning, configuration, scaling, monitoring) using tools such as Ansible, Terraform, or Python.
Automate Kafka cluster deployments, upgrades, and patch management processes.
Familiarity with CI/CD pipelines and Infrastructure-as-Code (IaC) concepts for managing Kafka environments.
Required Experience & Skills
3-5 years of hands-on experience managing and maintaining Confluent Kafka clusters in production environments, with a strong preference for on-premises experience.
Solid understanding of distributed systems, high availability (HA), and failover mechanisms.
Expertise in Kafka security implementation (SSL/TLS, ACLs, RBAC).
Demonstrable experience setting up and managing monitoring for Kafka using industry-standard tools.
Location: Remote. Preference for candidates based in the US.
Work hours (for candidates based in India): Must provide at least six hours of coverage overlapping US Central Daylight Time (CDT).
Travel: Required at the start of the engagement.