SRE, Postgres developer

Full Time

    Job Description

    As a Site Reliability Engineer (Kafka), you are in the frontline team keeping our large fleet of cloud-hosted Apache Kafka clusters up and running. Every day, you will diagnose and solve interesting technical problems, providing Kafka as a Managed Service in a highly automated environment. Our service is relied on by some of the leading global names in Banking and Financial Services, Telecom, IoT and Tech companies that interact with millions of end users.
    • Provide expert operational support for our nodes running in the cloud (AWS, Azure and Google Cloud Platform), using technologies such as Linux (Debian) and Docker, and languages including Java, Python and Bash.
    • Liaise with our customers' engineers in resolving interesting issues related to Apache Kafka usage and other supported technologies.
    • Undertake complex cluster operations such as migrations, upgrades and maintenance on our fleet.
    • Develop and continually improve our suite of internal automation tools, applications, and processes.

    Job Requirements

    We're looking for smart engineers with exceptional communication skills, a positive attitude, and a passion for IT and learning new things. We expect you to be, or to quickly become, proficient in a range of the technologies we use. Successful candidates for this role will:
    • Have strong experience in Apache Kafka and a desire to learn more and develop to a true expert level. Ideally, you will already have experience diagnosing Kafka operational issues such as ISR drops, broker failures and consumer lag through the analysis of logs and graphs. Past experience with Kafka upgrades and migrations would be favourable.
    • Preferably have past IT customer service or support experience.
    • Have good fundamental computer science and software engineering skills and knowledge, particularly in operating system internals, memory management, and networking.
    • Have strong knowledge of and experience with Linux, and be comfortable working from the command line (essential).
    • Have an exceptional ability to communicate clearly and professionally in written and verbal English (essential).
    • Work as part of a team and use your initiative to get things done.
    • Be able to follow required processes and procedures.
    • Ideally have experience investigating or researching Kafka issues by reviewing the Apache Kafka codebase or the Kafka Jira project (a plus).
    • Ideally have programming skills in Python or Java and experience with source code control using Git (a plus).