Applicants must be authorized to work for our organization on a W-2 basis. We are unable to engage candidates through Corp-to-Corp (C2C), third-party employers, or independent consulting arrangements for this position.
Required Qualifications
8+ years of overall IT industry experience.
5+ years of hands-on experience with Kafka or NoSQL technologies.
Strong programming skills in Python and/or Java, with a focus on automation and tooling.
Experience with CI/CD pipelines and Infrastructure as Code (IaC) tools such as Git and CloudFormation.
Experience with at least one cloud platform (AWS or Azure) or with Kubernetes-based environments.
Experience building AI-powered solutions, MCP Servers, Agentic AI systems, or GenAI-based automation tools.
Strong Linux/Unix administration and troubleshooting experience.
Excellent analytical, debugging, and problem-solving skills, along with strong verbal and written communication.
Preferred Qualifications
Experience with DevOps and Site Reliability Engineering (SRE) practices.
Strong production support, incident management, issue triaging, and root cause analysis experience.
Experience with Docker and Kubernetes administration, deployment, and performance tuning.
Knowledge of security best practices, vulnerability management, CVE analysis, and monitoring cloud/system/device logs.
Experience designing self-service platforms and operational automation solutions.
Key Responsibilities
Build, manage, and support Kafka and NoSQL platforms in production environments.
Design, implement, and maintain scalable platform architectures and deployment solutions.
Develop and maintain automation tools for infrastructure provisioning, monitoring, and operational workflows.
Integrate AI/GenAI capabilities into operational tools and platform management processes.
Design and implement CI/CD pipelines and Infrastructure as Code solutions.
Execute and manage code deployments across development, testing, staging, and production environments.
Troubleshoot and resolve platform, infrastructure, and application issues across all environments.
Monitor system performance, reliability, availability, and security, and drive continuous improvements.
Collaborate with development, operations, and architecture teams to improve platform efficiency and developer productivity.
Drive operational excellence through automation, observability, reliability engineering, and proactive issue resolution.