Title: Hadoop Admin with EKS
Location: Sunnyvale, CA (Onsite)
Site Reliability Engineer (SRE), Cloud Infrastructure & Data: ensure reliable, scalable, and secure cloud-based data infrastructure.
Skillset: Kubernetes | AWS | Hadoop (Spark, Python)
Experience: 6-9 years
- Design, implement, and maintain AWS infrastructure with a focus on data products.
- Automate infrastructure management with infrastructure as code (Pulumi or Terraform) and policy as code.
- Knowledge of building Helm releases and working with Kubernetes custom resources.
- Monitor system health, optimise performance, and manage Kubernetes (EKS) clusters.
- Experience creating and managing Hadoop clusters is a plus.
- Knowledge of Apache Spark and Python is a plus.
- Implement security measures, ensure compliance, and mitigate risks.
- Collaborate with development teams on deployment and operation of data applications.
- Optimise data pipelines for efficiency and cost-effectiveness.
- Troubleshoot issues, participate in incident response, and drive continuous improvement.
- Experience with Kubernetes administration, data pipelines, and monitoring and observability tools.
- Excellent communication and problem-solving skills.
- Self-driven and highly motivated, with the ability to work both independently and within a team.
- Ability to operate effectively in a fast-paced development environment with dynamic changes, tight deadlines, and limited resources.