Requirements:
1. Kubernetes (EKS) in AWS environments
- Maintaining and operating production clusters, diagnosing scaling, networking, and workload stability issues
2. AWS Cloud infrastructure
3. Exposure to Big Data engineering
4. Python, SQL, Scala
5. CI/CD
Responsibilities:
• Design, develop, and maintain large-scale data processing pipelines using Big Data technologies such as Hadoop and Spark, written in Python and Scala (a minimal PySpark sketch follows this list).
• Architect and deploy containerized big data workloads on Amazon EMR on EKS (Elastic Kubernetes Service); a job-submission sketch follows this list.
• Design and implement Kubernetes-based infrastructure for running Spark applications at scale.
• Implement data ingestion, storage, transformation, and analysis solutions that are scalable, efficient, and reliable.
• Stay current with industry trends and emerging Big Data technologies to continuously improve the data architecture.
• Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
• Optimize and enhance existing data pipelines for performance, scalability, and reliability.
• Develop automated testing frameworks and implement continuous testing for data quality assurance (a pytest-based sketch follows this list).
• Conduct unit, integration, and system testing to ensure the robustness and accuracy of data pipelines.
• Work with data scientists and analysts to support data-driven decision-making across the organization.
• Write and maintain automated unit, integration, and end-to-end tests.
• Monitor and troubleshoot data pipelines in production environments to identify and resolve issues.
• Manage Kubernetes clusters, pods, services, and deployments for big data workloads; a pod-inspection sketch follows this list.
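To make the pipeline responsibilities above concrete, here is a minimal PySpark ingest-transform-store sketch. It assumes raw JSON events landing in S3; the bucket paths, column names, and cleaning rules are illustrative assumptions, not details from this posting.

```python
# Minimal PySpark sketch of an ingest -> transform -> store step.
# Paths and columns below are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-pipeline").getOrCreate()

raw = spark.read.json("s3://my-bucket/raw/events/")        # ingestion
clean = (
    raw.dropDuplicates(["event_id"])                       # basic hygiene
       .withColumn("event_date", F.to_date("event_ts"))    # derive partition key
       .filter(F.col("event_type").isNotNull())
)
# Store as partitioned Parquet for efficient downstream analysis.
clean.write.mode("overwrite").partitionBy("event_date") \
     .parquet("s3://my-bucket/curated/events/")
```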
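For the EMR on EKS deployment work, a sketch of submitting a PySpark job like the one above to an existing virtual cluster using boto3's `emr-containers` client. The virtual cluster ID, IAM role ARN, region, job name, and S3 path are all placeholders.

```python
# Hypothetical sketch: submitting a PySpark job to an EMR on EKS
# virtual cluster. All identifiers below are placeholders.
import boto3

emr = boto3.client("emr-containers", region_name="us-east-1")

response = emr.start_job_run(
    virtualClusterId="abc123virtualcluster",          # placeholder
    name="daily-events-etl",                          # placeholder job name
    executionRoleArn="arn:aws:iam::111122223333:role/emr-eks-job-role",
    releaseLabel="emr-6.15.0-latest",                 # EMR release to run
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/jobs/etl.py",   # placeholder script
            "sparkSubmitParameters": (
                "--conf spark.executor.instances=4 "
                "--conf spark.executor.memory=4g"
            ),
        }
    },
)
print(response["id"])  # job run ID, usable for later status polling
```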
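For the testing responsibilities, a sketch of automated data quality checks written as pytest tests over a local SparkSession. The schema and the specific invariants checked (non-null keys, unique IDs) are assumptions about what such a pipeline would need.

```python
# Hedged example of automated data quality tests using pytest and a
# local SparkSession; the sample data and checks are illustrative.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("dq-tests").getOrCreate()

def test_no_null_keys(spark):
    df = spark.createDataFrame(
        [("e1", "click"), ("e2", "view")], ["event_id", "event_type"]
    )
    # Every record must carry a primary key.
    assert df.filter(df.event_id.isNull()).count() == 0

def test_event_ids_unique(spark):
    df = spark.createDataFrame(
        [("e1", "click"), ("e2", "view")], ["event_id", "event_type"]
    )
    # Deduplication on the key must not drop any rows.
    assert df.count() == df.dropDuplicates(["event_id"]).count()
```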
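For the cluster management duties, a sketch that lists Spark executor pods with the official kubernetes Python client. The namespace is a placeholder; `spark-role=executor` is the label Spark applies to its executor pods when running on Kubernetes.

```python
# Sketch of inspecting Spark executor pods with the kubernetes client.
# Namespace is an assumption; adjust to how jobs are deployed.
from kubernetes import client, config

config.load_kube_config()          # or load_incluster_config() inside a pod
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(
    namespace="spark",                         # placeholder namespace
    label_selector="spark-role=executor",      # label set by Spark on K8s
)
for pod in pods.items:
    # Surface executors that are not Running (e.g., Pending, Failed).
    print(pod.metadata.name, pod.status.phase)
```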