Design and develop data ingestion and processing code using Python/PySpark/R/Hive on the Cloudera CDH platform.
Create and update Design Specs and reference Architecture documents to enable acceleration in solution development.
Cloudera Data Platform: innovate with new ideas, research related technology, develop new concepts, build prototypes, and deliver implementations.
Participate in testing and peer code reviews to identify any bugs and ensure reusability of code.
Automate the deployment of the solutions by using Shell scripts/Python/Oozie.
Work with internal subject matter experts to define requirements for new demo environments.
Collaborate with the Apache community on Hadoop and other related open source projects.
Work with the IT Change Management group to promote the developed code/scripts from non-production to production environments.
Work with Architecture team (Application/Security/Infrastructure/Data) to get their approval on the designed solutions.
Tools and technology experience that candidates should have:
Programming Languages - Python, PySpark, Java, SQL, Shell Scripting
Big Data Tools - Spark, HDFS, Kafka, Hive, HBase, Sqoop
Databases - MySQL, PostgreSQL, SQL Server, Snowflake
Cloudera Technologies - Cloudera Data Platform & Cloudera Manager
Cloud Technologies - Amazon Web Services, AWS Big Data Platform, OpenStack
Other Software & Tools - Tableau, SAS, Docker, Kubernetes, GitHub