Role Title: Sr Big Data Specialist
Purpose of Role/Organizational Unit: This is a senior position that requires extensive knowledge of Big Data tool administration and the integration of data management technologies. This organizational unit covers data science/analytics, technology management, and operational service delivery for big data and analytics, data integration, and business intelligence management. In this capacity, the Senior Big Data Specialist is responsible for designing and implementing infrastructure architecture for big data services using existing Enterprise Data Assets (Hadoop Data Lake, Data Warehouse and Master Data Management).
The following functions are the responsibility of the Senior Big Data Specialist within the data management team. This individual will be mainly responsible for managing, configuring and maintaining large-scale, multi-tenant Cloudera Hadoop cluster environments, performing performance tuning, and migrating code from Dev to QA and Production environments within the Data Lake. The individual also serves as subject matter expert for other data integration, management and governance tools, such as Informatica:
• Implementing, managing and providing support for large Cloudera Hadoop clusters across all environments (Dev, QA & Production)
• Subject matter expertise supporting and governing other data integration tools in the environment.
• Working with multiple teams and colleagues at every level of the organization.
• Managing large-scale infrastructure projects, with working experience on Red Hat Enterprise Linux 6 and 7 systems
• Cloudera administration experience working with HDFS, YARN, ZooKeeper, MapReduce, Spark, Impala, Hue, Oozie, Sqoop, Kafka, Hive and Kudu.
• Deploying and maintaining Hadoop clusters, including commissioning and decommissioning nodes using Cloudera.
• Configuring NameNode high availability and keeping track of all running Hadoop jobs.
• Taking care of the day-to-day running of Hadoop clusters.
• Work closely with the infrastructure team, database team, network team, BI team and application teams to make sure that all the big data applications are highly available and performing as expected.
• Responsible for capacity planning and estimating the requirements for lowering or increasing the capacity of the Hadoop cluster.
• Responsible for deciding the size of the Hadoop cluster based on the data to be stored in HDFS.
• Performing backup and recovery using the Cloudera BDR tool.
• Enabling snapshot backups and point-in-time recovery using Cloudera.
• Handle all Hadoop environment builds, including design, security, capacity planning, cluster setup, performance tuning and ongoing monitoring.
• Perform high-level, day-to-day operational maintenance, support, and upgrades for the Cloudera Hadoop Cluster.
• Research and recommend innovative and, where possible, automated approaches for system administration tasks.
• Creation of key performance metrics, measuring the utilization, performance and overall health of the cluster.
• Deploy new/upgraded hardware and software releases and establish proper communication channels.
• Ability to collaborate with product managers, lead engineers and data scientists on all facets of the Hadoop ecosystem.
• Ensure existing data/information assets are secure and adhere to a best-in-class security model.
• Troubleshooting application errors and ensuring that they do not occur again.
• Bachelor’s Degree in Computer Science, Information Systems, or equivalent
• Minimum of 8 years of experience with data tools, with a focus on quality and reliability in design
• Required - Cloudera Hadoop experience, with expertise in Cloudera Hadoop 5.10 and above.
• Experience with Cloudera Hadoop Distribution (Hive, HBase, Spark)
• Experience with data integration and transformation software, data warehouse, and master data management
• Creating complex parallel loads and dependencies using workflows
• Must have experience installing and configuring CDH and CM version 5.10 and above
• Experience in real-time analytics
Unique Competencies
• Expertise in Scala, Java, Python, Spark, Perl, Shell programming and other Big Data Development technologies
• Preferred - Data Lake ETL skills and Informatica skills
• Preferred - experience in Web Services
• Preferred - 2+ years’ experience in data lake design and implementation
Educational Requirements