Senior Big Data ETL Engineer (Local to OH only)

  • Columbus, OH
  • Posted 21 hours ago | Updated 21 hours ago

Overview

On Site
Depends on Experience
Accepts corp to corp applications
Contract - Independent
Contract - W2
Contract - 12 Month(s)

Skills

Big Data
ETL
Hadoop
Hive
PySpark
UNIX shell

Job Details

Senior Big Data ETL Engineer, On-site 5 days a week

50 W. Town Street, Columbus, Ohio 43215

Work Hours M-F 8:00 AM to 5:00 PM EST

Interviews via Teams

Technical Specialist 3 (TS3)

Responsibilities:

  • Participate in team activities, design discussions, stand-up meetings, and planning reviews.
  • Perform data analysis, data profiling, data quality, and data ingestion in various layers using big data/Hadoop/Hive/Impala queries, PySpark programs, and UNIX shell scripts.
  • Follow the organization's coding standards document and create mappings, sessions, and workflows per the mapping specification document.
  • Perform Gap and impact analysis of ETL and IOP jobs for the new requirements and enhancements.
  • Create jobs in Hadoop using Sqoop, PySpark, and StreamSets to meet business users' needs.
  • Create mock-up data, perform unit testing, and capture result sets for jobs developed in the lower environment.
  • Update the production support runbook and Control-M schedule document for each production release.
  • Create and update design documents and provide a detailed description of workflows after every production release.
  • Continuously monitor production data loads, fix issues, log them in the tracker document, and identify performance issues.
  • Tune long-running ETL/ELT jobs by creating partitions, enabling full loads, and applying other standard approaches.
  • Perform quality assurance checks and reconciliation after data loads, and communicate with the vendor to receive corrected data.
  • Participate in ETL/ELT code review and design re-usable frameworks.
  • Create Remedy/ServiceNow tickets to fix production issues, and create support requests to deploy Database, Hadoop, Hive, Impala, UNIX, ETL/ELT, and SAS code to the UAT environment.
  • Create Remedy/ServiceNow tickets and/or incidents to trigger Control-M jobs for FTP and ETL/ELT jobs on an ad hoc, daily, weekly, monthly, and quarterly basis as needed.
  • Model and create STAGE/ODS/Data Warehouse Hive and Impala tables as needed.
  • Create change requests, work plans, test results, and BCAB checklist documents for code deployment to the production environment, and perform code validation post-deployment.
  • Work with Hadoop Admin, ETL, and SAS admin teams for code deployments and health checks.
  • Create reusable UNIX shell scripts for file archival, file validations, and Hadoop workflow looping.
  • Create a reusable Audit Balance Control framework that captures reconciliation results, mapping parameters, and variables, serving as a single point of reference for workflows.
  • Create PySpark programs to ingest historical and incremental data.
  • Create Sqoop scripts to ingest historical data from the EDW Oracle database into Hadoop IOP, and create Hive table and Impala view creation scripts for dimension tables.
  • Participate in meetings to continuously upgrade functional and technical expertise.
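As a purely illustrative sketch of the kind of post-load reconciliation check the Audit Balance Control responsibilities above describe (table names and row counts are invented; in practice the counts would come from Hive/Impala `SELECT COUNT(*)` queries rather than in-memory dicts):

```python
def reconcile(source_counts, target_counts):
    """Compare per-table row counts between source and target layers.

    Returns {table: (source_rows, target_rows)} for every table whose
    counts differ; an empty dict means the load reconciled cleanly.
    """
    mismatches = {}
    for table, src_rows in source_counts.items():
        # A table missing from the target layer counts as 0 loaded rows.
        tgt_rows = target_counts.get(table, 0)
        if src_rows != tgt_rows:
            mismatches[table] = (src_rows, tgt_rows)
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts for two staged tables after a nightly load.
    src = {"claims": 1_000_000, "members": 250_000}
    tgt = {"claims": 1_000_000, "members": 249_980}
    print(reconcile(src, tgt))  # only "members" fails to reconcile
```

A real framework would log mismatches to the tracker document and raise a ticket, per the support process described above; this sketch only shows the comparison step.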

Required Skill:

  • 8+ years of experience with Big Data/Hadoop on data warehousing or data integration projects.
  • Analysis, design, development, support, and enhancement of ETL/ELT in a data warehouse environment using Cloudera big data technologies (minimum of 7 years' experience with Hadoop, MapReduce, Sqoop, PySpark, Spark, HDFS, Hive, Impala, StreamSets, Kudu, Oozie, Hue, Kafka, Yarn, Python, Flume, Zookeeper, Sentry, and Cloudera Navigator), along with Oracle SQL/PL-SQL, Unix commands, and shell scripting.
  • Strong development experience (minimum of 7 years) creating Sqoop scripts, PySpark programs, HDFS commands, HDFS file formats (Parquet, Avro, ORC, etc.), StreamSets pipelines, job schedules, Hive/Impala queries, and Unix shell scripts.
  • Writing Hadoop/Hive/Impala scripts (minimum of 7 years' experience) to gather statistics on tables after data loads.
  • Strong SQL experience (Oracle and Hadoop (Hive/Impala, etc.)).
  • Writing complex SQL queries and performing tuning based on the Hadoop/Hive/Impala explain plan results.
  • Proven ability to write high-quality code.
  • Experience building datasets and familiarity with PHI and PII data.
  • Expertise in implementing complex ETL/ELT logic.
  • Develop and enforce a strong reconciliation process.
  • Accountable for ETL/ELT design documentation.
  • Good knowledge of Big Data, Hadoop, Hive, Impala databases, data security, and dimensional model design.
  • Basic knowledge of UNIX/LINUX shell scripting.
  • Utilize ETL/ELT standards and practices towards establishing and following a centralized metadata repository.
  • Good experience in working with Visio, Excel, PowerPoint, Word, etc.
  • Effective communication, presentation, and organizational skills.
  • Familiar with project management methodologies such as Waterfall and Agile.
  • Ability to establish priorities and follow through on projects, paying close attention to detail with minimal supervision.

Desired Skill:

  • Demonstrate effective leadership, analytical, and problem-solving skills
  • Excellent written and oral communication skills with technical and business teams.
  • Ability to work independently, as well as part of a team.
  • Stay abreast of current technologies in assigned IT areas.
  • Establish facts and draw valid conclusions.
  • Recognize patterns and opportunities for improvement throughout the entire organization.
  • Ability to discern critical from minor problems and innovate new solutions.