Overview
Remote
$40 - $60
Contract - W2
Contract - Independent
Contract - 12 Month(s)
No Travel Required
Skills
Hadoop
Hive
Job Details
Position: Data Engineer Hadoop
Location: Remote
Employment Type: W2
About the Role
We are looking for an experienced Data Engineer with strong Hadoop expertise to join our team. The ideal candidate will have hands-on experience working with the Hadoop ecosystem (HDFS, Hive, Spark, Sqoop, Impala, Kafka, etc.), building scalable data pipelines, and managing large datasets. You will be responsible for developing, optimizing, and maintaining big data solutions that support analytics, reporting, and business-critical decision-making.
Responsibilities
- Design, build, and maintain large-scale data pipelines on Hadoop platforms.
- Develop and optimize ETL workflows using Hive, Spark, Sqoop, and Python/Scala.
- Perform data ingestion from multiple sources (RDBMS, APIs, streaming data) into Hadoop (HDFS, Hive, HBase).
- Implement real-time data streaming solutions using Kafka/Spark Streaming.
- Create and manage Hive/Impala tables (internal/external, partitioned, Parquet/ORC formats).
- Conduct data profiling, validation, and quality checks to ensure data accuracy.
- Optimize queries, jobs, and system performance for large distributed datasets.
- Collaborate with business users, analysts, and data scientists to deliver scalable solutions.
- Troubleshoot issues, perform root cause analysis, and implement fixes.
- Ensure security, governance, and compliance in big data environments.
Required Skills & Qualifications
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 4-8+ years of experience as a Data Engineer, with a strong focus on the Hadoop ecosystem.
- Hands-on expertise in:
  - HDFS, Hive, Spark, Sqoop, Impala
  - Python/Scala, Unix/Linux, Shell scripting
  - SQL and RDBMS concepts (DB2, Oracle, Teradata, SQL Server, etc.)
- Strong experience in distributed/parallel processing for large datasets.
- Familiarity with workflow orchestration tools (Oozie, Airflow, Jenkins).
- Knowledge of Agile methodologies and tools like JIRA/Confluence.
Preferred Qualifications
- Experience with real-time data processing (Kafka, Spark Streaming).
- Exposure to cloud-based Hadoop deployments (AWS EMR, Azure HDInsight, Google Cloud Dataproc).
- Performance tuning and optimization expertise in Hadoop and Spark.
- Knowledge of Snowflake, Redshift, or other modern cloud data warehouses is a plus.
- Certifications in Cloudera, Hortonworks, or AWS Big Data.