Overview
Remote
$40 - $60
Contract - W2
Contract - Independent
Contract - 12 Month(s)
No Travel Required
Skills
Hadoop
Hive
Job Details
Position: Data Engineer Hadoop
Location: Remote
Employment Type: W2
About the Role
We are looking for an experienced Data Engineer with strong Hadoop expertise to join our team. The ideal candidate will have hands-on experience working with the Hadoop ecosystem (HDFS, Hive, Spark, Sqoop, Impala, Kafka, etc.), building scalable data pipelines, and managing large datasets. You will be responsible for developing, optimizing, and maintaining big data solutions that support analytics, reporting, and business-critical decision-making.
Responsibilities
- Design, build, and maintain large-scale data pipelines on Hadoop platforms.
- Develop and optimize ETL workflows using Hive, Spark, Sqoop, and Python/Scala.
- Perform data ingestion from multiple sources (RDBMS, APIs, streaming data) into Hadoop (HDFS, Hive, HBase).
- Implement real-time data streaming solutions using Kafka/Spark Streaming.
- Create and manage Hive/Impala tables (internal/external, partitioned, Parquet/ORC formats).
- Conduct data profiling, validation, and quality checks to ensure data accuracy.
- Optimize queries, jobs, and system performance for large distributed datasets.
- Collaborate with business users, analysts, and data scientists to deliver scalable solutions.
- Troubleshoot issues, perform root cause analysis, and implement fixes.
- Ensure security, governance, and compliance in big data environments.
Required Skills & Qualifications
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 4-8+ years of experience as a Data Engineer, with a strong focus on the Hadoop ecosystem.
- Hands-on expertise in:
  - HDFS, Hive, Spark, Sqoop, Impala
  - Python/Scala, Unix/Linux, Shell scripting
  - SQL and RDBMS concepts (DB2, Oracle, Teradata, SQL Server, etc.)
- Strong experience in distributed/parallel processing for large datasets.
- Familiarity with workflow orchestration tools (Oozie, Airflow, Jenkins).
- Knowledge of Agile methodologies and tools like JIRA/Confluence.
Preferred Qualifications
- Experience with real-time data processing (Kafka, Spark Streaming).
- Exposure to cloud-based Hadoop deployments (AWS EMR, Azure HDInsight, Google Cloud Dataproc).
- Performance tuning and optimization expertise in Hadoop and Spark.
- Knowledge of Snowflake, Redshift, or other modern cloud data warehouses is a plus.
- Certifications in Cloudera, Hortonworks, or AWS Big Data.