Overview
On Site
Depends on Experience
Contract - Independent
Contract - W2
Contract - 12 Month(s)
No Travel Required
Skills
Cloud Computing
Apache Sqoop
Apache Spark
Conflict Resolution
CA Workload Automation AE
Apache Hadoop
Communication
Extract, Transform, Load (ETL)
Google Cloud Platform (GCP)
Data Governance
Data Quality
Job Details
Job Summary:
We are looking for a skilled and motivated Hadoop Data Lake Automation Engineer with 8 years of experience automating data workflows and processes in Hadoop-based data lake environments. The ideal candidate will build scalable automation solutions, optimize data pipelines, and ensure efficient data movement and transformation across platforms. This is an onsite role requiring flexibility to work from either Dallas, Texas, or Charlotte, North Carolina, based on project needs.
Key Responsibilities:
- Design and implement automation solutions for data ingestion, transformation, and processing in Hadoop data lake environments.
- Develop and maintain scalable data pipelines using tools such as Apache NiFi, Spark, Hive, and Sqoop.
- Collaborate with data engineers, analysts, and business stakeholders to understand data requirements and deliver automation solutions.
- Monitor and troubleshoot data workflows, ensuring reliability and performance.
- Implement best practices for data governance, security, and metadata management.
- Maintain documentation for data flows, automation scripts, and operational procedures.
- Support production environments and participate in on-call rotations as needed.
Required Skills & Qualifications:
- 8 years of hands-on experience with the Hadoop ecosystem (HDFS, Hive, Spark, Sqoop, Oozie, etc.).
- Strong experience automating data lake workflows and ETL processes.
- Proficiency in scripting languages such as Python, Shell, or Scala.
- Experience with scheduling and orchestration tools (e.g., Apache Airflow, Control-M, AutoSys).
- Solid understanding of data modelling, data quality, and performance optimization.
- Familiarity with cloud platforms (AWS, Azure, Google Cloud Platform) and big data services.
- Excellent problem-solving and communication skills.
Preferred Qualifications:
- Experience with Apache NiFi or similar data flow tools.
- Exposure to CI/CD pipelines and DevOps practices.
- Knowledge of data cataloguing and lineage tools.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.