Job Details
Hi,
Our client is looking for a Hadoop / ETL Developer with PySpark for a full-time position in Dallas, TX / Atlanta, GA / Cleveland, OH / Pittsburgh, PA. Below is the detailed requirement.
Job Title : Hadoop / ETL Developer with PySpark
Location : Dallas, TX / Atlanta, GA / Cleveland, OH / Pittsburgh, PA
Duration : Full Time
Job Description:
We are seeking a highly experienced Senior Big Data & DevOps Engineer with 8+ years of professional experience in HDFS, Hive, Impala, PySpark, Python, and DevOps automation tools such as uDeploy and Jenkins. This role is responsible for managing end-to-end data operations, including HDFS table management, ETL pipeline development, multi-environment codebase governance, platform upgrades, and production support.
The ideal candidate will have strong expertise in Linux system operations, Big Data ecosystem tools, and experience with incident/change management using ServiceNow. This role plays a key part in ensuring the stability, scalability, and efficiency of enterprise data platforms while enabling seamless development-to-production workflows.
Key Responsibilities:
Big Data Platform Operations:
- Design, manage, and optimize HDFS directories, tables, and partitioning strategies.
- Implement and enforce data retention and lifecycle policies across large datasets (a minimal sketch follows this list).
- Administer Hive and Impala environments, ensuring high availability, performance tuning, and security compliance.
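For illustration only, the sketch below shows one way this kind of partition and retention maintenance might be done with PySpark against a Hive table. All names here (sales_db.transactions, txn_date, the 90-day window) are placeholder assumptions, not details from the client's environment.

```python
# Illustrative partition / retention maintenance sketch in PySpark.
# All names (sales_db.transactions, txn_date, 90-day window) are placeholders.
from datetime import date, timedelta

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("retention-maintenance")
    .enableHiveSupport()  # required so Spark can manage Hive tables and partitions
    .getOrCreate()
)

# A date-partitioned Hive table stored as Parquet.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.transactions (
        txn_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS PARQUET
""")

# Enforce a 90-day retention policy: drop partitions older than the cutoff.
cutoff = (date.today() - timedelta(days=90)).isoformat()
partitions = [row[0] for row in spark.sql("SHOW PARTITIONS sales_db.transactions").collect()]
for part in partitions:
    # Each entry looks like "txn_date=2024-01-31".
    value = part.split("=", 1)[1]
    if value < cutoff:
        spark.sql(
            f"ALTER TABLE sales_db.transactions "
            f"DROP IF EXISTS PARTITION (txn_date='{value}')"
        )
```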
ETL Development & Data Engineering:
- Develop scalable ETL pipelines using PySpark, Hive, and Python (see the sketch after this list).
- Build reusable frameworks for data ingestion, transformation, and aggregation.
- Optimize job performance through query tuning, resource management, and parallelization.
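As a rough sketch of the expected ETL work, the example below ingests raw order events, aggregates them, and loads a date-partitioned Hive table. The paths, database, and column names (/data/raw/orders, analytics_db, order_ts, amount) are assumptions made for illustration, not part of this posting.

```python
# Illustrative PySpark ETL sketch: ingest raw order events, aggregate them,
# and load a partitioned Hive table. Paths, database, and column names
# (/data/raw/orders, analytics_db, order_ts, amount) are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-aggregation")
    .enableHiveSupport()
    .getOrCreate()
)

# Ingest: raw order events landed in HDFS as Parquet.
raw_orders = spark.read.parquet("/data/raw/orders/")

# Transform: keep completed orders and derive the partition column.
completed = (
    raw_orders
    .filter(F.col("status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_ts"))
)

# Aggregate: daily revenue and order count per customer.
daily_revenue = (
    completed
    .groupBy("order_date", "customer_id")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Load: write the result as a date-partitioned Hive table so downstream
# queries can prune partitions instead of scanning the full dataset.
(
    daily_revenue
    .write
    .mode("overwrite")
    .format("parquet")
    .partitionBy("order_date")
    .saveAsTable("analytics_db.daily_customer_revenue")
)
```

Partitioning the output by date is one common way to keep downstream Hive/Impala queries fast; actual tuning choices (file sizes, shuffle settings, overwrite strategy) would depend on the client's cluster.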
DevOps & Environment Management:
- Maintain and promote code across DEV, QA, UAT, and PROD environments.
- Develop and support CI/CD pipelines using Jenkins and uDeploy for automated deployments.
- Perform environment upgrades, patching, and dependency management aligned with release schedules.
Linux & Infrastructure Operations:
- Execute Linux administration tasks including performance tuning, disk management, and scripting (Bash/Python).
- Troubleshoot cluster-level issues including node failures, job errors, and distributed system anomalies.
Change & Incident Management:
- Drive incident resolution and change execution using ServiceNow workflows.
- Conduct root cause analysis (RCA) for critical issues and implement preventive solutions.
- Ensure compliance with ITIL processes for change, incident, and problem management.
Collaboration & Technical Leadership:
- Partner with data engineers, developers, DevOps teams, and business analysts to ensure operational excellence.
- Mentor junior engineers and contribute to technical leadership across the Big Data ecosystem.
- Document operational procedures, troubleshooting guides, and architectural decisions for internal knowledge sharing.
Required Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 8+ years of experience in Big Data engineering and DevOps practices.
- Advanced proficiency in HDFS, Hive, Impala, PySpark, Python, and Linux.
- Proven experience with CI/CD tools such as Jenkins and uDeploy.
- Strong understanding of ETL development, orchestration, and performance optimization.
- Experience with ServiceNow for incident/change/problem management.
- Excellent analytical, troubleshooting, and communication skills.
Nice to Have:
- Exposure to cloud-based Big Data platforms (e.g., AWS EMR).
- Familiarity with containerization (Docker, Kubernetes) and infrastructure automation tools (Ansible, Terraform).