Overview
On Site
Full Time
Part Time
Accepts corp to corp applications
Contract - Independent
Contract - W2
Skills
Distributed Computing
Python
SQL
NoSQL
Database
IBM DB2
PostgreSQL
Snowflake
Data Modeling
Workflow
DevOps
Docker
Kubernetes
Problem Solving
Conflict Resolution
PySpark
Big Data
Extract, Transform, Load (ETL)
Unstructured Data
Collaboration
Apache Spark
Caching
Data Engineering
Continuous Integration
Continuous Delivery
Version Control
Unit Testing
Cloud Computing
Amazon Web Services
Data Security
Regulatory Compliance
Mentorship
Job Details
Role Name - Lead PySpark Engineer
Role Description -
Requirements:
10+ years of experience in big data and distributed computing.
Very strong hands-on experience with PySpark, Apache Spark, and Python.
Strong hands-on experience with SQL and NoSQL databases (DB2, PostgreSQL, Snowflake, etc.).
Proficiency in data modeling and ETL workflows.
Proficiency with workflow schedulers such as Airflow (a minimal scheduling sketch follows this list).
Hands-on experience with AWS cloud-based data platforms.
Experience in DevOps, CI/CD pipelines, and containerization (Docker, Kubernetes) is a plus.
Strong problem-solving skills and the ability to lead a team.
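To make the scheduler requirement concrete, here is a minimal Airflow sketch: a daily DAG that submits a PySpark job. It assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed and a configured "spark_default" connection; the DAG id and application path are hypothetical placeholders.

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Daily DAG that hands a PySpark script to spark-submit via the Spark provider.
with DAG(
    dag_id="nightly_sales_etl",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",           # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_pyspark_etl",
        application="/opt/jobs/sales_etl.py",  # hypothetical script path
        conn_id="spark_default",
        conf={"spark.sql.shuffle.partitions": "200"},
    )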
Responsibilities:
Lead the design, development, and deployment of PySpark-based big data solutions.
Architect and optimize ETL pipelines for structured and unstructured data.
Collaborate with the client, data engineers, data scientists, and business teams to understand requirements and deliver scalable solutions.
Optimize Spark performance through partitioning, caching, and tuning (see the tuning sketch after this list).
Implement data engineering best practices: CI/CD, version control, and unit testing (a test sketch also follows).
Work with cloud platforms such as AWS.
Ensure data security, governance, and compliance.
Mentor junior developers and review code for best practices and efficiency.
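As an illustration of the partitioning, caching, and tuning levers named above, here is a minimal PySpark sketch. The table paths, column names, and partition counts are hypothetical and would be sized to the actual cluster and data.

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("etl-tuning-sketch")
    .config("spark.sql.shuffle.partitions", "200")  # size to the cluster
    .getOrCreate()
)

orders = spark.read.parquet("s3://bucket/orders/")        # hypothetical path
customers = spark.read.parquet("s3://bucket/customers/")  # hypothetical path

# Repartition on the join key so matching rows co-locate before the shuffle.
orders = orders.repartition(200, "customer_id")

# Cache a DataFrame that several downstream steps reuse, instead of recomputing it.
enriched = orders.join(customers, "customer_id", "left")
enriched.persist(StorageLevel.MEMORY_AND_DISK)

daily_counts = enriched.groupBy("order_date").count()
daily_counts.write.mode("overwrite").parquet("s3://bucket/daily_counts/")
enriched.unpersist()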
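And for the unit-testing practice mentioned above, a minimal pytest-style sketch of testing a PySpark transformation; the function under test and its columns are hypothetical.

import pytest
from pyspark.sql import SparkSession, functions as F


def add_total(df):
    # Toy transformation under test: total = price * quantity.
    return df.withColumn("total", F.col("price") * F.col("quantity"))


@pytest.fixture(scope="session")
def spark():
    # Small local session; enough for DataFrame-level tests.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_total(spark):
    df = spark.createDataFrame([(2.0, 3), (5.0, 1)], ["price", "quantity"])
    assert [r["total"] for r in add_total(df).collect()] == [6.0, 5.0]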