Role: Data Engineer - Python/PySpark
Location: Irving, TX (3 days onsite/week)
Full-time
Job Description:
Requirements:
• Strong hands-on development experience in Python, PySpark, and SQL.
• Experience building large-scale ETL/ELT pipelines for structured and unstructured data.
• Deep understanding of Spark and distributed computing fundamentals (transformations, shuffles, optimization), along with experience in the broader Hadoop ecosystem.
• Proficiency with Git-based repositories (Bitbucket / GitHub).
• Experience working with AWS, Azure, or Google Cloud Platform environments.
• Strong understanding of database design, data modeling, and warehouse schemas (star/snowflake).
• Experience with CI/CD automation and pipeline development.
• Strong analytical and troubleshooting skills for resolving complex data issues.
• Ability to collaborate with cross-functional teams and convert business requirements into technical solutions.
Responsibilities:
• Design, develop, and maintain robust, scalable ETL/ELT pipelines.
• Write efficient, reusable, and scalable code in Python and PySpark for distributed data processing.
• Review existing data engineering code and identify opportunities for refactoring or performance improvement.
• Implement data validation, cleansing, reconciliation, and quality checks across the data lifecycle.
• Collaborate with IT and business stakeholders to understand data requirements and translate them into solutions.
• Monitor pipeline performance, troubleshoot failures, and optimize for latency, throughput, and cost.
• Participate in code reviews, enforce coding standards, and contribute to engineering best practices.
• Build and maintain CI/CD pipelines for testing, packaging, and deployment of data pipelines.
• Ensure data reliability, security, and consistency across environments.
• Work with cloud services and big data platforms to support modern data architecture.