Data Engineer - PySpark Developer

Overview

On Site
$100,000 - $120,000
Full Time

Skills

PySpark
Scala
Python
Spark

Job Details

Job Title: PySpark with Scala Developer
Location: Irving, TX / Richardson, TX
Job Type: Full-Time Permanent

Job Summary:
We are seeking an experienced and highly motivated PySpark with Scala Developer to join our Big Data engineering team. The ideal candidate will have strong experience building scalable data processing pipelines using Apache Spark, with expertise in both PySpark and Scala. This role requires strong problem-solving skills, attention to detail, and the ability to work collaboratively in a fast-paced, data-driven environment.

Key Responsibilities:
Design, develop, and optimize large-scale data processing pipelines using Apache Spark, with a focus on PySpark and Scala.
Build and maintain reliable data ingestion, transformation, and validation workflows.
Integrate data from various sources, including files, databases, APIs, and streaming systems.
Write efficient and reusable code for ETL and data analytics use cases.
Collaborate with data scientists, data engineers, and business teams to understand requirements and deliver data solutions.
Tune, debug, and optimize the performance of Spark jobs.
Implement data quality, lineage, and governance practices.
Develop unit tests and support automated deployment pipelines (CI/CD).
Work with large datasets in both batch and real-time environments on cloud or on-premises platforms.

Required Skills and Experience:
5+ years of experience in Big Data or data engineering roles.
Strong hands-on experience with Apache Spark, using both PySpark and Scala.
Deep understanding of Spark internals, including RDDs, DataFrames, Datasets, and Spark SQL.
Proficiency in data modeling, data wrangling, and data transformations.
Experience with the Hadoop ecosystem: HDFS, Hive, or HBase.
Proficiency in querying with SQL and working with structured and unstructured data.
Experience working with Git, CI/CD tools, and Agile methodologies.
Familiarity with cloud platforms such as AWS, Azure, or Google Cloud Platform (e.g., S3, EMR, Databricks) is a plus.
Ability to write clean, efficient, and maintainable code.
Strong communication and interpersonal skills.
