PySpark Data Engineer

Rutherford, NJ, US • Posted 11 hours ago • Updated 11 hours ago
Contract W2
On-site
Fitment

Job Details

Skills

  • Banking
  • Brand
  • Financial Services
  • Computer Science
  • Information Technology
  • Streaming
  • API
  • Dimensional Modeling
  • SQL
  • Apache Hadoop
  • HDFS
  • Apache Hive
  • Cloud Computing
  • Amazon S3
  • Google Cloud Platform
  • Google Cloud Storage
  • Amazon EMR
  • Databricks
  • Version Control
  • Git
  • Conflict Resolution
  • Problem Solving
  • Analytical Skill
  • Communication
  • Workflow
  • Orchestration
  • Apache Airflow
  • Microsoft Azure
  • Amazon Web Services
  • Step-Functions
  • Apache Kafka
  • Amazon Kinesis
  • NoSQL
  • Database
  • MongoDB
  • Apache Cassandra
  • Amazon DynamoDB
  • Continuous Integration
  • Continuous Delivery
  • Extract
  • Transform
  • Load
  • Data Warehouse
  • Apache Spark
  • Python
  • Data Quality
  • Management
  • Data Governance
  • Regulatory Compliance
  • Data Processing
  • Data Engineering
  • PySpark
  • Big Data

Summary

Grow your career as a PySpark Data Engineer with an innovative global bank in Rutherford, NJ. This is a contract role with a strong possibility of extension, and it requires a hybrid schedule of 3 days onsite per week.

Join one of the world's most renowned global banks, a trusted brand with over 200 years of continuously evolving financial services worldwide. You will work alongside some of the smartest minds in the industry, who are excited to share their knowledge and to learn from you.

Contract Duration: 12 Months

Required Skills & Experience
  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a related quantitative field.
  • 7-10 years of experience as a Data Engineer, with significant experience specifically in PySpark.
  • Strong proficiency in Python programming.
  • Extensive experience with Apache Spark, including Spark SQL, Spark Streaming, and DataFrame API.
  • Solid understanding of data warehousing concepts, dimensional modeling, and ETL principles.
  • Proficiency in SQL for data querying and manipulation.
  • Experience with big data technologies such as Hadoop, HDFS, Hive, or similar.
  • Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud Platform) and their data services (e.g., S3, ADLS, Google Cloud Storage, EMR, Databricks, Glue).
  • Experience with version control systems (e.g., Git).
  • Excellent problem-solving, analytical, and communication skills.

Desired Skills
  • Master's degree in a related field.
  • Experience with workflow orchestration tools (e.g., Apache Airflow, Azure Data Factory, AWS Step Functions).
  • Knowledge of stream processing technologies (e.g., Kafka, Kinesis).
  • Experience with NoSQL databases (e.g., MongoDB, Cassandra, DynamoDB).
  • Familiarity with data governance tools and practices.
  • Experience in a CI/CD environment.

What You Will Be Doing
  • Design, build, and optimize data pipelines using PySpark to extract, transform, and load (ETL) data from various sources into data lakes and data warehouses.
  • Develop and maintain scalable data processing jobs and frameworks using Apache Spark with Python (PySpark).
  • Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver high-quality data solutions.
  • Implement data quality checks, monitoring, and alerting for data pipelines to ensure data accuracy and reliability.
  • Optimize existing PySpark jobs for performance, efficiency, and cost-effectiveness.
  • Manage and process large datasets, ensuring data governance, security, and compliance.
  • Troubleshoot and resolve issues in data pipelines and data processing jobs.
  • Participate in code reviews, contribute to architectural discussions, and promote best practices in data engineering.
  • Stay informed about new PySpark features, big data technologies, and industry best practices.
  • Document data pipelines, data models, and processes.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 10105282
  • Position Id: 881610