Lead Big Data/Data Processing/PySpark

  • Chicago, IL
  • Posted 15 hours ago | Updated 15 hours ago

Overview

On Site
$45 - $50 per hour
Contract - W2
Contract - Independent
Contract - 12 Month(s)

Skills

Amazon S3
Amazon Web Services
Analytical Skill
Big Data
Cloud Computing
Collaboration
Data Flow
Data Modeling
Data Processing
Data Quality
Database
Databricks
Documentation
ELT
Extract, Transform, Load (ETL)
Google Cloud Platform
Innovation
Machine Learning (ML)
Management
Microsoft Azure
Performance Tuning
PySpark
Quality Assurance
Reporting
Snowflake Schema
Star Schema
Streaming
Testing

Job Details

Employment: W2
Minimum experience: 8+ years
Job Description:
  • Data Pipeline Development: Design, develop, test, and deploy robust and scalable data pipelines using PySpark for data ingestion, transformation, and loading (ETL/ELT) from various sources (e.g., S3, ADLS, databases, APIs, streaming data). A minimal ETL sketch appears after this list.
  • Big Data Processing: Utilize PySpark to process large datasets efficiently, handling complex data transformations, aggregations, and data quality checks.
  • Performance Optimization: Optimize PySpark jobs for performance, efficiency, and cost-effectiveness, identifying and resolving bottlenecks (a tuning sketch appears after this list).
  • Data Modeling: Collaborate with data architects and analysts to design and implement efficient data models (e.g., star schema, snowflake schema, data vault) for analytical and reporting purposes (a star-schema sketch appears after this list).
  • Cloud Integration: Work with cloud platforms (AWS, Azure, Google Cloud Platform) and their respective big data services (e.g., AWS EMR, Azure Databricks, Google Cloud Platform Dataflow/Dataproc) to deploy and manage PySpark applications; a strong understanding of the medallion architecture is expected.
  • Collaboration: Work closely with data scientists, machine learning engineers, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
  • Testing and Quality Assurance: Implement comprehensive unit, integration, and end-to-end tests for data pipelines to ensure data accuracy and reliability (a test sketch appears after this list).
  • Monitoring and Support: Monitor production data pipelines, troubleshoot issues, and provide ongoing support to ensure data availability and integrity.
  • Documentation: Create and maintain clear and concise documentation for data pipelines, data models, and processes.
  • Innovation: Stay up-to-date with the latest advancements in big data technologies, PySpark, and cloud services, and recommend new tools and approaches.
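
For the Data Pipeline Development bullet, a minimal PySpark ETL sketch; the S3 paths, column names (event_id, event_ts), and quality rules are illustrative assumptions, not details from this posting:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events-etl").getOrCreate()

    # Extract: read raw JSON events from S3 (bucket and layout are assumed).
    raw = spark.read.json("s3a://example-bucket/raw/events/")

    # Transform: type the timestamp, derive a partition date, and apply
    # basic data-quality rules (non-null keys, no duplicate events).
    clean = (
        raw.withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .filter(F.col("event_id").isNotNull())
           .dropDuplicates(["event_id"])
    )

    # Load: write partitioned Parquet to the curated zone.
    (clean.write.mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3a://example-bucket/curated/events/"))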
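
For the Performance Optimization bullet, a sketch of two common PySpark tuning moves, broadcasting a small dimension table and repartitioning on the write key; the table layouts and paths are again assumptions:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events-tuning").getOrCreate()

    events = spark.read.parquet("s3a://example-bucket/curated/events/")
    users = spark.read.parquet("s3a://example-bucket/curated/dim_users/")

    # Broadcast the small dimension table so the join avoids a full shuffle.
    enriched = events.join(F.broadcast(users), "user_id", "left")

    # Repartition by the write key before writing to avoid many small files.
    (enriched.repartition("event_date")
             .write.mode("overwrite")
             .partitionBy("event_date")
             .parquet("s3a://example-bucket/marts/events_enriched/"))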
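
For the Data Modeling bullet, a sketch of deriving a star schema (one dimension table, one fact table) from a flat curated table; the source table and its columns are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("star-schema").getOrCreate()

    orders = spark.read.parquet("s3a://example-bucket/curated/orders/")

    # Dimension: one row per customer, keyed by the natural key customer_id.
    dim_customer = (
        orders.select("customer_id", "customer_name", "region")
              .dropDuplicates(["customer_id"])
    )

    # Fact: one row per order, carrying foreign keys and additive measures.
    fact_orders = orders.select(
        "order_id", "customer_id", "order_date", "quantity", "amount"
    )

    dim_customer.write.mode("overwrite").parquet("s3a://example-bucket/marts/dim_customer/")
    fact_orders.write.mode("overwrite").parquet("s3a://example-bucket/marts/fact_orders/")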
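
For the Testing and Quality Assurance bullet, a sketch of a pytest unit test that exercises a pipeline transformation against a local SparkSession; the dedupe_events function and its rules are invented for illustration:

    import pytest
    from pyspark.sql import SparkSession, functions as F

    @pytest.fixture(scope="session")
    def spark():
        # Local session so the test suite runs without a cluster.
        return (SparkSession.builder.master("local[1]")
                .appName("pipeline-tests").getOrCreate())

    def dedupe_events(df):
        # Transformation under test: drop null keys, then duplicates.
        return df.filter(F.col("event_id").isNotNull()).dropDuplicates(["event_id"])

    def test_dedupe_events_removes_nulls_and_duplicates(spark):
        df = spark.createDataFrame(
            [("e1", "click"), ("e1", "click"), (None, "view")],
            ["event_id", "event_type"],
        )
        assert dedupe_events(df).count() == 1
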
Thanks & Regards
B. Koushik
Talent Acquisition
Direct: +1
Phone: +1 ext: 229
I can be reached between 9:00 AM and 5:30 PM EST