Overview
Remote
On Site
Hybrid
$60-$72 per hour
Contract - W2
Contract - Independent
Contract - 6-9 Month(s)
Skills
Data Science
Software Design
Machine Learning (ML)
Management
Continuous Integration
Continuous Delivery
Testing
Version Control
Documentation
Data Structure
Reporting
Extract
Transform
Load
Database
Real-time
Analytical Skill
Data Quality
Accessibility
Java
Apache Hadoop
Apache Hive
Apache Cassandra
Apache Pig
MySQL
NoSQL
Data Engineering
Python
PySpark
Scala
Apache Spark
Performance Tuning
Optimization
Cloud Computing
Google Cloud Platform
Google Cloud
Workflow
Orchestration
Docker
Big Data
Lifecycle Management
Communication
Collaboration
Teamwork
Job Details
***At this time, we are unable to consider candidates requiring visa sponsorship or third-party recruitment agencies for this role. We encourage all applicants to apply directly, and we thank you for your understanding.***
Overview: As a Lead Data Engineer, you will be part of the Data Sciences team driving scalable data infrastructure and workflows that power personalization and recommendation systems across the website and the app. You will play a key role in designing, implementing, and optimizing distributed data pipelines and engineering solutions in a cloud environment. We expect you to follow best practices in software design, contribute to code reviews, maintain a robust and well-tested codebase, and produce clear, maintainable documentation.
Responsibilities:
Designing and building large-scale data pipelines using Python, PySpark, and Scala
Performance tuning and optimization of Spark jobs, especially in Google Cloud Platform environments
Developing and maintaining production-ready workflows using orchestration tools such as Kubeflow Pipelines (KFP) or Airflow
Collaborating closely with data scientists, ML engineers, and product teams to support experimentation, model development, and deployment
Managing containerized applications with Docker and integrating them into CI/CD pipelines
Ensuring code quality through testing, version control, and documentation
Developing software that processes, stores, and serves data for use by others
Developing large-scale data structures and pipelines to organize, collect, and standardize data that helps generate insights and address reporting needs
Writing ETL (Extract / Transform / Load) processes, designing database systems, and developing tools for real-time and offline analytic processing
Ensuring that data pipelines are scalable, repeatable, and secure, and troubleshooting software and processes for data consistency and integrity
Integrating data from a variety of sources and ensuring that it adheres to data quality and accessibility standards
Applying in-depth knowledge of large-scale search applications, high-volume data pipelines, and Java, Hadoop, Hive, Cassandra, Pig, MySQL, NoSQL, or similar technologies
Required Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
5-20 years of experience in data engineering with a strong focus on Python, PySpark, and/or Scala
Expertise in Spark performance tuning and optimization in cloud-based environments (preferably Google Cloud Platform)
Hands-on experience with workflow orchestration tools like KFP or Airflow
Proficiency with Docker and container-based deployment strategies
Experience building scalable and maintainable data pipelines
Strong understanding of distributed systems, big data technologies, and data lifecycle management
Excellent communication and collaboration skills
Benefits:
York Solutions offers a generous benefits package for eligible full-time employees:
- BCBS Medical with 3 plans to choose from (PPO and high-deductible PPO plans with Health Savings Program)
- Delta Dental plan with 2 free cleanings and insurance discounts
- EyeMed Vision with annual check-ups and discounts on lenses
- Life and Accidental Death Insurance paid by the company
- John Hancock 401(k) Retirement Plan with discretionary company match up to 5%
- Voluntary insurance programs such as Hospital Indemnity, Identity Protection, Legal Insurance, Long Term Care, and Pet Insurance
- Flexible work environment with some remote working opportunities
- Fun, team-oriented environment
- Learning, development, and career growth