Title: Data Engineer (PySpark, Hadoop, Scala)
Hybrid 2-3 days onsite in Sunnyvale, CA
Candidates must work on our W-2.
Interview: one coding round.
Job Description
We are looking for a highly motivated and eager-to-learn Data Engineer with hands-on experience in PySpark, Hadoop, Scala, and ETL processes. The ideal candidate will work on large-scale data processing, specifically handling signals and datasets, transforming raw user data into structured tables, and supporting pre-machine learning workflows.
Skill Set: PySpark, Hadoop, Scala, ETL
Day to Day: Working with signals and tables; preparing and processing raw user data and building structured tables. This work is considered pre-machine learning.
Preferred: a master's degree with 1-2 years of experience, and a strong eagerness to learn.
Bonus: machine learning background.
Key Responsibilities:
- Process and transform large volumes of raw data using PySpark and Hadoop
- Develop and maintain ETL pipelines for data ingestion and processing
- Work with signals and datasets, building and optimizing data tables
- Clean, validate, and prepare data for pre-machine learning use cases
- Collaborate with analytics and data science teams to support model development
- Ensure data quality, consistency, and performance optimization
Required Skills:
- Strong hands-on experience with PySpark, Hadoop, and Scala
- Good understanding of ETL processes and data pipelines
- Experience working with large-scale structured and unstructured datasets
- Basic knowledge of data modeling and data transformation techniques
- Strong problem-solving and analytical skills