Please note that I have direct access to the Hiring Director of Data Analytics for this position.
The global leader in its B2C industry niche is looking for a Sr. Big Data Engineer. The company has been in business for 18+ years, is extremely profitable, was recently purchased by a $10+ billion publicly traded company, has 650+ employees, has realized 10-15% yearly growth for several consecutive years, and was named one of Glassdoor’s 2016 Best Places to Work.
The Sr. Big Data Engineer will be 1 of 2 Data Engineers responsible for designing and building out a brand-new, scalable Data Analytics Infrastructure from the ground up. More specifically, the Data Engineer will design and develop a high-performance data pipeline for search and recommendation engines, to be used for Data Science and Analysis. The data pipeline will work with the new Event Driven Architecture (EDA) to accumulate and store events and promote real-time responsiveness. The new data pipeline will be developed in Python, Scala or Java using the Apache Spark engine v2.1.2. Data (400 TB) is integrated and extracted from elastic, scalable, fault-tolerant Apache Kafka Streams, which provide a unified, high-throughput, low-latency platform for handling real-time data feeds, and will populate data lakes and data sinks on Amazon’s S3 storage service and Amazon’s Redshift data warehouse.
As data volumes grow, machine learning approaches will be implemented. The software will be trained to identify and act upon triggers within well-understood data sets before the same solutions are applied to new and unknown data. The Data Engineer will use Spark’s ability to store data in memory and rapidly run repeated queries to train machine learning algorithms. Because the data stays in memory, the Data Engineer can run queries again and again, at scale, which significantly reduces the time required to work through a set of possible solutions and find the most efficient algorithms.
This is an outstanding opportunity for a Data Engineer to design and build out a brand-new Data Analytics Infrastructure with a company that embraces and uses bleeding-edge technologies. The Data Engineer will work on highly visible projects, and the position will have a direct impact on company performance. The Data Engineer will also be exposed to machine learning innovations and models, and will advance their skill set in an environment that promotes sharing, collaboration, growth, and professionalism. The Hiring Director of Data Analytics, the VP of Data and the company have an established reputation for allowing Engineers to advance their careers at an aggressive pace and to be exposed to new, complex and bleeding-edge technologies.
The company offers full benefits (PPO & HMO) including dental and vision, a matching 401(k), 3 weeks of vacation, 8 paid sick/personal days, flexible time off, a gym membership subsidy, Short and Long-Term Disability, Life Insurance, an Employee Assistance Program, Wellness Programs, extremely casual dress and flexible work hours, all of which start upon employment.
MUST HAVE experience writing Big Data pipelines in Python, Scala or Java using the Apache Spark engine (or developing MapReduce jobs for Hadoop in Python or Java)
Experience with any of the following is only a plus (NOT MANDATORY):
Any experience with Spark for in-memory performance
Any experience developing around Event Driven Architectures
Spark Streaming module
Spark SQL module (DataFrames and Datasets)
Spark GraphX module
Apache Kafka (kafka-spark-consumer package)
Apache Kafka Streams (or Samza)
AWS S3 and/or Redshift (spark-redshift)
PySpark module and/or its machine learning library
Any knowledge of statistics and experience using statistical packages for analyzing large datasets (e.g., SAS, R, etc.)
Machine Learning Models
BS and/or MS in CS, Engineering, Math, Physics or equivalent is a plus
Hermosa Beach, CA, 90254