Cloud Data Engineer

Overview

Remote / Hybrid
$75 - $79 per hour
Contract - W2
Contract - 12 Month(s)

Skills

Python
PySpark
Snowflake
Spark
Databricks
Amazon DynamoDB
Amazon Redshift
Amazon S3
Amazon SageMaker
Amazon Web Services
Apache Parquet
Apache Spark
Apache Thrift
Artificial Intelligence
Business Intelligence
Big Data
Apache Avro
Cloud Computing
Data Architecture
Data Engineering
Data Governance
Data Science
Data Wrangling
Database
DevOps
Engineering Support
Extract, Transform, Load (ETL)
Jenkins
MySQL
PostgreSQL
SQL
Snowflake Schema
Pandas
Mathematics
Media
Publishing
Real-time
Reporting

Job Details

Job Title: Cloud Data Engineer

Industry: Mass Media & Entertainment

Location: New York, NY (Local Area Remote)

Duration: 13+ months

Rate: $79.30/HR

Summary: Our client, a top mass-media & entertainment company, is seeking an experienced Cloud Data Engineer with significant DevOps and SysOps experience in AWS and Databricks. The role will focus on providing engineering support for a variety of analytics, data architecture, and data science endeavors. In this role, you will primarily lead the design and evolution of the BI Delta Lake while working closely with the rest of the BI team to build a flexible and robust data infrastructure.
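
For context on the Delta Lake design work described above, here is a minimal sketch of how such a table is typically created and grown with PySpark. It assumes a Databricks or delta-spark runtime where the "delta" format is registered; the bucket, paths, and column names are hypothetical, not taken from the posting:

```python
# Minimal sketch, assuming a Databricks or delta-spark runtime where the
# "delta" format is registered; bucket, paths, and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bi-delta-lake").getOrCreate()

# Land raw subscription events from S3 and append them to a partitioned Delta table.
raw = spark.read.json("s3://example-bucket/raw/subscription_events/")
(raw.write
    .format("delta")
    .mode("append")
    .partitionBy("event_date")  # hypothetical partition column
    .save("s3://example-bucket/delta/subscription_events/"))
```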

You will co-develop and maintain a growing set of data pipelines to enable advanced reporting capabilities across our Subscription, Publishing, Digital, Consumer Products, Corporate, and Marketing teams. This role will function as a primary subject matter expert on creating and maintaining efficient data pipelines that clean, conform, and deliver large datasets, both structured and unstructured, to downstream destinations.

As the team's ETL expert, you will have ample opportunities to conduct complex data wrangling/munging operations using a variety of AWS and other industry-standard tools, including Python, Pandas, PySpark, Glue Studio, Lambda, and advanced SQL. You will also work with APIs, create complex materialized views and scheduled queries, and develop error-logging routines and associated alerts. You can expect to work closely with the Enterprise Architect and Data Architect to design and deploy best-practice data governance operations as well.
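
As a hedged illustration of the data wrangling this paragraph describes, here is a minimal Pandas sketch that cleans and conforms a raw extract before delivery. The file names and columns (subscriber_id, signup_date, plan) are hypothetical assumptions:

```python
import pandas as pd

# Minimal data-wrangling sketch: clean and conform a raw extract before delivery.
# The file names and columns (subscriber_id, signup_date, plan) are hypothetical.
df = pd.read_csv("raw_subscribers.csv")

# Normalize column names, then coerce types.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# Enforce a unique, non-null key and conform a categorical field.
df = df.dropna(subset=["subscriber_id"]).drop_duplicates(subset="subscriber_id")
df["plan"] = df["plan"].str.strip().str.upper()

# Deliver downstream as Parquet.
df.to_parquet("conformed_subscribers.parquet", index=False)
```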

Our team is small but capable, and we are looking for an intellectually curious engineer with excellent communication skills who is interested in working closely with us to co-invent solutions to complex problems. Our department also encourages constant learning and cross-functional growth, so you can expect to explore and expand your knowledge in areas like supervised and unsupervised machine learning, graph databases, AI, and applied statistics/mathematics.

Responsibilities:

Develop batch and real-time data pipelines, and lead the integration of many 1st-, 2nd-, and 3rd-party data sources while working closely with other engineering services such as the personalization and testing teams.
Create data catalogs and validation routines to ensure the quality and correctness of key operational datasets and metrics in real time (see the sketch after this list).
Build integrations with organic and paid media platforms to effectively deliver data that supports the optimization of various KPIs.
Partner with the Data Architect to build data infrastructure that enables activation, attribution, and segmentation capabilities across growth, retention, and marketing objectives.
Collaborate with lifecycle and product marketing teams to democratize insights that will drive subscriber engagement using data-driven solutions.
Coach other engineers and BI team members on best practices and technical concepts for building large-scale, robust, and well-governed data platforms.
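
As a hedged illustration of the validation routines mentioned above, here is a minimal PySpark sketch that checks a key operational dataset and logs and raises on failure. The table path, key column, and 1% threshold are hypothetical assumptions, not taken from the posting:

```python
import logging

from pyspark.sql import SparkSession, functions as F

# Sketch of a lightweight validation routine with an error-logging hook.
# The table path, key column, and 1% threshold are hypothetical.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bi-validation")

spark = SparkSession.builder.appName("bi-validation").getOrCreate()
df = spark.read.format("delta").load("s3://example-bucket/delta/subscription_events/")

total = df.count()
null_keys = df.filter(F.col("subscriber_id").isNull()).count()

# Raise (and log) if more than 1% of rows are missing the join key, so an
# orchestrator such as a Glue job or Jenkins stage can alert on failure.
if total and null_keys / total > 0.01:
    log.error("validation failed: %.2f%% null subscriber_id", 100 * null_keys / total)
    raise ValueError("subscription_events failed the null-key check")
log.info("validation passed: %d rows, %d null keys", total, null_keys)
```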

Basic Qualifications:

Excellent communicator and collaborator, able to apply technical acumen to drive business outcomes.
A natural team player with a willingness and desire to engage in cross-training in a small team environment.
4+ years on big data and/or data-intensive projects in industry or academic/research settings.
4+ years of deep experience developing in Python.
Expert-level SQL developer with an emphasis on Redshift, but capable across multiple other transactional databases such as PostgreSQL, MySQL, etc.
Significant experience developing with PySpark.
Experience engineering big-data solutions using technologies like Redshift, Spark, S3, DynamoDB, AWS SageMaker, etc.
Demonstrated understanding of data engineering tools and practices, including platforms like Databricks, Snowflake, and Jenkins.
Experience deploying and running AWS-based data solutions, and comfort deploying tools such as DynamoDB, Athena, and Lambda.
Demonstrated experience applying Master Data Management, including metadata management, data lineage, and the principles of data governance.
Ability to deliver technical solutions in the face of challenging and evolving data conditions.

Preferred Qualifications:

Experience implementing marketing technology stacks, including real-time messaging and attribution pipelines.
Experience leveraging CDPs (mParticle, Hightouch, etc.) to create deterministic user profiles that can be leveraged across a variety of applications.
Experience integrating with ML platforms and experimentation frameworks.
Familiarity with front-end development frameworks and experience in full-stack development is a plus.
Familiarity with binary data serialization formats such as Parquet, Avro, and Thrift (see the sketch after this list).
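
For the serialization formats named in the preferred qualifications, here is a minimal PySpark sketch that writes the same toy DataFrame as both Parquet and Avro. Parquet is supported natively by Spark; the Avro writer assumes the external spark-avro package is on the classpath, and all paths and data are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("serialization-demo").getOrCreate()

# Toy DataFrame; schema and values are hypothetical.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["subscriber_id", "name"])

# Columnar Parquet: supported natively by Spark.
df.write.mode("overwrite").parquet("/tmp/demo_parquet")

# Row-oriented Avro: requires the external spark-avro package on the classpath,
# e.g. spark-submit --packages org.apache.spark:spark-avro_2.12:3.5.1
df.write.mode("overwrite").format("avro").save("/tmp/demo_avro")
```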
