Requirements
As a Data Engineer, you will be responsible for collecting, parsing, managing, analyzing, and visualizing
large sets of data to turn information into actionable insights. You will work across multiple platforms to
ensure that data pipelines are scalable, repeatable, and secure, capable of serving multiple users.
Requirements:
Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent experience
All of the following skills, or as many of them as possible:
- Databricks (PySpark)
- SQL (Starburst is a bonus)
- GitLab
- CI/CD pipelines
- Python
- Tableau
Two or more years of experience with tools such as Databricks, Collibra, and Starburst
Three or more years of experience with Python and PySpark
Experience using Jupyter notebooks, including coding and unit testing
Recent accomplishments working with relational and NoSQL data stores and with data modeling methods and approaches (star schema, dimensional modeling)
Two or more years of experience with a modern data stack (object stores such as S3, Spark, Airflow, lakehouse architectures, real-time databases) and with cloud data warehouses such as Redshift and Snowflake
Overall data engineering experience across traditional ETL and big data, whether on-premises or in the cloud
Data engineering experience in AWS (any CFS2/EDS), highlighting the services and tools used
Experience building end-to-end data pipelines that ingest and process unstructured and semi-structured data with Spark (a minimal sketch follows this list)
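As a purely illustrative sketch of the Spark work described above, the snippet below reads semi-structured JSON from an S3 object store, applies a light cleanup, and writes a curated, partitioned table. The bucket names, paths, and column names are hypothetical placeholders, not part of the actual platform.

    # Minimal PySpark ingestion sketch; buckets, paths, and columns are
    # hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("curated-ingest").getOrCreate()

    # Read semi-structured JSON landed in an S3 object store.
    raw = spark.read.json("s3://example-landing-bucket/events/")

    # Light transformation: parse the timestamp, derive a partition date,
    # and drop malformed or duplicate records.
    curated = (
        raw.withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .dropna(subset=["event_id", "event_ts"])
           .dropDuplicates(["event_id"])
    )

    # Write a partitioned, curated table for downstream consumers.
    (curated.write
            .mode("overwrite")
            .partitionBy("event_date")
            .parquet("s3://example-curated-bucket/events/"))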
Responsibilities:
Design, develop, and maintain robust and efficient data pipelines to ingest, transform, catalog, and
deliver curated, trusted, and quality data from disparate sources into our Common Data Platform
Actively participate in Agile rituals and follow Scaled Agile processes as set forth by the CDP Program
team
Deliver high-quality data products and services following SAFe (Scaled Agile) practices
Proactively identify and resolve issues with data pipelines and analytical data stores
Deploy monitoring and alerting for data pipelines and data stores, implementing auto-remediation where possible to ensure system availability and reliability (a minimal sketch appears after this list)
Employ a security-first strategy built on testing and automation, adhering to data engineering best practices
Collaborate with cross-functional teams, including product management, data scientists, analysts,
and business stakeholders, to understand their data requirements and provide them with the
necessary infrastructure and tools
Keep up with the latest trends and technologies, evaluating and recommending new tools,
frameworks, and technologies to improve data engineering processes and efficiencies
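To ground the monitoring and auto-remediation responsibility above, here is a minimal, hypothetical Python sketch of one common pattern: retry a failed pipeline step with backoff, and raise an alert only once the retries are exhausted. The run_with_remediation and send_alert names are illustrative assumptions, not a prescribed implementation.

    # Hypothetical sketch of pipeline monitoring with simple auto-remediation:
    # retry a failed step a few times, then alert. All names are illustrative.
    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline-monitor")

    def send_alert(message: str) -> None:
        # Placeholder: in practice this would page on-call or post to a channel.
        log.error("ALERT: %s", message)

    def run_with_remediation(step, retries: int = 3, backoff_s: float = 30.0):
        """Run a pipeline step; retry on failure, alert when retries are exhausted."""
        for attempt in range(1, retries + 1):
            try:
                return step()
            except Exception as exc:
                log.warning("Step failed (attempt %d/%d): %s", attempt, retries, exc)
                if attempt < retries:
                    time.sleep(backoff_s)
        send_alert(f"Pipeline step failed after {retries} attempts")
        raise RuntimeError(f"pipeline step failed after {retries} attempts")

    # Usage (ingest_daily_partition is a hypothetical pipeline step):
    # run_with_remediation(lambda: ingest_daily_partition(), retries=3)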