Data Engineer

Overview

On Site
Full Time

Skills

Data Warehouse
Agile
Data Loading
Data Lake
Data Processing
Workflow
Scalability
Data Quality
Documentation
Data-flow Diagrams
Management
Computer Science
Data Science
Software Engineering
Information Systems
Python
Databricks
Data Structure
Data Storage
Change Data Capture
Star Schema
Dimensional Modeling
Writing
SQL
PL/SQL
Relational Databases
Oracle
NoSQL
MongoDB
Cosmos DB
Cloud Computing
Continuous Integration
Continuous Delivery
Git
DevOps
Storage
Apache Parquet
Apache Avro
Supervision
Positive Attitude
Collaboration
Educate
Problem Solving
Conflict Resolution
Debugging
Extract, Transform, Load (ETL)
ELT
Apache Spark
Apache Kafka
Microsoft Azure
ADF
R
Java
Scala
Database
Redis
Elasticsearch

Job Details

Benefits Summary
  • Flexible and hybrid work arrangements
  • Paid time off/Paid company holidays
  • Medical plan options/prescription drug plan
  • Dental plan/vision plan options
  • Flexible spending and health savings accounts
  • 401(k) retirement savings plan with a Roth savings option and company matching contributions
  • Educational assistance program


Overview

The Data Engineer will build, orchestrate, and maintain the data pipelines that populate our future single-source-of-truth data warehouse. They will discuss data requirements with other teams in an agile environment, develop proof-of-concepts to evaluate new technologies, and suggest better ways to re-architect existing data-loading processes and data structures.

Responsibilities
  • Assist with leading the team's transition to the Databricks platform, utilizing newer features such as Delta Live Tables and Workflows
  • Design and develop data pipelines that extract data from Oracle, load it into the data lake, transform it into the desired format, and load it into the Databricks data lakehouse (a sketch of this pattern appears after this list)
  • Optimize data pipelines and data processing workflows for performance, scalability, and efficiency
  • Implement data quality checks and validations within data pipelines to ensure the accuracy, consistency, and completeness of data
  • Help create and maintain documentation for data mappings, data definitions, architecture, and data-flow diagrams
  • Build proof-of-concepts to determine viability of possible new processes and technologies
  • Deploy and manage code in non-prod and prod environments
  • Investigate and troubleshoot data-related issues; fix defects or provide solutions to fix them
  • Identify and resolve performance bottlenecks, including suggesting ways to optimize and tune databases and queries
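
The extract/load/transform flow described in the pipeline-design bullet above follows a common bronze/silver pattern on Databricks. Below is a minimal PySpark sketch of that pattern, not the team's actual implementation: the JDBC URL, source table, lake path, and target table names are hypothetical placeholders, and it assumes the Oracle JDBC driver is available on the cluster.

```python
# Minimal sketch of the Oracle -> data lake -> Databricks lakehouse flow.
# All names (JDBC URL, schema/table, lake path, target table) are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_elt").getOrCreate()

# Extract: read a source table from Oracle over JDBC.
raw = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  # hypothetical host/service
    .option("dbtable", "SALES.ORDERS")                         # hypothetical source table
    .option("user", "etl_user")
    .option("password", "<from-secret-scope>")                 # never hard-code in practice
    .load()
)

# Load (raw): land the data unmodified in the lake as Parquet (bronze layer).
raw.write.mode("overwrite").parquet("/mnt/lake/bronze/orders")  # hypothetical path

# Transform: normalize types and apply a simple data quality gate.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("ORDER_DATE"))
       .filter(F.col("ORDER_ID").isNotNull())
)

# Data quality check: surface validation failures instead of loading bad rows silently.
dropped = raw.count() - clean.count()
if dropped > 0:
    raise ValueError(f"{dropped} rows failed validation; aborting load")

# Load (curated): append into a Delta table in the lakehouse (silver layer).
clean.write.format("delta").mode("append").saveAsTable("silver.orders")  # hypothetical table
```

On Databricks, the same flow could alternatively be written as a Delta Live Tables pipeline, where quality gates become @dlt.expect rules instead of manual counts.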


Qualifications
  • Bachelor's Degree in Computer Science, Data Science, Software Engineering, Information Systems, or related quantitative field
  • 4+ years of experience working as a Data Engineer, ETL Engineer, Data/ETL Architect, or in similar roles
  • Must hold a current/active Databricks Data Engineer/Analyst certification


Skills
  • 4+ years of solid, continuous experience in Python
  • 3+ years working with Databricks, with expertise in data structures, data storage, and change data capture gained from prior production implementations of data pipelines, optimizations, and best practices
  • 3+ years of experience in Kimball dimensional modeling (star schemas comprising facts, Type 1 and Type 2 dimensions, aggregates, etc.) with a solid understanding of ELT/ETL (a Type 2 sketch follows this list)
  • 3+ years of solid experience writing SQL and PL/SQL code
  • 2+ years of experience with Airflow
  • 3+ years of experience working with relational databases (Oracle preferred)
  • 2+ years of experience working with NoSQL databases: MongoDB, Cosmos DB, DocumentDB, or similar
  • 2+ years of cloud experience (Azure preferred)
  • Experience with CI/CD using Git and Azure DevOps
  • Experience with storage formats, including Parquet, Arrow, and Avro
  • Ability to collaborate effectively with team members while working independently with minimal supervision
  • A creative mindset, a knack for solving complex problems, a passion for working with data, and a positive attitude
  • Ability to collaborate within and across teams with varying technical backgrounds to support delivery and educate end users on data products
  • Expert problem-solving and debugging skills for tracing the source of issues in unfamiliar code or systems
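
To make the Type 2 dimension requirement concrete, here is a minimal sketch of the classic two-step SCD2 pattern, run from Python against Databricks SQL. It assumes a SparkSession named spark (as in a Databricks notebook) and hypothetical Delta tables dim_customer and staged_customer_changes; surrogate keys and most attributes are omitted for brevity.

```python
# Sketch of a Kimball Type 2 (SCD2) dimension update on Databricks.
# Table and column names (dim_customer, staged_customer_changes, address, ...)
# are hypothetical placeholders.

# Step 1: close out current rows whose tracked attributes have changed.
spark.sql("""
    MERGE INTO dim_customer d
    USING staged_customer_changes s
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHEN MATCHED AND d.address <> s.address THEN UPDATE SET
      d.is_current = false,
      d.valid_to   = current_timestamp()
""")

# Step 2: insert new versions for customers with no current row
# (both brand-new customers and those just expired in step 1).
spark.sql("""
    -- assumes dim_customer(customer_id, name, address, valid_from, valid_to, is_current)
    INSERT INTO dim_customer
    SELECT s.customer_id, s.name, s.address,
           current_timestamp() AS valid_from,
           CAST(NULL AS TIMESTAMP) AS valid_to,
           true AS is_current
    FROM staged_customer_changes s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHERE d.customer_id IS NULL
""")
```

Fact tables in the star schema then join to whichever dimension row was current at transaction time, which is exactly what the valid_from/valid_to and is_current columns make possible.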

Pluses (not required): any work experience with the following:
  • ETL/ELT tools: Spark, Kafka, Azure Data Factory (ADF)
  • Languages: R, Java, Scala
  • Databases: Redis, Elasticsearch