Job Description
About the Role
We are seeking a Data Engineer to design, implement, and optimize a modern cloud-based data platform using Google BigQuery and GCP-native tools. This role will be responsible for transforming data into high-quality, structured datasets to enable self-service analytics in Tableau and other BI tools.
You will ensure that our BigQuery data warehouse is scalable, cost-efficient, and aligned with business intelligence needs.
Key Responsibilities
BigQuery Data Warehouse Management and Operations
- Design and implement scalable data pipelines using GCP-native tools
- Develop real-time and batch data pipelines using Dataflow (Apache Beam) and Pub/Sub for streaming and structured data ingestion into BigQuery.
- Optimize performance with BigQuery partitioning, clustering, materialized views, and optimized SQL transformations (see the sketch after this list).
- Automate and schedule workflows with tools like dbt/Dataform, Airflow/Composer, and/or Cloud Workflows.
- Define and manage fact tables (transactions, events, KPIs) and dimension tables (customers, providers, hospitals, products, locations).
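As a rough illustration only (the dataset, table, and column names below are invented, not part of this role's actual stack), a partitioned and clustered fact table of the kind described above could be created from Python with the BigQuery client:

```python
# Hypothetical example: create a date-partitioned, clustered fact table with the
# google-cloud-bigquery client. Dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE IF NOT EXISTS analytics.fact_transactions (
  transaction_id STRING,
  customer_id    STRING,
  amount         NUMERIC,
  event_ts       TIMESTAMP
)
PARTITION BY DATE(event_ts)                 -- prune scans to the dates a query touches
CLUSTER BY customer_id                      -- co-locate rows commonly filtered/joined on
OPTIONS (require_partition_filter = TRUE)   -- guard against accidental full-table scans
"""

client.query(ddl).result()  # run the DDL and wait for completion
```

Requiring a partition filter is one way to keep ad-hoc BI queries from scanning (and billing for) the whole table.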
Streaming & Real-Time Analytics
- Develop streaming ingestion pipelines using Dataflow (Apache Beam) and Pub/Sub (a minimal sketch follows this list).
- Enable event-driven transformations for real-time data processing.
- Optimize performance of real-time dashboards in Tableau, Looker, or Data Studio, balancing compute (financial) cost against the dashboard-user experience.
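A minimal sketch of such a streaming pipeline, assuming a hypothetical Pub/Sub topic and BigQuery table (the project, topic, and table names are placeholders):

```python
# Minimal sketch of a streaming Dataflow (Apache Beam) pipeline: read JSON events
# from Pub/Sub and append them to an existing BigQuery table.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Add --runner=DataflowRunner, --project, --region, etc. when submitting to Dataflow.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.raw_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```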
Data Governance, Quality & Security
- Implement schema validation, deduplication, anomaly detection, and reconciliation across multiple sources.
- Define access controls, row-level security (RLS), and column-level encryption to ensure data protection and compliance (an RLS sketch follows this list).
- Maintain data lineage and metadata tracking using tools such as OpenLineage and Dataplex Catalog.
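As a hedged illustration of row-level security, a BigQuery row access policy can restrict which rows a group of analysts sees; the table, policy, column, and group names here are made up:

```python
# Hypothetical row-level security example: only members of a given group can read
# rows for their region. All identifiers are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

rls_ddl = """
CREATE ROW ACCESS POLICY IF NOT EXISTS emea_only
ON analytics.fact_transactions
GRANT TO ('group:emea-analysts@example.com')
FILTER USING (region = 'EMEA')
"""

client.query(rls_ddl).result()  # members of the group now see only EMEA rows
```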
Optimize & Automate Data Pipelines
- Develop incremental data refresh strategies to optimize cost and performance (see the sketch after this list).
- Automate data transformation workflows with dbt, Dataform, Cloud Composer (Apache Airflow), and Python.
- Monitor pipeline performance and cloud cost efficiency with Cloud Logging, Monitoring, and BigQuery BI Engine.
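One common incremental-refresh pattern is a scheduled MERGE that only touches recent rows rather than rebuilding a table; the sketch below assumes hypothetical staging and fact tables, and dbt/Dataform incremental models express the same idea declaratively:

```python
# Sketch of an incremental refresh: merge only the last day's new or changed rows
# from a staging table into the fact table. All table and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

merge_sql = """
MERGE analytics.fact_transactions AS target
USING (
  SELECT *
  FROM analytics.stg_transactions
  WHERE DATE(event_ts) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
) AS source
ON target.transaction_id = source.transaction_id
WHEN MATCHED THEN
  UPDATE SET amount = source.amount, event_ts = source.event_ts
WHEN NOT MATCHED THEN
  INSERT (transaction_id, customer_id, amount, event_ts)
  VALUES (source.transaction_id, source.customer_id, source.amount, source.event_ts)
"""

client.query(merge_sql).result()  # typically scheduled via Composer/Airflow or Cloud Workflows
```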
Enable Self-Service BI & Analytics
- Ensure that tables and views are structured for fast and efficient queries in Tableau, Looker, and self-service BI tools (see the sketch after this list).
- Work with data analysts to optimize SQL queries, views, and datasets for reporting.
- Provide data documentation and best practices to business teams for efficient self-service analytics.
- Collaborate with data producers so data is well understood at the point of production, before it is ingested.
- Curate and maintain data dictionaries and the data catalog so users can understand what they are accessing.
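For example (all object names invented), a curated reporting view that pre-joins a fact table to a dimension gives Tableau and Looker a simple, documented entry point:

```python
# Hypothetical example: publish a curated reporting view so BI tools query a
# pre-joined, aggregated dataset rather than raw tables. Names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

view_ddl = """
CREATE OR REPLACE VIEW reporting.v_daily_revenue AS
SELECT
  DATE(t.event_ts)   AS revenue_date,
  c.customer_segment AS customer_segment,
  SUM(t.amount)      AS total_revenue
FROM analytics.fact_transactions AS t
JOIN analytics.dim_customers     AS c USING (customer_id)
GROUP BY revenue_date, customer_segment
"""

client.query(view_ddl).result()  # BI tools point at reporting.v_daily_revenue
```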
Required Qualifications
Experience in Data Architecture & Engineering
- 2+ years of experience in analytics/data engineering, cloud data architecture, or ELT development.
- Strong hands-on experience with SQL and cloud-based data processing.
- Hands-on development experience with Python (or other programming languages).
Expertise in GCP & BigQuery Data Processing
- Deep understanding of ELT/ETL principles.
- Proficiency in dbt, Dataform, or SQL-based transformation tools for data modeling.
- Experience with GCP services: BigQuery, Dataflow (Apache Beam), Pub/Sub, Cloud Storage, and Cloud Functions.
BigQuery Optimization & Performance Tuning
- Experience optimizing BigQuery partitioning, clustering, materialized views, and query performance.
- Expertise in cost-efficient query design and workload optimization strategies.
Experience in Streaming & Real-Time Processing
- Hands-on experience with streaming data pipelines using Dataflow (Apache Beam), Apache Flink, Pub/Sub, or Kafka.
- Familiarity with real-time data transformations and event-driven architectures.
Experience Supporting BI & Analytics
- Strong knowledge of Tableau, Looker, and other BI tools, with a focus on keeping reporting queries optimized.
- Ability to collaborate with data analysts and business teams to define data models and metrics.
Bonus Skills (Preferred but Not Required)
- Knowledge of Cloud Composer (Apache Airflow) for data orchestration.
- Familiarity with AI/ML model deployment and machine learning pipelines on GCP (Vertex AI, Jupyter notebooks, pandas, etc.).
- Understanding of and experience with development/deployment patterns: dependency management, CI/CD, testing, code quality, Dev Containers or nixpkgs, poetry/uv.
- Programming abilities beyond Python: Golang and/or Java/Kotlin (JVM).
- Database administration and experience with varied database systems (NoSQL, graph, etc.).
Why Join Us?
- Work on a next-generation data platform built on Google BigQuery and GCP-native tools.
- Drive real-time data processing and self-service BI enablement in Tableau, Looker, and advanced analytics.
- Work with modern cloud-based technologies such as BigQuery, dbt, Dataflow, and Cloud Functions.
- Fully remote opportunity with a high-impact data engineering role.