Job Description
About the Role
We are seeking a Data Engineer to design, implement, and optimize a modern cloud-based data platform using Google BigQuery and GCP-native tools. This role will be responsible for transforming data into high-quality, structured datasets to enable self-service analytics in Tableau and other BI tools.
You will ensure that our BigQuery data warehouse is scalable, cost-efficient, and aligned with business intelligence needs.
Key Responsibilities
BigQuery Data Warehouse Management and Operations
- Design and implement scalable data pipelines using GCP-native tools
- Develop real-time and batch data pipelines using Dataflow (Apache Beam) and Pub/Sub for streaming and structured data ingestion into BigQuery.
- Optimize performance with BigQuery partitioning, clustering, materialized views, and optimized SQL transformations (see the sketch after this list).
- Automate and schedule workflows with tools like dbt/Dataform, Airflow/Composer, and/or Cloud Workflows.
- Define and manage fact tables (transactions, events, KPIs) and dimension tables (customers, providers, hospitals, products, locations).
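As a rough illustration only (the dataset, table, and column names below are invented, not part of this role's actual stack), a partitioned and clustered fact table of the kind described above could be created from Python with the BigQuery client:

```python
# Hypothetical example: create a date-partitioned, clustered fact table with the
# google-cloud-bigquery client. Dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE IF NOT EXISTS analytics.fact_transactions (
  transaction_id STRING,
  customer_id    STRING,
  amount         NUMERIC,
  event_ts       TIMESTAMP
)
PARTITION BY DATE(event_ts)                 -- prune scans to the dates a query touches
CLUSTER BY customer_id                      -- co-locate rows commonly filtered/joined on
OPTIONS (require_partition_filter = TRUE)   -- guard against accidental full-table scans
"""

client.query(ddl).result()  # run the DDL and wait for completion
```

Requiring a partition filter is one way to keep ad-hoc BI queries from scanning (and billing for) the whole table.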
Streaming & Real-Time Analytics
- Develop streaming ingestion pipelines using Dataflow (Apache Beam) and Pub/Sub (a minimal sketch follows this list).
- Enable event-driven transformations for real-time data processing.
- Optimize performance of real-time dashboards in Tableau, Looker, or Data Studio, balancing compute (financial) cost against the dashboard-user experience.
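A minimal sketch of such a streaming pipeline, assuming a hypothetical Pub/Sub topic and BigQuery table (the project, topic, and table names are placeholders):

```python
# Minimal sketch of a streaming Dataflow (Apache Beam) pipeline: read JSON events
# from Pub/Sub and append them to an existing BigQuery table.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Add --runner=DataflowRunner, --project, --region, etc. when submitting to Dataflow.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.raw_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```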
Data Governance, Quality & Security
- Implement schema validation, deduplication, anomaly detection, and reconciliation across multiple sources.
- Define access controls, row-level security (RLS), and column-level encryption to ensure data protection and compliance (an RLS sketch follows this list).
- Maintain data lineage and metadata tracking using tools such as OpenLineage and Dataplex Catalog.
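As a hedged illustration of row-level security, a BigQuery row access policy can restrict which rows a group of analysts sees; the table, policy, column, and group names here are made up:

```python
# Hypothetical row-level security example: only members of a given group can read
# rows for their region. All identifiers are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

rls_ddl = """
CREATE ROW ACCESS POLICY IF NOT EXISTS emea_only
ON analytics.fact_transactions
GRANT TO ('group:emea-analysts@example.com')
FILTER USING (region = 'EMEA')
"""

client.query(rls_ddl).result()  # members of the group now see only EMEA rows
```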
Optimize & Automate Data Pipelines
- Develop incremental data refresh strategies to optimize cost and performance (see the sketch after this list).
- Automate data transformation workflows with dbt, Dataform, Cloud Composer (Apache Airflow), and Python.
- Monitor pipeline performance and cloud cost efficiency with Cloud Logging, Monitoring, and BigQuery BI Engine.
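One common incremental-refresh pattern is a scheduled MERGE that only touches recent rows rather than rebuilding a table; the sketch below assumes hypothetical staging and fact tables, and dbt/Dataform incremental models express the same idea declaratively:

```python
# Sketch of an incremental refresh: merge only the last day's new or changed rows
# from a staging table into the fact table. All table and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

merge_sql = """
MERGE analytics.fact_transactions AS target
USING (
  SELECT *
  FROM analytics.stg_transactions
  WHERE DATE(event_ts) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
) AS source
ON target.transaction_id = source.transaction_id
WHEN MATCHED THEN
  UPDATE SET amount = source.amount, event_ts = source.event_ts
WHEN NOT MATCHED THEN
  INSERT (transaction_id, customer_id, amount, event_ts)
  VALUES (source.transaction_id, source.customer_id, source.amount, source.event_ts)
"""

client.query(merge_sql).result()  # typically scheduled via Composer/Airflow or Cloud Workflows
```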
Enable Self-Service BI & Analytics
- Ensure that tables and views are structured for fast and efficient queries in Tableau, Looker, and self-service BI tools (see the sketch after this list).
- Work with data analysts to optimize SQL queries, views, and datasets for reporting.
- Provide data documentation and best practices to business teams for efficient self-service analytics.
- Collaborate with data producers so data is well understood at the point of production, before it is ingested.
- Curate and maintain data dictionaries and the data catalog so users can understand what they are accessing.
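For example (all object names invented), a curated reporting view that pre-joins a fact table to a dimension gives Tableau and Looker a simple, documented entry point:

```python
# Hypothetical example: publish a curated reporting view so BI tools query a
# pre-joined, aggregated dataset rather than raw tables. Names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

view_ddl = """
CREATE OR REPLACE VIEW reporting.v_daily_revenue AS
SELECT
  DATE(t.event_ts)   AS revenue_date,
  c.customer_segment AS customer_segment,
  SUM(t.amount)      AS total_revenue
FROM analytics.fact_transactions AS t
JOIN analytics.dim_customers     AS c USING (customer_id)
GROUP BY revenue_date, customer_segment
"""

client.query(view_ddl).result()  # BI tools point at reporting.v_daily_revenue
```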
Required Qualifications
Experience in Data Architecture & Engineering
- 2+ years of experience in analytics/data engineering, cloud data architecture, or ELT development.
- Strong hands-on experience with SQL and cloud-based data processing.
- Hands-on development experience with Python (or other programming languages).
Expertise in GCP & BigQuery Data Processing
- Deep understanding of ELT/ETL principles.
- Proficiency in dbt, Dataform, or SQL-based transformation tools for data modeling.
- Experience with GCP services: BigQuery, Dataflow (Apache Beam), Pub/Sub, Cloud Storage, and Cloud Functions.
BigQuery Optimization & Performance Tuning
- Experience optimizing BigQuery partitioning, clustering, materialized views, and query performance.
- Expertise in cost-efficient query design and workload optimization strategies.
Experience in Streaming & Real-Time Processing
- Hands-on experience with streaming data pipelines using Dataflow (Apache Beam), Apache Flink, Pub/Sub, or Kafka.
- Familiarity with real-time data transformations and event-driven architectures.
Experience Supporting BI & Analytics
- Strong knowledge of Tableau, Looker, and other BI tools, with a focus on keeping reporting queries optimized.
- Ability to collaborate with data analysts and business teams to define data models and metrics.
Bonus Skills (Preferred but Not Required)
- Knowledge of Cloud Composer (Apache Airflow) for data orchestration.
- Familiarity with AI/ML model deployment and machine learning pipelines on GCP (Vertex AI, Jupyter notebooks, pandas, etc.).
- Understanding of and experience with development/deployment patterns: dependency management, CI/CD, testing, code quality, Dev Containers or nixpkgs, poetry/uv.
- Programming abilities beyond Python: Golang and/or Java/Kotlin (JVM).
- Database administration and experience with varied database systems (NoSQL, graph, etc.).
Why Join Us?
- Work on a next-generation data platform built on Google BigQuery and GCP-native tools.
- Drive real-time data processing and self-service BI enablement in Tableau, Looker, and advanced analytics.
- Work with modern cloud-based technologies such as BigQuery, dbt, Dataflow, and Cloud Functions.
- Fully remote opportunity with a high-impact data engineering role.