Role: Data Engineer (Google Cloud Platform)
Location: Canada / Remote
Job Type: Full-time
Experience: 5 Years
Role Overview:
We’re looking for a skilled Data Engineer to design, build, and optimize scalable, cloud-native data pipelines on Google Cloud Platform (GCP). The role involves extensive work with Apache Airflow, Spark, Python, and Scala to develop high-performance data solutions supporting analytics, streaming, and generative AI initiatives.
Key Responsibilities:
• Develop, automate, and maintain batch and streaming ETL pipelines using Apache Airflow, Apache Spark, Python, and Scala.
• Build and manage cloud-based data ecosystems on Google Cloud Platform (BigQuery, Bigtable, Dataproc, Pub/Sub, Cloud Storage, IAM, VPC).
• Design and optimize SQL and NoSQL data models for data lakes and warehouses (BigQuery, MongoDB, Snowflake).
• Write complex SQL queries for advanced data transformation, aggregation, and analytics optimization within BigQuery or equivalent platforms.
• Apply modern Test-Driven Development (TDD) methodologies to big data pipelines, ensuring test automation across Airflow workflows, Spark jobs, and transformation logic.
• Apply data mesh and data-as-a-product principles to enable reusable, domain-driven datasets.
• Implement real-time ingestion with Kafka Connect and process streaming data using Spark Streaming, Apache Flink, or similar technologies.
• Optimize data performance, scalability, and cost efficiency across Google Cloud Platform components.
• Ensure the handling of PCI and PII data complies with standards such as GDPR, PCI DSS, SOX, and CCPA.
• Integrate GenAI tools such as OpenAI, Gemini, and Anthropic LLMs for intelligent data quality and analytics enhancement.
• Collaborate with stakeholders, data scientists, and full-stack engineers to deliver trusted, documented, and reusable data products.
Required Qualifications:
• Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
• 5+ years of hands-on experience with large-scale data engineering in cloud environments.
• Advanced skills in Python, Scala, the Spark ecosystem, and SQL for building data pipelines.
• Strong Google Cloud Platform expertise (BigQuery, Bigtable, Dataproc, Pub/Sub, IAM, VPC).
• Proficiency in SQL/NoSQL modeling and data architecture for cloud data lakes.
• Familiarity with streaming frameworks (Kafka, Flume).
• Experience handling sensitive data and ensuring regulatory compliance.
• Working knowledge of Docker, CI/CD, and modern DevOps practices for data platforms.
Preferred Qualifications:
• Experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible.
• Contributions to open-source projects or internal developer tooling.
• Prior experience building Customer Data Platforms (CDPs) in-house.
• Experience with AI-assisted developer tools such as IntelliJ plug-ins using OpenAI or Anthropic models, Codex CLI, or Windsurf.