About the Role
We are looking for a highly experienced Senior Data Engineer with strong expertise in real-time data processing and scalable data architectures. You will play a key role in designing, building, and optimizing data platforms that support analytics, reporting, and machine learning use cases.
You will work closely with cross-functional teams (Data Science, Analytics, Product) to deliver high-performance data infrastructure and tools.
Key Responsibilities
Design & Build Data Pipelines: Architect, develop, and maintain robust ETL/ELT workflows for batch and real-time data ingestion and processing using Apache Spark (PySpark/Scala) and streaming technologies.
Real-Time Streaming: Implement and manage scalable streaming platforms using Apache Kafka (or similar messaging systems such as Google Pub/Sub) and stream-processing frameworks such as Apache Flink, ensuring reliable data flow with low latency.
Optimize Data Workloads: Tune Spark jobs, streaming processes, database schemas, and SQL queries to maximize performance, minimize cost, and ensure efficient resource utilization.
Architect Scalable Data Systems: Build and maintain modern data architectures, including data lakes, data warehouses (e.g., BigQuery), and metadata management frameworks that support analytical and ML workloads.
Data Quality & Monitoring: Implement automated data quality checks, monitoring dashboards, alerts, and self-healing workflows to maintain high-fidelity data.
Cloud & DevOps Integration: Collaborate with Cloud and DevOps teams to deploy solutions leveraging Google Cloud Platform services, containerization (Docker), and orchestration tools (Kubernetes).
Documentation & Best Practices: Maintain technical documentation, enforce data governance standards, and advocate for best practices in data engineering.
Required Skills & Qualifications
Technical Skills
Programming: Strong proficiency in Python and SQL, with working knowledge of Scala or Java.
Big Data Frameworks: Expertise in Apache Spark (Spark SQL, DataFrames, Structured Streaming).
Streaming Technologies: Hands-on experience with Apache Kafka, Google Pub/Sub, or similar systems.
Cloud Platforms: Solid experience with Google Cloud Platform (GCP) data services (BigQuery, Dataflow, Pub/Sub, Dataproc, etc.).
Data Stores: Experience with data warehousing solutions such as BigQuery, Snowflake, or Redshift, and familiarity with NoSQL databases.
Professional Experience
Minimum 8 years of industry experience building enterprise data solutions.
4+ years of recent, hands-on experience with Google Cloud Platform data services.
Proven track record of delivering productionized data platforms supporting analytics and ML.