Key Responsibilities
Data Pipeline Development: Design, build, test, and maintain scalable data pipelines and ETL processes using Python and Google Cloud Platform services (e.g., Dataflow, BigQuery, Pub/Sub).
Data Integration & Modeling: Implement batch and real-time data integration workflows, and optimize data models and architecture for performance and storage efficiency.
Collaboration & Support: Work with cross-functional teams to gather data requirements and support data analysts with curated datasets and tools.
System Reliability: Monitor, troubleshoot, and tune data systems for high availability, scalability, and disaster recovery.
DevOps Enablement: Build and manage CI/CD pipelines using GitHub and Terraform; ensure security compliance and operational readiness.
Mandatory Skills & Qualifications
Technical Expertise:
Strong Python programming skills and experience with Spark for data analytics.
Proficient in Google Cloud Platform services: GCS, Dataflow, Cloud Functions, Composer, Scheduler, Datastream, Pub/Sub, BigQuery, Dataproc.
Skilled in Apache Beam for batch and stream processing.
Experience with REST API ingestion, JSON messaging, and scripting (Shell, Perl).
Deep understanding of SQL, cloud-native databases, and data warehousing concepts.
Engineering & Migration:
Proven experience in migrating legacy systems to modern cloud-based architectures.
Familiarity with distributed computing frameworks and large-scale data handling.
DevOps & Security:
CI/CD pipeline development with GitHub and Terraform.
Security integration in deployment workflows.
Soft Skills:
Strong problem-solving and analytical abilities.
Excellent communication and teamwork skills.