Google Cloud Platform Dataproc Developer
Remote
Job Summary:
We are seeking a skilled and proactive Google Cloud Platform Dataproc Developer to design and implement scalable data ingestion pipelines and integrate with REST APIs for data persistence. The ideal candidate will have hands-on experience with Google Cloud Platform (GCP), particularly Dataproc, and a strong understanding of distributed data processing, API communication, and cloud-native development practices.
Key Responsibilities:
Design and develop scalable file ingestion processes using Google Cloud Platform Dataproc (Apache Spark/Hadoop); an illustrative sketch of this flow appears after this list.
Implement data transformation and cleansing logic as part of ingestion workflows.
Integrate with RESTful APIs to persist processed data into downstream systems or databases.
Optimize performance and cost-efficiency of Dataproc clusters and jobs.
Automate pipeline orchestration using tools like Cloud Composer (Airflow) or custom scripts.
Ensure robust error handling, logging, and monitoring of ingestion and API processes.
Collaborate with data architects, analysts, and other engineering teams to align on data requirements and integration strategies.
Maintain documentation for data flows, API contracts, and operational procedures.
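For illustration only, the following minimal PySpark sketch shows the shape of the ingestion-to-API flow described above, as it might run as a Dataproc job. The bucket path, the "id" key column, and the API endpoint are placeholders invented for this example, not details of the role, and the requests library is assumed to be available on the cluster workers.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
import requests

SOURCE_PATH = "gs://example-bucket/incoming/*.csv"   # placeholder input location
API_ENDPOINT = "https://api.example.com/v1/records"  # placeholder downstream API

spark = SparkSession.builder.appName("file-ingestion-sketch").getOrCreate()

# Ingest: read raw CSV files from Cloud Storage.
raw = spark.read.option("header", "true").csv(SOURCE_PATH)

# Cleanse: trim whitespace in every column and drop rows missing the key.
cleaned = (
    raw.select([F.trim(F.col(c)).alias(c) for c in raw.columns])
       .dropna(subset=["id"])  # "id" is an assumed required column
)

def post_partition(rows):
    # Persist one partition of cleaned rows to the downstream REST API,
    # reusing a single HTTP session per partition.
    session = requests.Session()
    for row in rows:
        resp = session.post(API_ENDPOINT, json=row.asDict(), timeout=10)
        resp.raise_for_status()  # surface API errors so the job fails loudly

# Push each partition to the API from the executors, then shut down.
cleaned.foreachPartition(post_partition)
spark.stop()

In practice such a job would typically be submitted with gcloud dataproc jobs submit pyspark, with batching or rate limiting on the API side; the sketch omits those concerns for brevity.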
Required Skills & Qualifications:
7+ years of experience in data engineering or cloud development roles.
Strong hands-on experience with Google Cloud Platform Dataproc, Spark, and the Hadoop ecosystem.
Proficiency in Python, Java, or Scala for data processing and API integration.
Experience with RESTful API design, consumption, and authentication (OAuth, API keys); a brief sketch of these authentication styles follows this list.
Familiarity with Google Cloud Platform services such as Cloud Storage, Pub/Sub, BigQuery, and IAM.
Knowledge of CI/CD practices and tools (e.g., Cloud Build, GitHub Actions).
Excellent problem-solving and communication skills.
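As context for the authentication requirement above, here is a brief sketch of the two styles it names, API keys and OAuth 2.0 bearer tokens, using the Python requests library. Every URL, header name, and environment variable is a placeholder invented for the example.

import os
import requests

API_URL = "https://api.example.com/v1/records"  # placeholder endpoint

# API-key style: the key travels in a request header.
api_key_headers = {"x-api-key": os.environ["EXAMPLE_API_KEY"]}

# OAuth 2.0 client-credentials style: exchange client credentials for a
# short-lived bearer token, then send it on each call.
token_resp = requests.post(
    "https://auth.example.com/oauth/token",  # placeholder token endpoint
    data={
        "grant_type": "client_credentials",
        "client_id": os.environ["EXAMPLE_CLIENT_ID"],
        "client_secret": os.environ["EXAMPLE_CLIENT_SECRET"],
    },
    timeout=10,
)
token_resp.raise_for_status()
bearer_headers = {"Authorization": f"Bearer {token_resp.json()['access_token']}"}

# Either header set can authenticate the persistence call.
resp = requests.post(API_URL, json={"id": "123"}, headers=bearer_headers, timeout=10)
resp.raise_for_status()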