Job Title: Senior Data Engineer (Java, AWS & Big Data)
Location: Austin, TX or Sunnyvale, CA (5x/week onsite)
JD: Data Engineer
Submissions must include a LinkedIn profile
Key Responsibilities:
Design, build, and maintain data pipelines across on-prem Hadoop and AWS
Develop and maintain Java applications, utilities, and data processing libraries
Manage and enhance internal Java libraries used for ingestion, validation, and transformation
Migrate and sync data from on-prem HDFS to AWS S3
Develop and maintain Airflow DAGs for orchestration and scheduling (an illustrative sketch follows this list)
Work with Kafka-based streaming pipelines for real-time/near-real-time ingestion
Build and optimize Spark / PySpark jobs for large-scale data processing
Use Hive, Presto/Trino, and Athena for querying and validation
Implement data quality checks, monitoring, and alerting
Support Iceberg tables and AWS external tables
Troubleshoot production issues and ensure SLA compliance
Collaborate with platform, analytics, and observability teams
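For context, here is a minimal illustrative sketch of the kind of Airflow DAG this role would maintain. The DAG name, schedule, HDFS path, S3 bucket, and validation query are assumptions for illustration only, not an actual internal pipeline.

# Illustrative Airflow DAG: nightly HDFS -> S3 sync followed by a validation query.
# All names, paths, and commands below are hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hdfs_to_s3_sync",          # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Copy the previous day's partition from on-prem HDFS to S3.
    sync_to_s3 = BashOperator(
        task_id="sync_to_s3",
        bash_command=(
            "hadoop distcp hdfs:///data/events/dt={{ ds }} "
            "s3a://example-bucket/events/dt={{ ds }}"
        ),
    )

    # Validate the landed partition with a row-count query via the Athena CLI.
    validate = BashOperator(
        task_id="validate_row_count",
        bash_command=(
            "aws athena start-query-execution "
            "--query-string \"SELECT count(*) FROM events WHERE dt='{{ ds }}'\" "
            "--result-configuration OutputLocation=s3://example-bucket/athena-results/"
        ),
    )

    sync_to_s3 >> validate

A production DAG of this kind would also carry retries, SLAs, and alerting, in line with the data quality, monitoring, and SLA responsibilities listed above.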
Technical Skills Required:
Java (development, maintenance, and build tools such as Gradle)
AWS (S3, Glue, EMR, Athena, EKS basics)
Hadoop/HDFS, Hive
Apache Kafka (producers/consumers, topics, streaming ingestion)
Apache Spark / PySpark (batch + streaming processing)
Apache Airflow (DAG development and maintenance)
Python
Git and CI/CD workflows
Observability tools (Prometheus, Grafana)
SQL
NOTE:
Kindly review the above job description and, if interested, share your updated resume at sravani at galaxyitech.com [or] four eight zero six nine six five three nine one.