Role - Big Data Developer / Spark Scala Engineer
Experience Required - 7+ Years
Must-Have Technical/Functional Skills
• Experience with Apache Ozone and/or Ceph as storage backends for analytics workloads
• Experience implementing exactly-once / at-least-once streaming semantics
• Strong background in Spark performance tuning (CPU, memory, I/O, shuffle)
• Experience supporting mission-critical production systems with strict SLAs
• Familiarity with CI/CD pipelines and automated testing for data applications
• Experience designing observability for streaming systems (lag, throughput, backpressure)
Technical Skills
• Languages: Scala, Python (PySpark), SQL
• Big Data: Apache Spark (Core, SQL, Structured Streaming)
• Streaming: Kafka
• Ingestion / Orchestration: Apache NiFi
• Storage: Apache Ozone, Ceph, object storage concepts
• OS & Tooling: Linux, Git, CI/CD, monitoring and logging tools
Roles & Responsibilities
• Design, develop, and maintain large-scale Spark applications using Scala and PySpark
• Build and operate streaming-heavy data pipelines using Kafka and Spark Structured Streaming
• Implement stateful streaming patterns including windowing, watermarking, late-data handling, and checkpointing
• Develop robust event replay and reprocessing workflows using Kafka offsets and partitions
• Build ingestion and routing flows using Apache NiFi, including Kafka-based ingestion patterns
• Implement end-to-end ETL/ELT pipelines with a strong emphasis on low latency, fault tolerance, and scalability
• Optimize Spark jobs through partitioning strategies, memory tuning, shuffle optimization, and efficient data formats
• Integrate Spark workloads with distributed object storage systems such as Apache Ozone and Ceph
• Ensure data quality, consistency, and auditability through validation, reconciliation, and metadata capture
• Collaborate with platform, infrastructure, and operations teams on production readiness and capacity planning
• Support production systems, including monitoring, incident analysis, and root cause resolution
• Contribute to reusable frameworks, coding standards, and engineering best practices
• Participate in architecture reviews, code reviews, and technical documentation
Required Qualifications
• Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
• Strong hands-on experience with Apache Spark in production environments
• Advanced proficiency in Scala and PySpark
• Solid understanding of distributed systems and data processing at scale
• Strong experience with Kafka-based streaming architectures
• Hands-on experience with Spark Structured Streaming
• Experience building batch and real-time pipelines
• Hands-on experience with Apache NiFi for data ingestion and flow management
• Strong SQL skills and experience working with structured and semi-structured data
• Experience working with object storage or distributed storage platforms
• Proficiency with Linux, shell scripting, and Git-based version control