Here are some of the tasks you could work on day to day.
Design and implement distributed data processing pipelines using Spark, Hive, Python, and other tools and languages prevalent in the Hadoop ecosystem. You will be given the opportunity to own the design and implementation, and you will collaborate with product managers, data scientists, and engineers to accomplish your tasks.
Publish RESTful APIs to enable real-time data consumption, documented with OpenAPI specifications. This will enable many teams to consume the data being produced.
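To illustrate the kind of contract this involves, here is a minimal OpenAPI 3.0 fragment for a hypothetical real-time read endpoint (the path, schema, and field names are invented for this sketch, not part of any actual service):

```yaml
openapi: 3.0.3
info:
  title: Events API (illustrative)
  version: "1.0"
paths:
  /events/{eventId}:
    get:
      summary: Fetch a single processed event by ID
      parameters:
        - name: eventId
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: The event record
          content:
            application/json:
              schema:
                type: object
                properties:
                  eventId:
                    type: string
                  timestamp:
                    type: string
                    format: date-time
        "404":
          description: Event not found
```

Publishing a spec like this lets consuming teams generate clients and validate payloads without reading the service's source code.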
Explore and build proofs of concept using open-source NoSQL technologies such as HBase, DynamoDB, and Cassandra, and distributed stream processing frameworks like Apache Spark, Flink, and Kafka Streams.
Take part in DevOps by building utilities, user-defined functions (UDFs), and frameworks to better enable data flow patterns.
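As a sketch of the kind of utility this involves, here is a small Python function that could be registered as a Spark UDF. The function name, alias table, and normalization rule are invented for illustration only:

```python
# Plain Python function: normalizes free-text country names to
# two-letter codes. Written as ordinary Python so it can be unit-tested
# on its own before being registered as a Spark UDF.
def normalize_country(raw):
    if raw is None:
        return None  # preserve nulls so Spark columns stay nullable
    aliases = {
        "usa": "US",
        "united states": "US",
        "u.s.": "US",
        "uk": "GB",
        "united kingdom": "GB",
    }
    key = raw.strip().lower()
    # Fall back to the first two letters, upper-cased, when no alias matches.
    return aliases.get(key, key.upper()[:2])

# Registration in a PySpark job would look roughly like this
# (requires an active SparkSession named `spark`):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   spark.udf.register("normalize_country",
#                      udf(normalize_country, StringType()))
```

Keeping the logic in a plain function, separate from the Spark registration, makes it easy to test locally and reuse across batch and streaming jobs.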
Work with architecture/engineering leads and other teammates to ensure high-quality solutions through code reviews and documentation of engineering best practices.
Experience with business rule management systems such as Drools will also come in handy.
Some combination of these qualifications and technical skills will position you well for this role:
MS/BS degree in computer science or a related discipline
5+ years of experience with large-scale software development and Big Data technologies
Programming skills in Java/Scala, Python, Shell scripting, and SQL
Development skills in Spark, MapReduce, and Hive
Strong skills in developing RESTful APIs