- Looking for an Apache Spark / Cassandra Architect with hands-on experience.
- Experience in infrastructure capacity planning (server types, memory, and CPU requirements based on the services each server will run).
- Extensive experience with HDP 2.4/2.6 stack.
- Extensive experience with Apache Spark (infrastructure as well as programming and architecture).
- Experience with Apache NiFi.
- Experience with HDFS, MapReduce2, YARN, Hive, ZooKeeper, Ambari Metrics, Kafka, and Log Search.
- The candidate must be a hands-on architect who can write code as well as provide solutions within the HDP scope.
- Experience with Elasticsearch and visualization tools.
Architect Apache Spark / Cassandra
Role & Responsibilities:
Design and implement Big Data analytic solutions on a Hadoop-based platform. Create custom analytic and data mining algorithms to help extract knowledge and meaning from vast stores of data. Refine a data processing pipeline focused on unstructured and semi-structured data. Support both quick-turn, rapid implementations and larger-scale, longer-duration analytic capability implementations.
- Hadoop development and implementation.
- Loading from disparate data sets.
- Pre-processing using Hive and Pig.
- Designing, building, installing, configuring and supporting Hadoop.
- Translate complex functional and technical requirements into detailed design.
- Perform analysis of vast data stores and uncover insights.
- Maintain security and data privacy.
- Create scalable and high-performance web services for data tracking.
- High-speed querying.
- Managing and deploying HBase.
- Propose best practices/standards.
- Experience with Hadoop and the HDFS Ecosystem
- Strong experience with Apache Spark and Apache Cassandra is a must.
- Experience with Python, R, Pig, Hive, Kafka, Knox, Tomcat and Ambari
- Experience with MongoDB
- A minimum of 4 years working with HBase/Hive/MRv1/MRv2 is required
- Experience in integrating heterogeneous applications is required
- Experience working with the Systems Operation Department in resolving a variety of infrastructure issues
- Experience with Core Java, Scala, Python, R
- Experience with relational database systems (SQL) and hierarchical data management
- Experience with MapReduce
- Experience with ETL tools such as Sqoop and Pig
- Data-modeling and implementation
- Experience in working with market / streaming data and time-series analytics
- Experience working with different caching strategies
- Experience working with multiple solutions for data movement, such as file copy, pub-sub, FTP, etc.
- Development of web-based and digital frameworks for content delivery
- Experience with batch processing
- Experience working with Hortonworks or Cloudera (preferred)
- Experience with DataTorrent / Apex and Pentaho is a plus
- Experience with Navigator is a plus
- Experience with REST APIs is a plus
- Experience with stream processing is a plus
- Exposure to encryption tools (HP Voltage) is a plus