Our ideal candidate has a background in distributed applications, data warehousing, and databases.
The candidate is highly proficient in the Hadoop ecosystem and the Spark data stack.
The candidate has strong programming skills in Scala, Java, Python, and SQL, and expertise in statistical algorithms for data analysis.
Deliver a Data Service Layer that acquires data from multiple financial data sources to manage data sets for various business applications.
The data lake provides high data quality, rich metadata, and on-demand transformations to build data feeds for reporting and risk computations.
•Responsible for Big Data solution architecture, design, and development, and for delivering production-grade solutions
•Hands-on expertise with various big data technologies and the ability to lead an agile delivery team
•Ability to measure the performance of data solutions, diagnose bottlenecks, and use tools to monitor and tune performance
•Deploy flexible, scalable, and resilient data solutions to meet evolving client data product requirements.
•BS/MS degree in Computer Science, Engineering, Applied Mathematics, or a related field, or equivalent experience
•7+ years of hands-on programming expertise in Scala, Java, SQL, Python
•3+ years of experience with large datasets in Hadoop (Cloudera) and Spark Ecosystem
•Hands-on experience with Hadoop data storage, data stores (HBase, Cassandra), and tools (Oozie, Sqoop, Flume, etc.)
•Well versed in Cloudera (CDH 5.x) for managing security, metadata, lineage, and jobs (Navigator Optimizer, RecordService, etc.)
•Expertise in Kafka (distributed logs) and Spark Streaming architecture and development
•Experience in design and development of SQL on Hadoop applications (Spark SQL, Impala) and Query Optimization
•Ability to troubleshoot, tune, and accelerate data pipelines, data queries, and real-time streaming events
•Passionate, self-motivated, and willing to learn
Nice to Have
•Expertise in leading cloud technologies like Amazon Web Services
•Hadoop and Spark certifications are a plus