Charlotte, NC / Pennington, NJ / Dallas, TX
Minimum 7 years of experience.
Extensive knowledge of ETL and Teradata.
Good exposure to the Hadoop ecosystem.
Minimum 1 year of hands-on experience in PySpark.
Job scheduling tools (e.g. Autosys) and version control tools such as Git.
Unix shell scripting.
Basic knowledge of Mainframe; should be able to navigate through the jobs and code.
Quick learner and self-starter who requires minimal supervision to excel in a dynamic environment.
Strong verbal and written communication skills.
Prior experience working with globally distributed teams.
Agile-driven development.
Hadoop, Spark, SparkML, IBM Spectrum
Minimum 7 years of experience.
Be responsible for the design and support of the API(s) between IBM Spectrum Conductor (a distributed compute and storage platform) and the Bank of America application that exposes a data scientist user experience and model governance.
Capture cluster tenant compute and storage requirements, both functional and nonfunctional, and translate them into distributed cluster capacity, configuration, and user provisioning settings.
Develop, test, and analyze code and scripts written in PySpark, Python, Java, and shell to provide the specified behavior on a distributed IBM Spectrum Conductor cluster.
Provide "how-to" technical support for tenant developers developing runtimes and persisting data on a distributed IBM Spectrum Conductor cluster.
Be an active member of the Agile scrum team and contribute to the features that emerge from it.
Perform peer code and test case reviews, and help foster a healthy technical community by helping peers.
Experience with Agile/Scrum methodology is essential.
Experience with either Apache Spark-On-YARN (Hadoop) or Apache Spark-On-EGO (IBM Spectrum Conductor) is essential.
Experience with Apache Spark libraries (PySpark, Spark SQL, Spark Streaming, MLlib) is essential.
Experience with either Hadoop/YARN or IBM Spectrum Conductor/EGO cluster resource manager is essential.
Experience with the Red Hat Enterprise Linux (RHEL) command line and shell scripting is essential.
Experience with the file formats CSV, JSON, ORC, Avro, Parquet, and Protocol Buffers is essential.
Experience with Python, Java, and R is highly desirable.
Experience with NumPy and pandas is highly desirable.
Experience designing and configuring distributed architectures is desirable.
Knowledge of CI/CD SDLC practices.
Knowledge of scikit-learn, PyTorch, Keras, H2O.ai.
Strong communication skills; should be able to communicate effectively with business and other stakeholders.
Demonstrate ownership and initiative.
USA Okaya Inc.
4949 Expy Dr N, Suite 101, Ronkonkoma, NY 11779