Hadoop Architect

spark, hadoop architect, pyspark, apache, Spark SQL
Full Time, Contract Independent, Contract W2

Job Description

Location: Charlotte, NC / Pennington, NJ / Dallas, TX

Fulltime

  • Minimum of 12 years of experience.
  • Be responsible for the design and support of the API(s) between IBM Spectrum Conductor (as a distributed compute and storage platform) and the Bank of America application that exposes a data scientist user experience and model governance.
  • Capture cluster tenant compute and storage requirements, both functional and nonfunctional, and translate them into distributed cluster capacity, configuration, and user provisioning settings.
  • Develop, test, and analyze code and scripts written in PySpark, Python, Java, and shell to provide specified behavior on a distributed IBM Spectrum Conductor cluster.
  • Provide "how-to" technical support for tenant developers developing runtimes and persisting data on a distributed IBM Spectrum Conductor cluster.
  • Be an active member of the Agile scrum team and contribute to the features that emerge from it.
  • Perform peer code and test case reviews, and help foster a healthy technical community by supporting peers.

Qualifications:

  • Experience with Agile/Scrum methodology is essential.
  • Experience with either Apache Spark-On-YARN (Hadoop) or Apache Spark-On-EGO (IBM Spectrum Conductor) is essential.
  • Experience with Apache Spark libraries (PySpark, Spark SQL, Spark Streaming, MLlib) is essential.
  • Experience with either Hadoop/YARN or IBM Spectrum Conductor/EGO cluster resource manager is essential.
  • Experience with the Red Hat Enterprise Linux (RHEL) command line and shell scripting is essential.
  • Experience with the file formats CSV, JSON, ORC, Avro, Parquet, and Protocol Buffers is essential.
  • Experience with Python, Java, and R is highly desirable.
  • Experience with NumPy and pandas is highly desirable.
  • Experience with designing and configuring distributed architectures is desirable.
  • Knowledge of CI/CD SDLC practices.
  • Knowledge of scikit-learn, PyTorch, Keras, and H2O.ai.
  • Strong communication skills, with the ability to communicate effectively with business and other stakeholders. Demonstrated ownership and initiative.
Dice Id : 10412358
Position Id : 2021-13686
Originally Posted : 2 months ago