Data Engineer / ETL Developer (MLflow, Spark, PySpark, Python, PyTorch, Databricks/Snowflake, Teradata, Hive, SQL Server, DB2, SQL, Stored Procedures), 12+ Months Contract, NYC
JPC - 3550
Level 3 (7 to 10 years of industry experience)
Location: New York, NY, USA, 10004 (3 days a week in office is mandatory)
Duration: 12+ months
Position Description
The position will be responsible for developing ETL components, providing user access to the data via reports, data extracts, and analysis tools such as OLAP, and for coding stored procedures. The candidate will work with multiple database systems (Teradata, Hive, SQL Server, DB2, and Snowflake), both on premises and in the cloud. The role requires a strong understanding of database concepts, including data warehouses, operational data stores, and data marts. Responsibilities also require in-depth knowledge of ETL concepts and hands-on experience implementing data integrations on multiple database platforms using custom development, scripting languages such as Unix shell, and ETL tools such as Informatica.
The role also involves building data engineering pipelines using Spark in the cloud with tools such as Databricks and Snowflake, and reengineering statistical risk models developed in C++ for lower latency in the cloud using current technologies such as PyTorch. The candidate will design reusable components and utilities and bring out-of-the-box architectural thinking so that modelers have a seamless experience across the full model lifecycle: design, implementation, training, and deployment to production.
Responsibilities:
Work with Quantitative Strategists/Statistical Modelers to build, enhance, and execute/test scenarios.
Develop, run, and perform inference with machine learning and statistical models in the cloud.
Identify potential improvements to the current design/processes.
Ability to assess risks in design/development upfront.
Plan and coordinate data/process migrations across databases.
Participate in multiple project discussions as a senior member of the team.
Serve as a coach/mentor for junior developers.
Provide thought leadership.
Lead team in new initiatives such as cloud strategy.
Required Skills
10+ years of total experience.
Strong Python, Spark, PySpark, PyTorch, Java, and C++ programming experience.
Knowledge of machine learning, statistics, and model development, training, and inference.
Strong design and problem-solving skills.
Strong understanding of cloud machine learning tools such as MLflow, Databricks, and Snowflake.
SQL and database programming skills, including creating views, stored procedures, and triggers, implementing referential integrity, and designing and coding for performance.
Knowledge of and hands-on experience with RDBMS systems (e.g., DB2, Teradata, Oracle, or Snowflake).
Good communication and leadership skills.
Organization, discipline, detail orientation, self-motivation, and focus on delivery.
In-depth knowledge of and hands-on experience with Unix/Linux programming (shell and/or Perl).
Desired Skills
Experience in the Financial Industry
Good understanding of Data engineering principles and risk model development.
ETL experience with Informatica
Experience with cloud databases (e.g., Azure, Snowflake).
Experience with Scala/Spark/C++/PyTorch.
Experience with AngularJS
Experience in KDB