RTML Engineer // ML Ops Engineer

Overview

Hybrid
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 12 Month(s)

Skills

RTML Framework
Kubernetes
KubeFlow
ML Ops

Job Details

Title: RTML Engineer // ML Ops Engineer

Location: Dallas, TX or NJ

What you will be doing:

You will join our critical Real Time ML Service team working on our RTML Model Serving Framework.

This is a foundational team in our AI Center: the RTML Framework serves all of our real-time AI models in production, enabling our business organizations to maximize the benefits of AI-driven solutions for our customers.

As a Principal Engineer, you will be:

  • Functioning as a domain expert in RTML model serving technology, familiar with industry trends in RTML, common RTML architectures, leading third-party RTML serving products, and evaluation criteria.
  • Working closely with other teams to define technical strategy, architecture, and development choices, and ensuring the overall growth of the Jarvis Framework to meet our internal customers' needs.
  • Leading the Jarvis development activities through phased releases, ensuring it is architecturally sound, implemented correctly/efficiently, and delivered on time.
  • Supporting internal customers with major framework issues and coordinating triage efforts to solve them.
  • Leading and mentoring junior developers on the team and always pushing for team success.
  • Adhering to industry standards and best practices and tracking emerging RTML technologies and trends to continuously improve the Jarvis framework.

You'll need to have:

  • Bachelor's degree or above in Computer Science/Engineering or a related area.
  • Four or more years of work experience in software development roles.
  • At least two of those years in AI/ML engineering, with a reasonably good understanding of data science and AI/ML practices and workflows.
  • Strong expertise in RTML model serving and/or large-scale cloud-based RT framework development.
  • Experience with Kubernetes. The candidate should be comfortable with kubectl and Helm.
  • Experience in creating, deploying, and maintaining centralized KubeFlow infrastructure on top of one or multiple Kubernetes clusters.
  • Experience with cloud infrastructures and ML Ops in clouds.
  • Familiarity with CI/CD processes and common tools such as ArgoCD.
  • Experience with programming languages such as Python and Java.
  • Experience developing large applications in cloud environments (AWS, Google Cloud Platform) and on-prem clusters.
  • Experience with K8s architecture and principles of operation, with hands-on skills in deploying large applications to production K8s clusters, configuring K8s properly, and troubleshooting application issues.
  • Good understanding of RT system stats collection and performance monitoring methods.
  • Basic understanding of RT Feature Engineering methodology and practices.
  • Understanding of basic data science concepts and the common needs of data scientists.

Raj Vemula

Director Resource Development