Overview
Hybrid
Depends on Experience
Accepts corp to corp applications
Contract - W2
Contract - 12 Month(s)
Skills
RTML Framework
Kubernetes
KubeFlow
ML Ops
Job Details
Title: RTML Engineer // ML Ops Engineer
Location: Dallas, TX or NJ
What you will be doing:
You will join our critical Real Time ML Service team working on our RTML Model Serving Framework.
This is a foundational team in our AI Center, and the RTML Framework serves all of our real-time AI models in production, enabling our business organizations to maximize the benefits of AI-driven solutions for our customers.
As a Principal Engineer, you will be:
- Functioning as a domain expert in RTML model serving technology, familiar with industry trends in RTML, common RTML architectures, leading third-party RTML serving products, and evaluation criteria.
- Working closely with other teams to define technical strategy, architecture, and development choices, and ensuring the overall growth of the Jarvis Framework to meet our internal customers' needs.
- Leading the Jarvis development activities through phased releases, ensuring it is architecturally sound, implemented correctly/efficiently, and delivered on time.
- Supporting internal customers with major framework issues and coordinating triage efforts to solve them.
- Leading and mentoring junior developers on the team and always pushing for team success.
- Adhering to industry standards and best practices and tracking emerging RTML technologies and trends to continuously improve the Jarvis framework.
You'll need to have:
- Bachelor's degree or above in Computer Science/Engineering or a related area.
- Four or more years of work experience in software development roles.
- At least two of those years in AI/ML engineering, with a reasonably good understanding of data science and AI/ML practices and workflows.
- Strong expertise in the RTML model serving arena and/or large-scale cloud-based RT framework development.
- Experience with Kubernetes. The candidate should be comfortable with kubectl and helm.
- Experience in creating, deploying, and maintaining centralized KubeFlow infrastructure on top of one or more Kubernetes clusters.
- Experience with cloud infrastructures and ML Ops in clouds.
- Familiarity with CI/CD processes and common tools such as ArgoCD.
- Experience with programming languages such as Python and Java.
- Experience in large application development in cloud environments - AWS, Google Cloud Platform and On-Prem clusters.
- Experience with K8s architecture and principles of operation, with hands-on skills in deploying large applications to production K8s clusters, configuring K8s properly, and troubleshooting application issues.
- Good understanding of RT system stats collection and performance monitoring methods.
- Basic understanding of RT Feature Engineering methodology and practices.
- Understanding of basic data science concepts and the common needs of data scientists.
Raj Vemula
Director Resource Development