Site Reliability Engineer (SRE), ML Platform

Overview

On Site

$40 - $50

Accepts corp-to-corp applications

Contract - Independent

Contract - W2

Contract - 12 Month(s)

100% Travel

Able to Provide Sponsorship

Skills

Continuous Delivery

Automated Testing

Benchmarking

Cloud Computing

Collaboration

Communication

API

Amazon S3

Amazon SageMaker

Amazon Web Services

Linux Administration

Database

Docker

GitHub

Grafana

Kubernetes

Apache Solr

Continuous Integration

Continuous Integration and Development

Data Science

MongoDB

Open Source

Orchestration

Python

Scripting

Software Development

Linux

Machine Learning (ML)

Machine Learning Operations (ML Ops)

Microservices

Software Testing

Splunk

Teamwork

Training

Workflow

Job Details

Note: This position is open only for C2C candidates.

Responsibilities:
Implement continuous deployment using GitHub Actions, Flux, and Kustomize
Design and implement cloud solutions, and build MLOps pipelines on AWS
Containerize and deploy data science models using Docker, vLLM, and Kubernetes
Communicate with a team of data scientists, data engineers, and architects, and document the processes
Develop and deploy scalable tools and services for our clients to handle machine learning training and inference.
Apply knowledge of ML models and LLMs
Qualifications:
6+ years of experience in MLOps, with strong knowledge of Kubernetes, Python, MongoDB, and AWS.
Good understanding of Apache Solr.
Proficient with Linux administration.
Knowledge of ML models and LLMs.
Ability to understand tools used by data scientists and experience with software development and test automation
Ability to design and implement cloud solutions and to build MLOps pipelines on AWS
Experience working with cloud computing and database systems
Experience building custom integrations between cloud-based systems using APIs
Experience developing and maintaining ML systems built with open-source tools
Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, and with Docker and Kubernetes
Experience building containers and running them on Kubernetes in cloud computing environments
Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
Ability to translate business needs to technical requirements
Strong understanding of software testing, benchmarking, and continuous integration
Exposure to machine learning methodology and best practices
Good communication skills and ability to work in a team

Note: The role's focus is split 60% SRE and 40% MLOps.

Skill Area | Includes | Weight (%)
Platform Reliability & Containerization | Kubernetes, Docker, Microservices, Linux | 30%
MLOps & AWS Cloud | Model deployment, versioning, monitoring; AWS (SageMaker, S3, Lambda, EKS) | 25%
CI/CD & GitOps | GitHub Actions, Flux | 15%
Monitoring & Observability | Splunk, Grafana, Prometheus, performance tracking | 15%
Integration & Collaboration | Python scripting, API integrations, Apache Solr, LLM awareness, teamwork with data scientists and engineers | 15%

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Padmas Technology LLC
