Job Details
Note: This position is open only to C2C candidates.
Responsibilities:
Implement continuous deployment using GitHub Actions, Flux, and Kustomize
Design and implement cloud solutions and build MLOps pipelines on AWS
Containerize and deploy data science models using Docker, vLLM, and Kubernetes
Communicate with a team of data scientists, data engineers, and architects, and document the processes
Develop and deploy scalable tools and services for our clients to handle machine learning training and inference
Apply knowledge of ML models and LLMs
Qualifications:
6+ years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS
Good understanding of Apache Solr
Proficient with Linux administration
Knowledge of ML models and LLMs
Ability to understand the tools used by data scientists, plus experience with software development and test automation
Ability to design and implement cloud solutions and build MLOps pipelines in the cloud (AWS)
Experience working with cloud computing and database systems
Experience building custom integrations between cloud-based systems using APIs
Experience developing and maintaining ML systems built with open-source tools
Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, and with Docker and Kubernetes
Experience developing containers and working with Kubernetes in cloud computing environments
Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
Ability to translate business needs into technical requirements
Strong understanding of software testing, benchmarking, and continuous integration
Exposure to machine learning methodology and best practices
Good communication skills and ability to work in a team
Note: The role is expected to be about 60% SRE and 40% MLOps.
Skill Area | Includes | Weight (%)
Platform Reliability & Containerization | Kubernetes, Docker, Microservices, Linux | 30%
MLOps & AWS Cloud | Model deployment, versioning, monitoring, AWS (SageMaker, S3, Lambda, EKS) | 25%
CI/CD & GitOps | GitHub Actions, Flux | 15%
Monitoring & Observability | Splunk, Grafana, Prometheus, performance tracking | 15%
Integration & Collaboration | Python scripting, API integrations, Apache Solr, LLM awareness, teamwork with data scientists & engineers | 15%