MLOps DevOps Engineer *** Direct end client ***

Depends on Experience

Contract: Independent, Corp-To-Corp, W2, 12 Month(s)


    AI, Amazon Web Services, Ansible, CloudFormation, DevOps, Jupyter, Kubernetes, Lambda, Linux, Python

    Job Description

    Job Responsibilities:

    • Developing, customizing, and deploying MLOps services such as Vertex AI, SageMaker, and Kubeflow

    • Prototyping and developing cloud-native architecture solutions for application needs, particularly with AWS

    • Providing infrastructure-as-code using Terraform and AWS CloudFormation

    • Provide on-call support for the platform

    • Perform automation, testing, performance tuning, and tools development.

    • Provisioning and maintaining cloud infrastructure that supports training machine learning models



    • Develop and deploy customized Kubernetes clusters for MLOps services like Kubeflow

    • Configure and integrate MLOps application components such as model lifecycle management, model serving, hyperparameter tuning, object storage, load balancers, and authentication (e.g. MLflow, Knative, Katib, MinIO, Istio, Dex, OIDC AuthService)

    • Understanding of the ML workflow, and how ML pipelines automate the workflow (data preprocessing, model training, model evaluation, hyperparameter tuning, model serving, model registries, etc.)

    • Build and test ML pipelines

    • Develop custom container images optimized for ML experimentation

    • Develop and deploy SageMaker domains with custom lifecycle configurations (e.g. idle kernel auto-shutdown) and custom images

    • Extensive experience with Kubernetes and Docker is a must-have

    • Industry experience with Amazon Web Services (IAM, VPC, API Gateway, NLB, ALB, EC2, ECS, EKS, Lambda, S3, RDS, DynamoDB, SQS, etc.)

    • Candidates must have demonstrated strong knowledge of Linux systems

    • Proficiency in Python and Bash scripting is a must

    • Experience implementing CI/CD/CT pipelines, including deployment automation with CI/CD tools and Infrastructure-as-Code (IaC)

    • Good understanding of networking and related protocols (HTTP, DNS, TLS, TCP)

    • Candidates must have demonstrated experience in troubleshooting problems and working with a team to resolve production issues.

    • Understanding of cloud provisioning tools, e.g. CloudFormation and Terraform.

    • Good understanding of database technologies

    • Intimate familiarity with the DevOps toolkit (Terraform, Ansible, Chef, and other tools).

    • Exposure to pub/sub messaging systems (e.g. AWS SNS, SQS, RedisQ)

    • Exposure to data science IDEs such as RStudio or Jupyter Notebook is a huge plus