Overview
Work arrangement: Hybrid (not fully remote)
Compensation: Depends on Experience
Employment type: Full Time
Skills: Amazon EC2
Job Details
MLOps Engineer
Location: South San Francisco, CA (Hybrid, 3 days/week; not remote)
- Strong understanding of machine learning concepts, algorithms, and best practices.
- Proven experience in creating, managing, and deploying ML models using core AWS services such as Amazon SageMaker (for model building, training, and deployment), EC2 (for compute instances), S3 (for data storage), and Lambda (for serverless functions).
- Experience with AWS Textract for document data extraction.
- Demonstrable experience in designing, developing, and maintaining automated data processing and ML training pipelines using AWS Glue (for ETL) and AWS Step Functions (for workflow orchestration).
- Proficiency in ensuring seamless data ingestion, transformation, and storage strategies within the AWS ecosystem.
- Experience in optimizing AWS resource usage for cost-effectiveness and efficiency in ML operations.
- Experience with Amazon Bedrock for accessing and managing foundation models in generative AI applications.
- Knowledge of database services like Amazon RDS or Amazon DynamoDB for storing metadata, features, or serving model predictions where applicable.
- Hands-on experience with implementing monitoring, logging, and alerting mechanisms using AWS CloudWatch.
- Experience with AWS container services like EKS (Elastic Kubernetes Service) or ECS (Elastic Container Service) for managing container orchestration.
- Experience in implementing scalable and reliable ML model deployments in a production environment.
- Practical experience in implementing, deploying, and optimizing Large Language Models (LLMs) for production use cases.
- Ability to monitor LLM performance, fine-tune parameters, and continuously update/refine models based on new data and performance metrics.
- Proven ability to create and experiment with effective prompt engineering strategies to improve LLM performance, accuracy, and relevance.
- Proficiency in using Docker to package ML models and applications into containers.
- Experience with Kubernetes for container orchestration, including managing deployments, scaling, and networking.
- Knowledge of best practices for container security, performance optimization, and resource utilization.
- Strong proficiency in Python programming for data processing, model training, deployment automation, and general scripting.
- Experience in implementing robust testing (e.g., unit tests, integration tests) and debugging practices for Python code.
- Adherence to best practices and coding standards in Python development.
- Experience or familiarity with integrating external systems or platforms, such as Veeva PromoMats (or similar content management/regulatory systems), with ML workflows.
- Strong analytical and problem-solving skills with the ability to troubleshoot complex issues in ML systems and data pipelines.
- A proactive and results-oriented mindset with a focus on continuous improvement and innovation in MLOps practices.
- Relevant AWS certifications (e.g., AWS Certified Machine Learning - Specialty, AWS Certified DevOps Engineer) are a plus.
- Experience with Infrastructure as Code (IaC) tools like AWS CloudFormation or Terraform.
- Familiarity with CI/CD pipelines and tools for automating ML workflows.
- Understanding of data governance and security best practices in the context of ML.