Job Title: MLOps Engineer
Location: South San Francisco, CA (Hybrid: 3 days/week onsite)
About the Role:
We're seeking a seasoned Machine Learning Engineer (Operations) with deep expertise in AWS-native tools, machine learning pipelines, and production-level model deployment. You will build and optimize scalable, secure, and efficient ML systems in a hybrid environment, with a focus on automation, monitoring, and integration.
Key Responsibilities:
ML Model Lifecycle:
Build, train, deploy, and manage ML models using AWS services like SageMaker, EC2, S3, and Lambda.
Optimize performance and resource usage for cost-effective ML operations.
Leverage Amazon Bedrock for deploying and managing foundation models in GenAI use cases.
Data Processing & Pipelines:
Design and maintain automated ML pipelines using AWS Glue and Step Functions.
Implement seamless data ingestion, transformation, and storage strategies.
LLM & GenAI Experience:
Implement and optimize Large Language Models (LLMs) for real-world applications.
Monitor LLM performance, tune model and inference parameters, and improve output accuracy through prompt engineering.
Infrastructure & Deployment:
Containerize ML models with Docker and manage orchestration via Kubernetes (Amazon EKS) or Amazon ECS.
Use AWS CloudFormation or Terraform to define and provision infrastructure as code (IaC).
Integrate CI/CD tools to automate ML workflow deployments.
Monitoring & Reliability:
Implement robust logging, monitoring, and alerting via Amazon CloudWatch.
Ensure production-grade model reliability and scalability.
Security & Governance:
Apply security best practices in containerization, data handling, and infrastructure.
Ensure compliance with data governance policies.
Integration & External Systems:
Integrate ML workflows with external systems (e.g., Veeva PromoMats or similar platforms).
Required Qualifications:
Strong understanding of machine learning concepts, algorithms, and best practices.
Proven hands-on experience with:
Amazon SageMaker, EC2, S3, Lambda
Amazon Textract for document data extraction
AWS Glue, AWS Step Functions
Amazon Bedrock, RDS, DynamoDB
Docker, Kubernetes (Amazon EKS), Amazon ECS
Amazon CloudWatch
Python for data processing, automation, and scripting
Demonstrable experience with LLM optimization, prompt engineering, and GenAI applications.
Experience implementing unit and integration testing in Python.
Familiarity with CI/CD tools and external system integrations.
Preferred Qualifications:
AWS certifications (e.g., Machine Learning Specialty, DevOps Engineer Professional)
Experience with Terraform or AWS CloudFormation
Background in content management/regulatory systems like Veeva
Knowledge of container security, resource optimization, and performance tuning
Soft Skills:
Strong analytical and problem-solving mindset.
Self-driven with a proactive, results-oriented approach.
Commitment to continuous improvement in MLOps practices.