Our client is seeking a Senior ML Systems Engineer contractor to bridge the gap between model development and production. Please find the detailed job description below. If you are interested, apply with your updated resume and contact information.
**Note:** This is an Engineering role, not a Data Science role. Our Data Science team provides the validated models and LLM prompts.
Your mission is to build the high-throughput, agentic pipelines required to process millions of historical documents. You will focus on transforming ML prototypes into scalable, cost-optimized, and resilient production systems on AWS infrastructure.
Title: Senior ML Systems Engineer
Location: Remote (nearshore to the US)
Duration: 12 months
Key Responsibilities:
- Production Pipeline Engineering: Build and maintain scalable data pipelines that execute agentic workflows for entity extraction.
- Model Orchestration: Deploy and manage infrastructure for both open-source LLMs (on EC2) and third-party APIs, focusing on maximum throughput and reliability.
- Systems Optimization: Continuously optimize pipelines for **unit cost and processing speed** without compromising data integrity.
- Engineering Partnership: Act as a primary engineering contact for the Data Science team, operationalizing their trained models and prompt logic.
- Observability & Reliability: Implement advanced monitoring, alerting, and automated error-handling/retry logic for complex ML workflows.
- Infrastructure as Code: Manage the AWS ecosystem (EC2, Lambda, S3, IAM) supporting the ML lifecycle.
Required Technical Skills:
- Python (Systems Level): Advanced proficiency in building robust multi-threaded or asynchronous applications.
- Infrastructure as Code (IaC): Proficiency with **Terraform** or similar tools (e.g., Pulumi, CloudFormation) for provisioning and managing modular, version-controlled AWS infrastructure.
- AWS Infrastructure: Deep expertise in **EC2 Auto Scaling Groups**, **Lambda** orchestration, and **S3** data lakes.
- Agentic Orchestration: Experience building autonomous AI workflows using **Amazon Bedrock AgentCore** or similar frameworks.
- Operational Excellence: Proven track record in production monitoring (CloudWatch), logging, and troubleshooting distributed systems.
- Security: Strong grasp of AWS IAM for secure resource management.
Required Logistics: Able to work during US business hours for collaboration and production support.
Preferred Qualifications:
- MLOps Focus: Experience with CI/CD for ML, containerization (Docker), and automated testing of data pipelines.
- High-Volume Processing: Experience with OCR or document digitization at scale.
- Throughput Tuning: Experience managing rate limits and costs across multiple LLM providers.
What You'll Work On:
- The Engine: Building the core "machinery" that digitizes and analyzes vast historical archives.
- Scale: Moving from "proof of concept" to processing thousands of documents concurrently.
- Resiliency: Building systems that can gracefully handle API outages, model timeouts, and malformed data.
Impact:
Your work will unlock valuable insights from historical archives, making previously inaccessible information searchable and analyzable for researchers, historians, and institutions worldwide.
We are an Equal Opportunity Employer.