Job Details
Job role: Senior ML Platform Engineer (Serving Infrastructure)
Location: Remote
Duration: Long-term
Role Overview:
We're looking for an experienced engineer to build our ML serving infrastructure.
You'll create the platforms and systems that enable reliable, scalable model deployment and inference.
This role focuses on the runtime infrastructure that powers our production ML capabilities.
Key Responsibilities:
Design and implement scalable model serving platforms for both batch and real-time inference
Build model deployment pipelines with automated testing and validation
Develop monitoring, logging, and alerting systems for ML services
Create infrastructure for A/B testing and model experimentation
Implement model versioning and rollback capabilities
Design efficient scaling and load balancing strategies for ML workloads
Collaborate with data scientists to optimize model serving performance
Technical Requirements:
10+ years of software engineering experience, with 7+ years in ML serving/infrastructure
Strong expertise in container orchestration (Kubernetes) and cloud platforms
Experience with model serving technologies (TensorFlow Serving, Triton, KServe)
Deep knowledge of distributed systems and microservices architecture
Proficiency in Python and experience with high-performance serving
Strong background in monitoring and observability tools
Experience with CI/CD pipelines and GitOps workflows
Experience with model serving frameworks:
  TorchServe for PyTorch models
  TensorFlow Serving for TensorFlow models
  Triton Inference Server for multi-framework support
  BentoML for unified model serving
Expertise in model runtime optimizations:
  Model quantization (INT8, FP16)
  Model pruning and compression
  Kernel optimizations
  Batching strategies
  Hardware-specific optimizations (CPU/GPU)
Experience with model inference workflows:
  Pre/post-processing pipeline optimization
  Feature transformation at serving time
  Caching strategies for inference
  Multi-model inference orchestration
  Dynamic batching and request routing
Experience with GPU infrastructure management
Knowledge of low-latency serving architectures
Familiarity with ML-specific security requirements
Background in performance profiling and optimization
Experience with model serving metrics collection and analysis