Senior ML Platform Engineer (Serving Infrastructure) - 100% Remote

Overview

Remote
$50+
Contract - W2 or Contract - Independent

Job Details

Job role: Senior ML Platform Engineer (Serving Infrastructure)

Location: Remote

Duration: Long term

Role Overview:

We're looking for an experienced engineer to build our ML serving infrastructure. You'll create the platforms and systems that enable reliable, scalable model deployment and inference. This role focuses on the runtime infrastructure that powers our production ML capabilities.

Key Responsibilities:

Design and implement scalable model serving platforms for both batch and real-time inference

Build model deployment pipelines with automated testing and validation

Develop monitoring, logging, and alerting systems for ML services

Create infrastructure for A/B testing and model experimentation (a traffic-splitting sketch follows this list)

Implement model versioning and rollback capabilities

Design efficient scaling and load balancing strategies for ML workloads

Collaborate with data scientists to optimize model serving performance
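
To make the A/B testing and rollback responsibilities above concrete, here is a minimal Python sketch of weighted traffic splitting between model versions with an instant rollback switch. It is illustrative only; the class and version names are hypothetical and do not describe our actual stack.

    import random

    class ModelRouter:
        """Route inference traffic across model versions by weight.

        Rollback is a metadata change: repoint all traffic at a known-good
        version, with no redeploy of the serving processes.
        """

        def __init__(self):
            self.versions = {}  # version name -> callable model
            self.weights = {}   # version name -> traffic share

        def register(self, name, model, weight=0.0):
            self.versions[name] = model
            self.weights[name] = weight

        def promote(self, name):
            """Send 100% of traffic to one version (also used to roll back)."""
            self.weights = {v: (1.0 if v == name else 0.0) for v in self.weights}

        def predict(self, features):
            names = list(self.weights)
            chosen = random.choices(names, weights=[self.weights[n] for n in names])[0]
            return chosen, self.versions[chosen](features)

    # Usage: 90/10 canary, then roll back to v1 if v2 misbehaves.
    router = ModelRouter()
    router.register("v1", lambda x: "v1-pred", weight=0.9)
    router.register("v2", lambda x: "v2-pred", weight=0.1)
    print(router.predict({"f": 1.0}))
    router.promote("v1")  # instant rollback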

Technical Requirements:

10+ years of software engineering experience, with 7+ years in ML serving/infrastructure

Strong expertise in container orchestration (Kubernetes) and cloud platforms

Experience with model serving technologies (TensorFlow Serving, Triton, KServe)

Deep knowledge of distributed systems and microservices architecture

Proficiency in Python and experience building high-performance serving systems

Strong background in monitoring and observability tools

Experience with CI/CD pipelines and GitOps workflows

Experience with model serving frameworks (a short client sketch follows this list):

TorchServe for PyTorch models

TensorFlow Serving for TF models

Triton Inference Server for multi-framework support

BentoML for unified model serving
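
As a concrete taste of one of these frameworks, the following is a hedged sketch of querying a model hosted on Triton Inference Server over HTTP using the tritonclient package. It assumes a Triton server is already running locally; the model name and tensor names are placeholders that would come from the deployed model's configuration.

    import numpy as np
    import tritonclient.http as httpclient

    # Assumes a Triton server on localhost:8000 serving a model named
    # "my_model"; tensor names, shapes, and dtypes come from its config.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)

    response = client.infer(model_name="my_model", inputs=[infer_input])
    scores = response.as_numpy("OUTPUT__0")
    print(scores.shape)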

Expertise in model runtime optimizations (a quantization sketch follows this list):

Model quantization (INT8, FP16)

Model pruning and compression

Kernel optimizations

Batching strategies

Hardware-specific optimizations (CPU/GPU)
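
As one illustrative example of these optimizations, the sketch below applies PyTorch dynamic quantization (INT8 weights) to a toy Linear-heavy model. The model is a stand-in; any accuracy impact would need validation before a production rollout.

    import torch
    import torch.nn as nn

    # A toy model standing in for a real serving workload.
    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
    model.eval()

    # Dynamic quantization: weights stored as INT8, activations quantized
    # on the fly. Typically shrinks Linear-heavy models and speeds up CPU
    # inference; accuracy should be re-validated before rollout.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 512)
    with torch.no_grad():
        print(quantized(x).shape)  # torch.Size([1, 10])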

Experience with model inference workflows (a dynamic-batching sketch follows this list):

Pre/post-processing pipeline optimization

Feature transformation at serving time

Caching strategies for inference

Multi-model inference orchestration

Dynamic batching and request routing
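
To illustrate the dynamic-batching item above, here is a minimal asyncio sketch that coalesces concurrent requests into one batched model call, flushing on either a size or a time threshold. All names are hypothetical; production batchers (for example, Triton's) add queuing policies, priorities, and backpressure on top of this core idea.

    import asyncio

    class DynamicBatcher:
        """Coalesce concurrent requests into one model call.

        Flushes when max_batch_size requests are queued or max_wait_ms
        elapses, whichever comes first: the classic latency/throughput
        trade-off behind server-side dynamic batching.
        """

        def __init__(self, model_fn, max_batch_size=8, max_wait_ms=5):
            self.model_fn = model_fn  # batched inference callable
            self.max_batch_size = max_batch_size
            self.max_wait = max_wait_ms / 1000
            self.queue = asyncio.Queue()

        async def infer(self, item):
            fut = asyncio.get_running_loop().create_future()
            await self.queue.put((item, fut))
            return await fut

        async def run(self):
            while True:
                item, fut = await self.queue.get()
                batch, futs = [item], [fut]
                deadline = asyncio.get_running_loop().time() + self.max_wait
                while len(batch) < self.max_batch_size:
                    timeout = deadline - asyncio.get_running_loop().time()
                    if timeout <= 0:
                        break
                    try:
                        item, fut = await asyncio.wait_for(self.queue.get(), timeout)
                    except asyncio.TimeoutError:
                        break
                    batch.append(item)
                    futs.append(fut)
                # One batched model call serves every queued request.
                for f, out in zip(futs, self.model_fn(batch)):
                    f.set_result(out)

    async def main():
        batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs])
        worker = asyncio.create_task(batcher.run())
        results = await asyncio.gather(*(batcher.infer(i) for i in range(10)))
        print(results)

    asyncio.run(main())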

Experience with GPU infrastructure management

Knowledge of low-latency serving architectures

Familiarity with ML-specific security requirements

Background in performance profiling and optimization

Experience with model serving metrics collection and analysis
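
For a sense of what serving metrics collection can look like, below is a small sketch using the prometheus_client library to export request counts and latency histograms. The metric names, labels, and port are assumptions for illustration, not an existing convention.

    import random
    import time
    from prometheus_client import Counter, Histogram, start_http_server

    # Hypothetical metric names; real ones would follow team conventions.
    REQUESTS = Counter(
        "inference_requests_total", "Inference requests", ["model", "status"]
    )
    LATENCY = Histogram(
        "inference_latency_seconds", "End-to-end inference latency", ["model"]
    )

    def predict(features, model="my_model"):
        start = time.perf_counter()
        try:
            result = sum(features)  # stand-in for real inference
            REQUESTS.labels(model, "ok").inc()
            return result
        except Exception:
            REQUESTS.labels(model, "error").inc()
            raise
        finally:
            LATENCY.labels(model).observe(time.perf_counter() - start)

    if __name__ == "__main__":
        start_http_server(9100)  # exposes /metrics for Prometheus scraping
        while True:
            predict([random.random() for _ in range(4)])
            time.sleep(0.1)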

About Floga technologies