
ML Engineer

Hybrid in New York, NY, US • Posted 20 hours ago • Updated 20 hours ago

Contract Independent

Contract W2

6 Months

Occasional Travel Required

Hybrid

Compensation: Depends on Experience


Job Details

Skills

Machine Learning Operations (ML Ops)
Kubernetes
Machine Learning (ML)
Healthcare Information Technology
Extract, Transform, Load
Deep Learning
Data Processing
Continuous Delivery
Cloud Computing
Continuous Integration
Named-Entity Recognition (NER)
Natural Language Processing
Okapi BM25
PyTorch
Orchestration
Terraform
Debugging
Optimization
TensorFlow
Transformer
Open Source

Summary

What You’ll Do:

· Design, build, and scale ML-powered inference systems that process large volumes of text, image, and video data to power news-based intelligence products.

· Productionize and optimize state-of-the-art models and inference pipelines. These models include, but are not limited to:

o DistilBERT for Named Entity Recognition (NER) over hundreds of thousands of search queries per day

o TransNetV2 for video shot boundary detection at scale, on both archival and real-time video

o SBERT for embedding generation from textual descriptions

o External multimodal APIs for image/video captioning

· Support hybrid search architectures by defining embedding/re-ranking interfaces, evaluation metrics, and inference performance requirements; partner with search/platform engineers on index configuration, sharding, and cluster tuning.

· Design and implement scalable data processing pipelines across hybrid CPU/GPU environments to handle millions of media assets.

· Partner with MLOps and platform engineering to deploy and operate ML systems reliably, contributing to:

o Distributed inference architectures

o Cloud-based execution (e.g., AWS EC2, Batch, Lambda, SageMaker)

o Efficient resource utilization across workloads

· Optimize inference latency and throughput across distributed workloads using cloud-based resources (AWS EC2, Batch, Lambda, SageMaker, etc.).

· Build resilient asynchronous processing systems for large-scale workloads, ensuring:

o Reliability (retries, fault tolerance)

o Efficiency (caching, deduplication)

o Observability (metrics, logging, traceability)

· Work closely with data scientists and product teams to iterate on models, improve performance, and deliver measurable impact in production.

Requirements:

· 8+ years of experience building production ML inference systems.

· Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

· Experience with both TensorFlow (SavedModel, tf.data, XLA, TFLite) and PyTorch (TorchScript, ONNX, FastAPI/TorchServe).

· Hands-on experience optimizing inference pipelines on AWS infrastructure, ideally across different types of media assets.

· Experience with video tools and frameworks (e.g., FFmpeg) and with large-scale frame-level inference.

· Demonstrated experience monitoring and debugging model latency, memory, and pipeline throughput.

· Experience with hybrid search architectures (BM25 + vector search + cross-encoder reranking).

· Familiarity with OpenAI APIs or other foundation model providers.

· Familiarity with open-source Hugging Face LLMs.

· Experience with data pipeline and workflow orchestration tools (e.g., Airflow).

Who This Role Is Not For:

Candidates whose primary background is MLOps platform work (DAG orchestration, Terraform, Kubernetes administration, generic CI/CD pipelines) will not be a fit. We need a senior-level engineer who can profile a transformer, rewrite its serving path for a 2–3x latency reduction, tune an HNSW index, and tell us which SageMaker instance type will hit our p95 target at the lowest cost.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10121769
Position Id: 9000076

Contact the job poster

Imran khan

Recruiter @ Central Business Solutions

Similar Jobs

New York, New York • Today

Software Engineer, Machine Learning (MLOps & Data): A Career with Point72's Surveillance Team. On the Knowledge Graph Intelligence team, you'll work alongside product managers, engineers, and data scientists to build the next generation of intelligent systems through graph technology. We're a team of experts who experiment and work to discover new ways to harness open-source solutions, modern cloud architectures, and sophisticated Artificial Intelligence (AI) solutions, while embracing enterpris…

Full-time

USD 175,000.00 - 250,000.00 per year

Member of Technical Staff (AI Inference Engineer)

New York, New York • Today

We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL, and we need another engineer to join us. What you will work on (examples of real work the team does): New model support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling and KV-cache management to…

Full-time

USD 220,000.00 - 485,000.00 per year

MLOps Engineer - Machine Learning Platform - New York

Jersey City, New Jersey • Today

Job Description. What We Do: At Goldman Sachs, our Engineers don't just make things - we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn…

Full-time

Senior Software Engineer - AI Inference

New York, New York • Today

Description & Requirements. Our team: Join the team that is building the core infrastructure for AI at Bloomberg. The Bloomberg AI Inference Platform provides production-grade managed infrastructure for hosting, deploying, and serving all machine learning models, both predictive and cutting-edge generative models. We abstract away infrastructure complexity, empowering engineering teams to focus on creating intelligent applications with guaranteed scalability, performance, and governance. Our pla…

Full-time

USD 160,000.00 - 240,000.00 per year