Vision-Language Model (VLM) Engineer

Hybrid • Posted 2 days ago • Updated 1 hour ago

Contract Independent

Contract W2

Remote

Job Title: Vision-Language Model (VLM) Engineer

Location: Remote

Duration: 6+ Month Contract

Job Description:
We are looking for a Vision-Language Model (VLM) Engineer / Applied Scientist who can design, fine-tune, and deploy multimodal models that understand images/videos and text together. You will own the path from prototype to production, including AWS-based deployment for scalable, secure inference.

Key Responsibilities:

Build and adapt vision-language models (VLMs) for enterprise use cases (visual inspection, safety monitoring, document/image understanding, workflow automation)

Fine-tune pretrained models on custom datasets using LoRA / QLoRA / adapters

Create pipelines that run from image/video ingestion through model inference to structured outputs (JSON, labels, alerts, summaries)

Deploy inference services on AWS with monitoring, scaling, and cost control

Optimize for performance and reliability (batching, quantization, caching, GPU utilization)

Run evaluation, error analysis, and continuous improvement using task-specific metrics

Partner with product and engineering teams to integrate VLM services into applications/APIs

Required Skills:

Strong hands-on experience with multimodal AI / Vision-Language Models

Proficiency in Python and PyTorch (or equivalent deep learning framework)

Real-world experience with fine-tuning and model adaptation (LoRA/QLoRA, prompt tuning)

Experience deploying ML services on AWS, such as:

Amazon SageMaker (endpoints, model hosting, pipelines)

Amazon EC2 + GPU, Auto Scaling, Load Balancers

Amazon ECR (container registry) + Docker

AWS Lambda / API Gateway (where suitable), CloudWatch (logs/metrics)

Strong understanding of computer vision fundamentals (classification, detection, embeddings)

Preferred / Nice to Have:

Hugging Face Transformers, OpenCV, ONNX/TensorRT

ECS / EKS (Kubernetes) for container orchestration

Infrastructure as Code: Terraform / AWS CDK / CloudFormation

Security best practices: IAM roles, VPC setup, secrets management

Multimodal RAG (Retrieval-Augmented Generation) with vector databases

Experience with dataset labeling workflows and MLOps practices

Dice Id: 10121915
Position Id: 2026-4341