Job Title: Vision-Language Model (VLM) Engineer
Location: Remote
Duration: 6+ Month Contract
Job Description:
We are looking for a Vision-Language Model (VLM) Engineer / Applied Scientist who can design, fine-tune, and deploy multimodal models that understand images/videos and text together. You will own the path from prototype to production, including AWS-based deployment for scalable, secure inference.
Key Responsibilities:
Build and adapt vision-language models (VLMs) for enterprise use cases (visual inspection, safety monitoring, document/image understanding, workflow automation)
Fine-tune pretrained models for custom datasets using LoRA / QLoRA / adapters
Create pipelines that run from image/video ingestion through model inference to structured outputs (JSON, labels, alerts, summaries)
Deploy inference services on AWS with monitoring, scaling, and cost control
Optimize for performance and reliability (batching, quantization, caching, GPU utilization)
Run evaluation, error analysis, and continuous improvement using task-specific metrics
Partner with product and engineering teams to integrate VLM services into applications/APIs
Required Skills:
Strong hands-on experience with multimodal AI / Vision-Language Models
Proficiency in Python and PyTorch (or equivalent deep learning framework)
Real-world experience with fine-tuning and model adaptation (LoRA/QLoRA, prompt tuning)
Experience deploying ML services on AWS, such as:
Amazon SageMaker (endpoints, model hosting, pipelines)
Amazon EC2 GPU instances, Auto Scaling, load balancers
Amazon ECR (container registry) + Docker
AWS Lambda / API Gateway (where suitable), CloudWatch (logs/metrics)
Strong understanding of computer vision fundamentals (classification, detection, embeddings)
Preferred / Nice to Have:
Hugging Face Transformers, OpenCV, ONNX/TensorRT
ECS / EKS (Kubernetes) for container orchestration
Infrastructure as Code: Terraform / AWS CDK / CloudFormation
Security best practices: IAM roles, VPC setup, secrets management
Multimodal RAG (Retrieval-Augmented Generation) with vector databases
Experience with dataset labeling workflows and MLOps practices