Opportunity for AI Data Engineer - Menlo Park, CA

Menlo Park, CA, US • Posted 4 hours ago • Updated 4 hours ago

Contract Corp To Corp

Contract W2

6 Months

On-site

Depends on Experience

Job Details

Skills

AI DATA ENGINEER
DATA ENGINEER
ML ENGINEER
AIRFLOW
DATASWARM
PIPELINE ORCHESTRATION
SQL
QUERY OPTIMIZATION
COMPLEX QUERIES
ML MODEL INTEGRATION
MODEL INVOCATION
MODEL SERVING
LLMS
PROMPT ENGINEERING
LLM APIS
GENERATIVE AI
DIFFUSION MODELS
IMAGE GENERATION

Summary

Our client, a leading tech company, is looking to hire an AI Data Engineer in Menlo Park, CA.

Pay Rate Range: $95/Hr to $100/Hr, depending on experience

Description:
Generative AI models are only as good as the data they consume. Unlike traditional data engineering, building data pipelines for generative AI requires orchestrating ML model invocations (content understanding classifiers, embedding models, LLM-based cleaners) alongside standard SQL-based transformations, all at billion-row scale.

This role sits at the intersection of Data Engineering and ML Systems. The Senior AI Data Engineer will own end-to-end data pipelines that don't just move and transform data, but enrich it through remote model inference, managing the systems complexity of async execution, capacity allocation, retry/fallback logic, and throughput optimization that comes with it. This is not a pure ETL-with-SQL role; it demands hands-on systems experience with distributed inference infrastructure.

Our team develops comprehensive data curation and evaluation solutions for image generation models across quality dimensions including visual quality, prompt adherence, identity preservation, naturalness, and visual text generation.

Job Responsibilities
AI-Augmented Data Pipelines: Design and maintain AI-augmented, large-scale data pipelines (billions of images) integrating traditional transformations with ML models (classifiers, embeddings, LLMs) for cleaning and annotation.
Remote Inference Orchestration: Own the systems for remote ML model inference orchestration within pipelines, managing batching, retries, async jobs, and ensuring graceful degradation.
Feature Pipelines: Build and maintain scalable pipelines for generating, storing, and serving vector embeddings, including nearest-neighbor index management and quality validation.
Data Curation at Scale: Source, filter, and curate training datasets using a combination of SQL and model-derived signals (e.g., aesthetic scores, NSFW classifiers), owning the end-to-end data flow and maintaining governance, quality, and compliance.

Additional Responsibilities
LLM-Assisted Annotation: Design and operate pipelines that use LLMs and vision models for automated annotation of training data, including auditing workflows to measure and improve annotation model performance.
Tooling & Frameworks: Contribute to shared tooling and frameworks that make it easier for the broader team to build AI-augmented data pipelines — e.g., reusable operators for model invocation, standard patterns for async job management.

Skills Required
Advanced SQL & data pipeline expertise. Complex queries, query optimization, pipeline orchestration frameworks (Airflow, Dataswarm, or equivalent).
Experience integrating ML models into data pipelines. Calling inference endpoints, managing model versions, batching requests, handling inference failures at scale.
Proficiency with AI-assisted coding agents (e.g., Copilot, Cursor, Codex). Expected to leverage AI tools as a force multiplier for writing, debugging, and reviewing code, building pipelines faster, and accelerating day-to-day engineering workflows.
Strong verbal and written communication skills, problem-solving ability, and cross-functional collaboration.

Preferred
Working knowledge of embeddings and vector representations: generating, storing, indexing, and querying embeddings (FAISS, Milvus, or equivalent).
Familiarity with content-understanding models such as image classifiers, object detection, OCR, NSFW detection, and aesthetic scoring.
Experience using LLMs for data tasks such as prompt engineering for annotation, data cleaning, or evaluation via LLM APIs.
Knowledge of generative AI, including diffusion models, image generation, and evaluation metrics (FID, CLIP score, etc.).

Education / Experience
Bachelor's degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
5+ years of industry experience in data engineering, ML engineering, or a hybrid role involving both data pipelines and model serving/inference.
Demonstrated track record of building and operating production data pipelines that invoke ML models at scale.

Russell Tobin offers eligible employees comprehensive healthcare coverage (medical, dental, and vision plans), supplemental coverage (accident insurance, critical illness insurance, and hospital indemnity), 401(k) retirement savings, life and disability insurance, an employee assistance program, legal support, auto and home insurance, pet insurance, and employee discounts with preferred vendors.

Equal Employment Opportunity
Russell Tobin is an equal opportunity employer. We do not discriminate on the basis of race, religious creed, color, national origin, ancestry, physical disability, mental disability, reproductive health decision making, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, age, sexual orientation, veteran or military status, or any other characteristic protected by applicable federal, state, or local law.

Fair Chance Employment
Russell Tobin is a Fair Chance employer. We consider all qualified applicants, including those with criminal histories, in a manner consistent with applicable state and local Fair Chance laws and ordinances, including the California Fair Chance Act and all applicable local Fair Chance ordinances.

Accommodations
We are committed to providing reasonable accommodations to applicants and employees with disabilities. If you require a reasonable accommodation to participate in the application or interview process, or to perform the essential functions of this role, please contact us.

Applicable only to San Francisco candidates: Under the San Francisco Lactation in the Workplace Ordinance, we will provide written notice of lactation accommodation rights. This notice will automatically be given upon hiring and upon any inquiry about parental leave or lactation accommodation.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 10427670
Position Id: 26-17051