AI Data Engineer

Cupertino, CA, US • Posted 30+ days ago • Updated 8 hours ago

Full Time

On-site

Job Details

Skills

Data Flow
Embedded Systems
Partnership
Collaboration
Graph Databases
Meta-data Management
Normalization
Unstructured Data
Analytics
Taxonomy
Computer Science
Data Science
Information Systems
SQL
Python
Data Engineering
Cloud Computing
Snowflake
Databricks
Extract, Transform, Load (ETL)
ELT
Data Modeling
Version Control
Git
Continuous Integration
Continuous Delivery
Artificial Intelligence
Machine Learning (ML)
Vector Databases
MongoDB
Model Context Protocol (MCP)
Data Quality
Semantics
Neo4j
Ontologies
Streaming
Apache Kafka
Legal
Management
Contract Lifecycle Management
Document Management

Summary

Imagine what you could do here. At Apple, new ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish.

Are you passionate about building the data pipelines that make AI systems fast, accurate, and reliable?

Do you thrive on engineering clean data flows that connect enterprise systems to intelligent applications?

Can you build infrastructure that's both production-grade and purpose-built for AI consumption?

The Applied Data Science team within Legal Operations is building production-grade AI for a global legal organization, and every AI system is only as good as the data flowing into it. The AI Data Engineer owns the pipelines, data feeds, and integration infrastructure that ensure AI applications have the right data, in the right form, at the right time.

Description

The AI Data Engineer builds and maintains the data infrastructure that powers AI applications across Legal Operations. You will design and implement data pipelines that ingest from legal systems, transform data into AI-ready formats, load vector databases and other AI stores, and expose data services through APIs. This role is embedded within the AI team and works in close partnership with AI and data colleagues to ensure AI systems have reliable, high-quality data at every stage.

Design and implement data pipelines that ingest, transform, and deliver data from legal systems (matter management, eBilling, CLM, document management) to AI applications

Build and maintain pipelines that load and refresh vector databases, document stores, and graph databases used by AI retrieval systems

Engineer data transformations that prepare legal data for AI consumption: chunking, embedding generation, metadata enrichment, and schema normalization

Build upstream and downstream integrations with MCP (Model Context Protocol), vector databases, and knowledge graphs to support context engineering and AI retrieval systems

Develop and maintain APIs that expose structured and unstructured data to AI applications and analytics tools

Implement data quality checks and validation at pipeline ingestion points to ensure AI systems receive reliable, complete data

Build monitoring and alerting for pipeline health, data freshness, and load failures

Understand AI data access patterns and optimize data delivery for AI performance

Integrate with the semantic layer - consuming entity resolution outputs, taxonomy mappings, and enriched datasets to ground AI applications

Implement ETL/ELT processes using dbt, Fivetran, or similar tools with a focus on reliability and maintainability

Document pipeline designs, data contracts, and operational runbooks

Minimum Qualifications

Bachelor's degree in Computer Science, Data Science, Information Systems, or related field (or equivalent experience); Master's degree preferred

4+ years of experience in data engineering for AI applications

Strong proficiency in SQL and Python for data engineering and transformation

Experience with cloud data platforms (Snowflake, Databricks, BigQuery, or similar)

Experience with ETL/ELT tools (dbt, Fivetran, Airflow, or similar)

Experience building and maintaining REST APIs

Understanding of data modeling and data transformation best practices

Experience with version control (Git) and CI/CD practices

Ability to work closely with AI/ML teams and understand their data requirements

Preferred Qualifications

Experience with vector databases (Pinecone, Weaviate, Chroma), embedding generation pipelines, document stores (MongoDB or similar) and their integration patterns

Understanding of retrieval-augmented generation (RAG), MCP architectures, context engineering principles, and how data quality affects retrieval performance

Experience with semantic layer technologies (dbt Semantic Layer, Cube, AtScale), knowledge graphs (Neo4j), or ontology design

Experience with streaming or event-driven data architectures (Kafka or similar)

Familiarity with legal operations data (matter management, eBilling, CLM, document management)

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

Dice Id: 90733111
Position Id: b8723362766d5f60fc4733e860fb5711