AI Data Engineer

Cupertino, CA, US • Posted 2 days ago • Updated 1 day ago
Full Time
On-site

Job Details

Skills

  • Data Flow
  • Embedded Systems
  • Partnership
  • Collaboration
  • Graph Databases
  • Metadata Management
  • Normalization
  • Unstructured Data
  • Analytics
  • Taxonomy
  • FOCUS
  • Computer Science
  • Data Science
  • Information Systems
  • SQL
  • Python
  • Data Engineering
  • Cloud Computing
  • Snowflake Schema
  • Databricks
  • Extract
  • Transform
  • Load
  • ELT
  • Data Modeling
  • Version Control
  • Git
  • Continuous Integration
  • Continuous Delivery
  • Artificial Intelligence
  • Machine Learning (ML)
  • Vector Databases
  • MongoDB
  • Microsoft Certified Professional
  • Data Quality
  • Semantics
  • Neo4j
  • Ontologies
  • Streaming
  • Apache Kafka
  • Legal
  • Management
  • Contract Lifecycle Management
  • Document Management

Summary

Imagine what you could do here. At Apple, new ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish.

Are you passionate about building the data pipelines that make AI systems fast, accurate, and reliable?
Do you thrive on engineering clean data flows that connect enterprise systems to intelligent applications?
Can you build infrastructure that's both production-grade and purpose-built for AI consumption?

The Applied Data Science team within Legal Operations is building production-grade AI for a global legal organization - and every AI system is only as good as the data flowing into it. The AI Data Engineer owns the pipelines, data feeds, and integration infrastructure that ensure AI applications have the right data, in the right form, at the right time.

The AI Data Engineer builds and maintains the data infrastructure that powers AI applications across Legal Operations. You will design and implement data pipelines that ingest from legal systems, transform data into AI-ready formats, load vector databases and other AI stores, and expose data services through APIs. This role is embedded within the AI team and works in close partnership with AI and data colleagues to ensure AI systems have reliable, high-quality data at every stage.

  • Design and implement data pipelines that ingest, transform, and deliver data from legal systems (matter management, eBilling, CLM, document management) to AI applications
  • Build and maintain pipelines that load and refresh vector databases, document stores, and graph databases used by AI retrieval systems
  • Engineer data transformations that prepare legal data for AI consumption - chunking, embedding generation, metadata enrichment, and schema normalization
  • Build upstream and downstream integrations with MCP (Model Context Protocol), vector databases, and knowledge graphs to support context engineering and AI retrieval systems
  • Develop and maintain APIs that expose structured and unstructured data to AI applications and analytics tools
  • Implement data quality checks and validation at pipeline ingestion points to ensure AI systems receive reliable, complete data
  • Build monitoring and alerting for pipeline health, data freshness, and load failures
  • Understand AI data access patterns and optimize data delivery for AI performance
  • Integrate with the semantic layer - consuming entity resolution outputs, taxonomy mappings, and enriched datasets to ground AI applications
  • Implement ETL/ELT processes using dbt, Fivetran, or similar tools with a focus on reliability and maintainability
  • Document pipeline designs, data contracts, and operational runbooks

  • Bachelor's degree in Computer Science, Data Science, Information Systems, or a related field (or equivalent experience); Master's degree preferred
  • 4+ years of experience in data engineering related to AI applications
  • Strong proficiency in SQL and Python for data engineering and transformation
  • Experience with cloud data platforms (Snowflake, Databricks, BigQuery, or similar)
  • Experience with ETL/ELT tools (dbt, Fivetran, Airflow, or similar)
  • Experience building and maintaining REST APIs
  • Understanding of data modeling and data transformation best practices
  • Experience with version control (Git) and CI/CD practices
  • Ability to work closely with AI/ML teams and understand their data requirements

  • Experience with vector databases (Pinecone, Weaviate, Chroma), embedding generation pipelines, document stores (MongoDB or similar) and their integration patterns
  • Understanding of RAG, MCP architectures, context engineering principles, and how data quality affects retrieval performance
  • Experience with semantic layer technologies (dbt Semantic Layer, Cube, AtScale), knowledge graphs (Neo4j), or ontology design
  • Experience with streaming or event-driven data architectures (Kafka or similar)
  • Familiarity with legal operations data (matter management, eBilling, CLM, document management)

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90733111
  • Position Id: b8723362766d5f60fc4733e860fb5711