DATA ARCHITECT

Overview

Work arrangement: Hybrid
Compensation: Depends on Experience
Accepts corp-to-corp applications
Contract: W2 or Independent, 1 year

Skills

Data Architect
SAP data structures
Databricks
SAP IBP
APO
MDM
ADF
Synapse
Data Factory
AWS Glue
SAP connectors
ODP/ODQ
RFC/IDOC/BAPI

Job Details

DATA ARCHITECT

Location: California Bay Area

Duration: 1+ year

Role expectations

The Data Architect will design, implement, and maintain scalable data architectures on the Databricks platform, drawing on a strong understanding of SAP data structures, especially master data. The role requires hands-on experience in data engineering, governance, and platform administration, as well as the ability to guide development teams through best practices, architecture decisions, and code reviews.

Skills

Technical Skills

  • 8-15+ years in data engineering/architecture, with 3-5+ years specifically in Databricks.
  • Deep knowledge of:
    • PySpark, Spark SQL, Delta Lake
    • Unity Catalog, cluster management, Lakehouse governance
    • Cloud architecture on Azure, AWS, or Google Cloud Platform
  • Strong experience with SAP data:
    • Extracting data from ECC/S4/BW
    • Understanding of SAP tables, master data structures, and business logic
    • Experience with IDOCs, BAPIs, ODP/ODQ sources
  • Strong MDM experience:
    • Master data modelling
    • Data quality frameworks
    • Metadata management
    • Golden record management
  • CI/CD: Git, Azure DevOps, GitHub Actions or similar.
  • Databricks Workflows / Jobs orchestration.
  • Exposure to planning systems such as SAP IBP/APO (preferred but not required).

Soft Skills

  • Strong communication and documentation skills.
  • Ability to interact with business and technical teams.
  • Problem-solving with a focus on performance, reliability, and scalability.
  • Leadership mindset with ability to guide and upskill teams.

Detailed skills

Architecture & Solution Design

  • Design end-to-end data architectures leveraging Databricks Lakehouse Platform (Delta Lake, Unity Catalog, Lakehouse Governance).
  • Develop scalable ingestion, transformation, and consumption patterns for SAP data (ECC/S4, BW, IBP, APO, etc.).
  • Define data models for Master Data Management (MDM): Material, Customer, Vendor, BOM, Plant, Cost Center, Profit Center, etc. (see the sketch after this list).
  • Create logical/physical models aligned with business processes (planning, procurement, manufacturing, finance).
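
For illustration only, a minimal PySpark sketch of how one such golden Material record might be modelled as a Delta table; the catalog and schema names ("corp", "gold") and the column-to-SAP-field mapping are assumptions, not a prescribed design:

  # Hypothetical names throughout; on Databricks, `spark` is already provided.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  spark.sql("""
      CREATE TABLE IF NOT EXISTS corp.gold.dim_material (
          material_id   STRING NOT NULL,  -- typically sourced from MARA-MATNR
          description   STRING,           -- MAKT-MAKTX
          material_type STRING,           -- MARA-MTART
          base_uom      STRING,           -- MARA-MEINS
          source_system STRING,           -- e.g. 'ECC' or 'S4'
          valid_from    TIMESTAMP,
          is_current    BOOLEAN
      ) USING DELTA
  """)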

Databricks Platform Administration

  • Manage workspace configuration, clusters, secrets, networking, and access control.
  • Set up and maintain Unity Catalog: catalogs, schemas, storage credentials, and data lineage (see the sketch after this list).
  • Develop CI/CD frameworks for Databricks repos, workflows, and environment promotions.
  • Monitor platform performance, optimize cluster sizing, and implement cost-control measures.
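
A minimal sketch of the Unity Catalog setup bullet above, assuming a Unity Catalog-enabled workspace; the catalog, schema, and group names are placeholders:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Three-level namespace: catalog -> schema -> table.
  spark.sql("CREATE CATALOG IF NOT EXISTS corp")
  spark.sql("CREATE SCHEMA IF NOT EXISTS corp.silver")

  # Coarse-grained access control for two hypothetical account groups.
  spark.sql("GRANT USE CATALOG ON CATALOG corp TO `data-engineers`")
  spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA corp.silver TO `analysts`")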

Infrastructure & Environment Setup

  • Design and configure environments (Dev/Test/Prod) for Databricks on Azure, AWS, or Google Cloud Platform.
  • Set up pipelines for SAP data ingestion using ADF, Synapse, Data Factory, AWS Glue, SAP connectors, ODP/ODQ, RFC/IDOC/BAPI mechanisms.
  • Architect secure storage layers (Bronze/Silver/Gold) with Delta Lake best practices (see the sketch after this list).
  • Ensure integration with enterprise security standards: Key Vaults, ADLS/S3, IAM, and networking.
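
To make the Bronze/Silver/Gold bullet concrete, here is a minimal Bronze-to-Silver pass; the table names and the extracted_at watermark column are assumptions:

  from pyspark.sql import SparkSession, functions as F
  from pyspark.sql.window import Window

  spark = SparkSession.builder.getOrCreate()

  # Bronze: raw SAP extract landed as-is (hypothetical table name).
  bronze = spark.read.table("corp.bronze.mara_raw")

  # Silver: keep only the latest record per material number (MATNR).
  latest = Window.partitionBy("MATNR").orderBy(F.col("extracted_at").desc())
  silver = (
      bronze
      .withColumn("_rn", F.row_number().over(latest))
      .where("_rn = 1")
      .drop("_rn")
  )

  silver.write.format("delta").mode("overwrite").saveAsTable("corp.silver.material")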

Data Governance & MDM

  • Implement governance frameworks around data quality, lineage, cataloging, and stewardship.
  • Define master data validations, deduplication logic, survivorship rules, and versioning.
  • Implement data quality rules using Delta Live Tables (DLT) expectations and audits (see the sketch after this list).
  • Collaborate with business teams to define golden records and standardized master data models.
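
As a sketch of the DLT bullet above (this runs only inside a Databricks Delta Live Tables pipeline, and the source dataset name material_raw is hypothetical):

  import dlt  # available only within a Databricks DLT pipeline

  # Drop rows missing the business key; flag (but keep) rows missing a unit of measure.
  @dlt.table(comment="Validated material master records")
  @dlt.expect_or_drop("material_id_present", "MATNR IS NOT NULL")
  @dlt.expect("uom_present", "MEINS IS NOT NULL")
  def material_validated():
      return dlt.read("material_raw")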

Best Practices, Standards & Reviews

  • Create coding standards for PySpark, SQL, Delta Lake, and ETL/ELT pipelines.
  • Review developer code with focus on:
    • Query optimization
    • Efficient Delta Lake operations (MERGE, OPTIMIZE, ZORDER; see the sketch after this list)
    • Cluster cost optimization
    • Error handling and logging patterns
  • Define reusable frameworks for ingestion, transformation, and reconciliation.
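
By way of illustration for the review checklist above, an idempotent upsert plus a file-layout maintenance pass; the table names are placeholders, and the staging source is assumed to share the target schema:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Incoming batch registered as a temp view (hypothetical staging table).
  updates = spark.read.table("corp.staging.material_updates")
  updates.createOrReplaceTempView("updates")

  # Upsert into the gold dimension keyed on material_id.
  spark.sql("""
      MERGE INTO corp.gold.dim_material AS t
      USING updates AS s
      ON t.material_id = s.material_id
      WHEN MATCHED THEN UPDATE SET *
      WHEN NOT MATCHED THEN INSERT *
  """)

  # Compact small files and co-locate data on the join/merge key.
  spark.sql("OPTIMIZE corp.gold.dim_material ZORDER BY (material_id)")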

Development Guidance & Team Enablement

  • Mentor developers on Databricks architecture, PySpark patterns, and SAP data structures.
  • Provide technical leadership in design sessions and sprint planning.
  • Conduct knowledge sessions on best practices and common pitfalls.
  • Troubleshoot complex data pipeline issues across SAP and Databricks.