DATA ARCHITECT

Overview

Work arrangement: Hybrid
Compensation: Depends on Experience
Accepts corp-to-corp applications
Contract: W2 or Independent, 1 year

Skills

Data Architect
SAP data structures
Databricks
SAP IBP
APO
MDM
ADF
Synapse
Data Factory
AWS Glue
SAP connectors
ODP/ODQ
RFC/IDOC/BAPI

Job Details

DATA ARCHITECT

Location: California Bay Area

Duration: 1+ year

Role expectations

The Data Architect will design, implement, and maintain scalable data architectures on the Databricks platform, drawing on a strong understanding of SAP data structures, especially master data. The role requires hands-on experience in data engineering, governance, and platform administration, as well as the ability to guide development teams through best practices, architecture decisions, and code reviews.

Skills

Technical Skills

  • 8-15+ years in data engineering/architecture, with 3-5+ years specifically in Databricks.
  • Deep knowledge of:
    • PySpark, Spark SQL, Delta Lake
    • Unity Catalog, cluster management, Lakehouse governance
    • Cloud architecture on Azure, AWS, or Google Cloud Platform
  • Strong experience with SAP data:
    • Extracting data from ECC/S4/BW
    • Understanding of SAP tables, master data structures, and business logic
    • Experience with IDOCs, BAPIs, ODP/ODQ sources
  • Strong MDM experience:
    • Master data modelling
    • Data quality frameworks
    • Metadata management
    • Golden record management
  • CI/CD: Git, Azure DevOps, GitHub Actions or similar.
  • Databricks Workflows / Jobs orchestration.
  • Exposure to planning systems such as SAP IBP/APO (preferred but not required).

Soft Skills

  • Strong communication and documentation skills.
  • Ability to interact with business and technical teams.
  • Problem-solving with a focus on performance, reliability, and scalability.
  • Leadership mindset with ability to guide and upskill teams.

Detailed skills

Architecture & Solution Design

  • Design end-to-end data architectures leveraging Databricks Lakehouse Platform (Delta Lake, Unity Catalog, Lakehouse Governance).
  • Develop scalable ingestion, transformation, and consumption patterns for SAP data (ECC/S4, BW, IBP, APO, etc.).
  • Define data models for Master Data Management (MDM): Material, Customer, Vendor, BOM, Plant, Cost Center, Profit Center, etc. (see the sketch after this list).
  • Create logical/physical models aligned with business processes (planning, procurement, manufacturing, finance).
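
For illustration only, a minimal PySpark sketch of how one such golden Material record might be modelled as a Delta table; the catalog and schema names ("corp", "gold") and the column-to-SAP-field mapping are assumptions, not a prescribed design:

  # Hypothetical names throughout; on Databricks, `spark` is already provided.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  spark.sql("""
      CREATE TABLE IF NOT EXISTS corp.gold.dim_material (
          material_id   STRING NOT NULL,  -- typically sourced from MARA-MATNR
          description   STRING,           -- MAKT-MAKTX
          material_type STRING,           -- MARA-MTART
          base_uom      STRING,           -- MARA-MEINS
          source_system STRING,           -- e.g. 'ECC' or 'S4'
          valid_from    TIMESTAMP,
          is_current    BOOLEAN
      ) USING DELTA
  """)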

Databricks Platform Administration

  • Manage workspace configuration, clusters, secrets, networking, and access control.
  • Set up and maintain Unity Catalog: catalogs, schemas, storage credentials, and data lineage (see the sketch after this list).
  • Develop CI/CD frameworks for Databricks repos, workflows, and environment promotions.
  • Monitor platform performance, optimize cluster sizing, and implement cost-control measures.
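
A minimal sketch of the Unity Catalog setup bullet above, assuming a Unity Catalog-enabled workspace; the catalog, schema, and group names are placeholders:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Three-level namespace: catalog -> schema -> table.
  spark.sql("CREATE CATALOG IF NOT EXISTS corp")
  spark.sql("CREATE SCHEMA IF NOT EXISTS corp.silver")

  # Coarse-grained access control for two hypothetical account groups.
  spark.sql("GRANT USE CATALOG ON CATALOG corp TO `data-engineers`")
  spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA corp.silver TO `analysts`")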

Infrastructure & Environment Setup

  • Design and configure environments (Dev/Test/Prod) for Databricks on Azure, AWS, or Google Cloud Platform.
  • Set up pipelines for SAP data ingestion using ADF, Synapse, Data Factory, AWS Glue, SAP connectors, ODP/ODQ, RFC/IDOC/BAPI mechanisms.
  • Architect secure storage layers (Bronze/Silver/Gold) with Delta Lake best practices (see the sketch after this list).
  • Ensure integration with enterprise security standards: Key Vaults, ADLS/S3, IAM, and networking.
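
To make the Bronze/Silver/Gold bullet concrete, here is a minimal Bronze-to-Silver pass; the table names and the extracted_at watermark column are assumptions:

  from pyspark.sql import SparkSession, functions as F
  from pyspark.sql.window import Window

  spark = SparkSession.builder.getOrCreate()

  # Bronze: raw SAP extract landed as-is (hypothetical table name).
  bronze = spark.read.table("corp.bronze.mara_raw")

  # Silver: keep only the latest record per material number (MATNR).
  latest = Window.partitionBy("MATNR").orderBy(F.col("extracted_at").desc())
  silver = (
      bronze
      .withColumn("_rn", F.row_number().over(latest))
      .where("_rn = 1")
      .drop("_rn")
  )

  silver.write.format("delta").mode("overwrite").saveAsTable("corp.silver.material")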

Data Governance & MDM

  • Implement governance frameworks around data quality, lineage, cataloging, and stewardship.
  • Define master data validations, deduplication logic, survivorship rules, and versioning.
  • Implement data quality rules using Delta Live Tables (DLT) expectations and audits (see the sketch after this list).
  • Collaborate with business teams to define golden records and standardized master data models.
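
As a sketch of the DLT bullet above (this runs only inside a Databricks Delta Live Tables pipeline, and the source dataset name material_raw is hypothetical):

  import dlt  # available only within a Databricks DLT pipeline

  # Drop rows missing the business key; flag (but keep) rows missing a unit of measure.
  @dlt.table(comment="Validated material master records")
  @dlt.expect_or_drop("material_id_present", "MATNR IS NOT NULL")
  @dlt.expect("uom_present", "MEINS IS NOT NULL")
  def material_validated():
      return dlt.read("material_raw")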

Best Practices, Standards & Reviews

  • Create coding standards for PySpark, SQL, Delta Lake, and ETL/ELT pipelines.
  • Review developer code with focus on:
    • Query optimization
    • Efficient Delta Lake operations (MERGE, OPTIMIZE, ZORDER; see the sketch after this list)
    • Cluster cost optimization
    • Error handling and logging patterns
  • Define reusable frameworks for ingestion, transformation, and reconciliation.
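
By way of illustration for the review checklist above, an idempotent upsert plus a file-layout maintenance pass; the table names are placeholders, and the staging source is assumed to share the target schema:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Incoming batch registered as a temp view (hypothetical staging table).
  updates = spark.read.table("corp.staging.material_updates")
  updates.createOrReplaceTempView("updates")

  # Upsert into the gold dimension keyed on material_id.
  spark.sql("""
      MERGE INTO corp.gold.dim_material AS t
      USING updates AS s
      ON t.material_id = s.material_id
      WHEN MATCHED THEN UPDATE SET *
      WHEN NOT MATCHED THEN INSERT *
  """)

  # Compact small files and co-locate data on the join/merge key.
  spark.sql("OPTIMIZE corp.gold.dim_material ZORDER BY (material_id)")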

Development Guidance & Team Enablement

  • Mentor developers on Databricks architecture, PySpark patterns, and SAP data structures.
  • Provide technical leadership in design sessions and sprint planning.
  • Conduct knowledge sessions on best practices and common pitfalls.
  • Troubleshoot complex data pipeline issues across SAP and Databricks.