Overview
On Site
Contract - W2
Skills
Data Engineering
Apache Spark
Computer Cluster Management
Cloud Architecture
Modeling
Metadata Management
Records Management
Git
DevOps
GitHub
Orchestration
Communication
Documentation
Conflict Resolution
Problem Solving
Scalability
Leadership
Software Design
SAP ERP
SAP BI
SAP APO
Bill Of Materials
Business Process
Procurement
Manufacturing
Finance
Management
Access Control
Unity Catalog
Storage
Continuous Integration
Continuous Delivery
Workflow
Promotions
Cost Control
Microsoft Azure
Google Cloud Platform
Google Cloud
ADF
Amazon Web Services
RFC
IDoc
BAPI
Amazon S3
Computer Networking
Data Governance
Mobile Device Management
Data Deduplication
Data Quality
Auditing
Collaboration
Master Data Management
SQL
ELT
FOCUS
Query Optimization
Optimization
Mentorship
PySpark
Data Structure
IT Management
Sprint
Extract, Transform, Load (ETL)
SAP
Databricks
Job Details
Hi,
Job Requisition: Data Architect
Location: Bay Area, CA (must be available to work in the office as required)
Role expectations
The Data Architect will be responsible for designing, implementing, and maintaining scalable data architectures on the Databricks platform with a strong understanding of SAP data structures, especially master data. The role requires hands-on experience in data engineering, governance, and platform administration, as well as the ability to guide development teams through best practices, architecture decisions, and code reviews.
Skills
Technical Skills
8-15+ years in data engineering/architecture, with 3-5+ years specifically in Databricks.
Deep knowledge of:
o PySpark, Spark SQL, Delta Lake
o Unity Catalog, cluster management, Lakehouse governance
o Azure/AWS/Google Cloud Platform cloud architecture
Strong experience with SAP data:
o Extracting data from ECC/S4/BW
o Understanding of SAP tables, master data structures, and business logic
o Experience with IDOCs, BAPIs, ODP/ODQ sources
Strong MDM experience:
o Master data modelling
o Data quality frameworks
o Metadata management
o Golden record management
CI/CD: Git, Azure DevOps, GitHub Actions or similar.
Databricks Workflows / Jobs orchestration.
Exposure to planning systems such as SAP IBP/APO (preferred but not required).
Soft Skills
Strong communication and documentation skills.
Ability to interact with business and technical teams.
Problem-solving with a focus on performance, reliability, and scalability.
Leadership mindset with ability to guide and upskill teams.
Detailed skills
1. Architecture & Solution Design
Design end-to-end data architectures leveraging Databricks Lakehouse Platform (Delta Lake, Unity Catalog, Lakehouse Governance).
Develop scalable ingestion, transformation, and consumption patterns for SAP data (ECC/S4, BW, IBP, APO, etc.).
Define data models for Master Data Management (MDM): Material, Customer, Vendor, BOM, Plant, Cost Center, Profit Center, etc. (a sketch follows this section).
Create logical/physical models aligned with business processes (planning, procurement, manufacturing, finance).
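A minimal sketch of what a curated Material master model might look like in Delta Lake, assuming Unity Catalog three-level naming. The catalog/schema/table names are illustrative, not from this requisition; the MATNR/MTART/MATKL/MAKTX fields follow SAP's MARA/MAKT conventions:

```python
# Illustrative sketch: a curated Material master model in Delta Lake.
# Table name mdm.master.material is hypothetical; field names follow SAP MARA/MAKT.
from pyspark.sql import SparkSession
from pyspark.sql import types as T

spark = SparkSession.builder.getOrCreate()

material_schema = T.StructType([
    T.StructField("MATNR", T.StringType(), False),   # material number (business key)
    T.StructField("MTART", T.StringType(), True),    # material type
    T.StructField("MATKL", T.StringType(), True),    # material group
    T.StructField("MAKTX", T.StringType(), True),    # material description
    T.StructField("source_system", T.StringType(), True),  # lineage attribute
    T.StructField("valid_from", T.TimestampType(), True),  # effective date
])

(spark.createDataFrame([], material_schema)
    .write.format("delta")
    .mode("ignore")                                   # create once; keep if it exists
    .saveAsTable("mdm.master.material"))
```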
2. Databricks Platform Administration
Manage workspace configuration, clusters, secrets, networking, and access control.
Set up and maintain Unity Catalog, catalogs, schemas, storage credentialing, and data lineage.
Develop CI/CD frameworks for Databricks repos, workflows, and environment promotions.
Monitor platform performance, optimize cluster sizing, and implement cost-control measures.
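As one illustration of the cost-control levers in the bullet above, the sketch below creates a right-sized, auto-terminating cluster through the Databricks Clusters API. The workspace host, token, and node type are placeholders; autoscale bounds and auto-termination are the usual knobs for keeping idle compute off the bill:

```python
# Hypothetical sketch: cluster sizing and cost control via the Clusters API.
# Host, token, and node type are placeholders; store credentials in a secret
# manager in practice, not in code.
import requests

host = "https://<workspace-host>"          # placeholder
token = "<databricks-pat>"                 # placeholder

payload = {
    "cluster_name": "etl-small",
    "spark_version": "13.3.x-scala2.12",   # an LTS runtime; pin per your workspace
    "node_type_id": "Standard_DS3_v2",     # Azure example; varies by cloud
    "autoscale": {"min_workers": 1, "max_workers": 4},  # cap spend under load
    "autotermination_minutes": 30,          # release idle compute
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```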
3. Infrastructure & Environment Setup
Design and configure environments (Dev/Test/Prod) for Databricks on Azure, AWS, or Google Cloud Platform.
Set up pipelines for SAP data ingestion using Azure Data Factory (ADF), Synapse, AWS Glue, SAP connectors, and ODP/ODQ or RFC/IDoc/BAPI mechanisms.
Architect secure storage layers (Bronze/Silver/Gold) following Delta Lake best practices (see the sketch after this section).
Ensure integration with enterprise security standards: Key Vault, ADLS/S3, IAM, and networking.
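A minimal sketch of the Bronze-to-Silver hop referenced above, assuming illustrative table names (bronze.sap.mara_raw, silver.sap.material):

```python
# Illustrative Bronze -> Silver hop in a medallion layout.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("bronze.sap.mara_raw")      # raw SAP extract, as landed

silver = (bronze
    .filter(F.col("MATNR").isNotNull())               # basic completeness gate
    .withColumn("MATNR", F.trim("MATNR"))             # normalize the business key
    .withColumn("_ingested_at", F.current_timestamp())
    .dropDuplicates(["MATNR"]))                       # one row per material

(silver.write.format("delta")
    .mode("overwrite")
    .saveAsTable("silver.sap.material"))
```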
4. Data Governance & MDM
Implement governance frameworks around data quality, lineage, cataloging, and stewardship.
Define master data validations, deduplication logic, survivorship rules, and versioning.
Implement data quality rules using Delta Live Tables (DLT) expectations and audits (a combined sketch of these two items follows this section).
Collaborate with business teams to define golden records and standardized master data models.
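The sketch below combines the deduplication and data-quality items above, assuming an upstream DLT table named material_raw: a DLT expectation drops rows that fail validation, and a window function keeps the most recent record per business key as a simple "latest wins" survivorship rule:

```python
# Hypothetical DLT sketch: quality expectations plus latest-wins survivorship.
# Source table material_raw and column changed_at are assumptions, not from
# the job description.
import dlt
from pyspark.sql import functions as F, Window

@dlt.table(comment="Validated, deduplicated material master candidates")
@dlt.expect_or_drop("valid_material_number", "MATNR IS NOT NULL AND MATNR != ''")
def material_clean():
    latest_first = Window.partitionBy("MATNR").orderBy(F.col("changed_at").desc())
    return (dlt.read("material_raw")                  # upstream DLT table (assumed)
        .withColumn("_rn", F.row_number().over(latest_first))
        .filter("_rn = 1")                            # survivorship: latest wins
        .drop("_rn"))
```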
5. Best Practices, Standards & Reviews
Create coding standards for PySpark, SQL, Delta Lake, and ETL/ELT pipelines.
Review developer code with focus on:
o Query optimization
o Efficient Delta Lake operations (MERGE, OPTIMIZE, ZORDER; see the sketch after this section)
o Cluster cost optimization
o Error handling and logging patterns
Define reusable frameworks for ingestion, transformation, and reconciliation.
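A sketch of the Delta operations named in the review checklist above, reusing the illustrative table names from earlier (the updates_material source is also hypothetical): an idempotent MERGE upsert, then OPTIMIZE/ZORDER to compact small files and improve data skipping on the common join key:

```python
# Illustrative Delta review targets: MERGE upsert, then OPTIMIZE/ZORDER.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Upsert staged changes into the curated table, keyed on the business key
spark.sql("""
    MERGE INTO silver.sap.material AS t
    USING updates_material AS s
      ON t.MATNR = s.MATNR
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Compact small files and co-locate rows on the common filter/join key
spark.sql("OPTIMIZE silver.sap.material ZORDER BY (MATNR)")
```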
6. Development Guidance & Team Enablement
Mentor developers on Databricks architecture, PySpark patterns, and SAP data structures.
Provide technical leadership in design sessions and sprint planning.
Conduct knowledge sessions on best practices and common pitfalls.
Troubleshoot complex data pipeline issues across SAP, Databricks, and downstream systems.