Hello,
I hope this email finds you well. My name is Piyush Verma, and I am a Lead Technical Recruiter at Empower Professionals Inc. We have a remote Databricks Architect role with our client. If you have a matching candidate, please send over a resume, after which I will give you a call to discuss further.
Role: Databricks Architect
Location: Remote
Duration: 12+ Months
Must-Have Skills:
- Databricks + AWS
- Data modeling & design
- PySpark scripting
- SQL knowledge
- Data integration
- Unity Catalog and security design
- Identity federation
- Auditing and observability (system tables, APIs, external tools)
- Access control / governance in Unity Catalog
- External locations & storage credentials
- Personal access tokens & service principals
- Metastore & Unity Catalog concepts
- Interactive vs. production workflows
- Policies & entitlements
- Compute types (incl. UC and non-UC, scaling, optimization)
Job Description:
Note: Candidates should have hands-on experience in all of the must-have skills listed above.
Key Responsibilities:
Data Strategy & Architecture Development:
- Define and implement scalable, cost-effective, and high-performance data architecture aligned with business objectives.
- Design Lakehouse solutions using Databricks on AWS, Azure, or Google Cloud Platform.
- Establish best practices for Delta Lake and Lakehouse Architecture.
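For context, here is a minimal PySpark sketch of the kind of partitioned Delta table such a Lakehouse design involves; the catalog/schema/table and column names (main.demo.events, user_id, etc.) are illustrative only, and `spark` is assumed to be the session Databricks provides on a cluster.

```python
# Minimal Delta Lake write/read on Databricks, where `spark` is preconfigured.
# All names below are illustrative, not from the job description.
from pyspark.sql import functions as F

events = spark.createDataFrame(
    [(1, "2024-01-01", "login"), (2, "2024-01-01", "purchase")],
    ["user_id", "event_date", "event_type"],
)

# Managed Delta table, partitioned by date so queries can prune partitions.
(events.withColumn("event_date", F.to_date("event_date"))
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("main.demo.events"))

spark.table("main.demo.events").groupBy("event_type").count().show()
```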
Data Engineering & Integration:
- Architect ETL/ELT pipelines using Databricks Spark, Delta Live Tables (DLT), and Databricks Workflows (sketched after this list).
- Integrate data from sources such as Oracle Fusion Middleware, webMethods, MuleSoft, and Informatica.
- Enable real-time and batch processing using Apache Spark and Delta Lake.
- Ensure seamless connectivity with enterprise platforms (Salesforce, SAP, ERP, CRM).
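To make the Delta Live Tables item concrete, here is a minimal DLT sketch, not the client's actual pipeline: it assumes it runs inside a Databricks DLT pipeline (where the `dlt` module and `spark` session are provided), and the S3 path, table names, and `order_ts`/`amount` columns are hypothetical.

```python
# Minimal Delta Live Tables sketch; valid only inside a Databricks DLT
# pipeline, where `dlt` and `spark` are provided. Names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader.")
def orders_raw():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("s3://example-bucket/orders/"))  # hypothetical location

@dlt.table(comment="Cleaned orders with a basic data-quality gate.")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop failing rows
def orders_clean():
    return (dlt.read_stream("orders_raw")
            .withColumn("order_date", F.to_date("order_ts")))
```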
Data Governance, Security & Compliance:
- Implement governance frameworks using Unity Catalog for lineage, metadata, and access control (example grants below).
- Ensure HIPAA, GDPR, and life sciences regulatory compliance.
- Define and manage RBAC, Databricks SQL security, and access policies.
- Enable self-service data stewardship and democratization.
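As one concrete slice of the Unity Catalog governance work, a few illustrative grant statements follow; the group and object names are hypothetical, and on Databricks the same statements could be run directly in a SQL editor.

```python
# Illustrative Unity Catalog access control from PySpark; `spark` is the
# Databricks-provided session, and all group/object names are hypothetical.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.demo TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.demo.events TO `data_analysts`")

# Ownership can be reassigned as part of stewardship:
spark.sql("ALTER TABLE main.demo.events SET OWNER TO `data_platform_team`")
```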
Performance Optimization & Cost Management:
- Optimize Databricks compute clusters (DBU usage) for cost efficiency.
- Leverage Photon Engine, Adaptive Query Execution (AQE), and caching for performance tuning (settings sketch below).
- Monitor workspace health, job efficiency, and cost analytics.
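As a sketch of the session-level side of that tuning (Photon itself is enabled on the cluster or SQL warehouse, not in code), the settings below are standard Spark/Databricks configuration keys; the table name is hypothetical.

```python
# Session-level tuning knobs on a Databricks cluster (`spark` preconfigured).
spark.conf.set("spark.sql.adaptive.enabled", "true")                    # AQE
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.databricks.io.cache.enabled", "true")      # disk cache

df = spark.table("main.demo.events")   # hypothetical table
df.cache()    # keep the DataFrame in memory across repeated actions
df.count()    # materialize the cache
```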
AI/ML Enablement & Advanced Analytics:
- Design and manage ML pipelines using MLflow on Databricks (tracking sketch below).
- Support AI-driven analytics in genomics, drug discovery, and clinical data.
- Collaborate with data scientists to deploy and operationalize ML models.
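For the MLflow item, a minimal experiment-tracking sketch follows; the toy scikit-learn model, parameter, and run name are illustrative, and on a Databricks workspace the MLflow tracking URI is preconfigured.

```python
# Minimal MLflow tracking sketch; model, params, and names are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

with mlflow.start_run(run_name="baseline"):
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")  # can later be registered in UC
```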
Collaboration & Stakeholder Engagement:
- Align data strategy with business objectives across teams.
- Engage with platform vendors (Databricks, AWS, Azure, Google Cloud Platform, Informatica, Oracle, MuleSoft).
- Lead PoCs, drive Databricks adoption, and provide technical leadership.
Data Democratization & Self-Service Enablement:
- Implement self-service analytics using Databricks SQL and BI tools (Power BI, Tableau).
- Foster data literacy and enable data sharing frameworks.
- Establish robust data cataloging and lineage.
Migration & Modernization:
- Lead migration from legacy platforms (Informatica, Oracle, Hadoop) to Databricks Lakehouse.
- Design cloud modernization roadmaps ensuring minimal disruption.
Key Skills:
Databricks & Spark:
- Databricks Lakehouse, Delta Lake, Unity Catalog, Photon Engine.
- Apache Spark (PySpark, Scala, SQL), Databricks SQL, Delta Live Tables, Databricks Workflows.
Cloud Platforms:
- Databricks on AWS (preferred), Azure, or Google Cloud Platform.
- Cloud storage (S3, ADLS, GCS), VPC, IAM, Private Link.
- Infrastructure as Code: Terraform, ARM, CloudFormation.
Data Modeling & Architecture:
- Dimensional modeling: star schema, snowflake schema, Data Vault.
- Experience with Lakehouse, Data Mesh, and Data Fabric architectures.
- Data partitioning, indexing, caching, query optimization.
ETL/ELT & Integration:
- ETL/ELT development with Databricks, Informatica, MuleSoft, and Apache ecosystem tools.
Thanks,
Piyush Verma
Lead Technical Recruiter | Empower Professionals
Official Phone: x 350
100 Franklin Square Drive, Suite 104 | Somerset, NJ 08873
Certified NJ and NY Minority Business Enterprise (NMSDC)