Job Details
Key Responsibilities
Build and Maintain Data Pipelines: Develop scalable data pipelines using PySpark and Spark within the Databricks environment.
Implement Medallion Architecture: Design workflows using raw, trusted, and refined layers to drive reliable data processing (a short illustrative sketch appears at the end of this posting).
Integrate Diverse Data Sources: Ingest data from Kafka streams, extract channels, and APIs.
Data Cataloging & Governance: Model and register datasets in enterprise data catalogs, ensuring robust governance and accessibility.
Access Control: Manage secure, role-based access patterns to support analytics, AI, and ML needs.
Team Collaboration: Work closely with peers to achieve required code coverage and deliver high-quality, well-tested solutions.
Required Skills & Experience
Databricks: Expert-level proficiency
PySpark/Spark: Advanced hands-on experience
AWS: Strong competency, including S3 and Terraform for infrastructure-as-code
Data Architecture: Solid knowledge of the medallion pattern and data warehousing best practices
Data Pipelines: Proven ability to build, optimize, and govern enterprise data pipelines
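To make the medallion layering in the responsibilities above concrete, below is a minimal PySpark sketch of a raw-to-trusted-to-refined flow as it might look in a Databricks notebook. All paths, table names, column names, and the analysts group are illustrative assumptions, not details taken from this posting.

# Minimal sketch of a raw -> trusted -> refined (medallion) flow in PySpark.
# Paths, tables, columns, and groups below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Raw layer: land source data as-is, e.g. events ingested from Kafka or file extracts.
raw_df = spark.read.format("delta").load("s3://example-bucket/raw/events")  # hypothetical path

# Trusted layer: deduplicate, enforce types, and drop malformed records.
trusted_df = (
    raw_df
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .filter(F.col("event_ts").isNotNull())
)
trusted_df.write.format("delta").mode("overwrite").save("s3://example-bucket/trusted/events")

# Refined layer: business-level aggregates ready for analytics, AI, and ML consumers.
refined_df = (
    trusted_df
    .groupBy(F.to_date("event_ts").alias("event_date"), "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Register the refined dataset as a table so it is discoverable and governable in the
# enterprise catalog (schema and table names are assumptions).
refined_df.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_event_counts")

# Role-based access: grant read access to an analyst group, assuming table access
# control or Unity Catalog is enabled in the workspace.
spark.sql("GRANT SELECT ON TABLE analytics.daily_event_counts TO `analysts`")

In practice each layer would typically run as its own scheduled job or Delta Live Tables step, with the trusted and refined writes parameterized rather than hard-coded as above.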