Job Details
Key Responsibilities:
Design and implement scalable, secure Lakehouse architectures on Databricks for both batch and streaming data pipelines, supporting structured, semi-structured, and unstructured data.
Build end-to-end solutions using Databricks-native tools such as Delta Lake, Delta Live Tables, Unity Catalog, Databricks Workflows, Lakehouse Federation, and Photon to optimise performance, reliability, and cost.
Guide customers on data governance, lineage, observability, and access control, implementing best practices with Unity Catalog, audit logging, and data quality checks.
Lead modernisation and migration efforts from legacy systems (e.g., Hadoop, traditional data warehouses) to the Databricks Lakehouse on AWS, Azure, or Google Cloud Platform.
Implement CI/CD pipelines, dbt workflows, and Git-based version control to support agile data engineering practices and production-ready deployments.
Conduct technical workshops, architecture reviews, and enablement sessions, while collaborating with cross-functional teams (Sales Engineering, Product, Customer Success) to align solutions with business goals.
Create reference architectures, accelerators, and technical documentation to support repeatable deployments and influence the evolution of Databricks' products based on field feedback.
Required Skills and Experience:
10+ years of experience in data engineering, big data platforms, or cloud data architecture, with deep expertise in Apache Spark, Delta Lake, and the Databricks Lakehouse Platform.
Proficient in SQL and Python, with working knowledge of Scala; experienced in data modelling, pipeline design, and warehouse architecture.
Hands-on experience with cloud platforms (AWS, Azure, or Google Cloud Platform), including services like EMR, S3, Glue, ADF, Synapse, Dataflow, and BigQuery.
Skilled in real-time data processing using Structured Streaming, Kafka, Delta Live Tables, or similar technologies (e.g., Kinesis, Event Hubs).
Strong understanding of data security, lineage, and governance, including ABAC, Unity Catalog, and audit/logging frameworks.
Solid grasp of DevOps for data: Git-based workflows, CI/CD, Terraform, and secrets management.
Preferred Skills:
Familiar with Delta Sharing, Lakehouse Federation, and Unity Catalog for secure data sharing and governance in distributed architectures.
Hands-on experience with dbt on Databricks and a solid understanding of data mesh principles and enterprise governance frameworks.
Certifications preferred: Databricks Certified Data Engineer Professional or Architect; cloud certifications on AWS, Azure, or Google Cloud Platform are a plus.
Tech Stack You'll Work With:
Databricks Lakehouse Platform, including Delta Lake, Delta Live Tables, Unity Catalog, Databricks SQL, and Workflows
Apache Spark, SQL, Python, and optionally Scala for scalable data engineering
Modern orchestration & transformation tools: dbt, Airflow, CI/CD pipelines (GitHub Actions, Azure DevOps)
Cloud platforms: AWS, Azure, Google Cloud Platform, with services like S3, ADLS, GCS, and infrastructure-as-code via Terraform
Streaming & messaging systems: Kafka, Kinesis, Event Hubs, Pub/Sub
Taras Technology, LLC is an EEO/AA Employer: women, minorities, the disabled, and veterans are encouraged to apply.