Designing Databricks-based lakehouse architectures on AWS (Delta Lake + S3 + Unity Catalog).
Clear separation of compute vs. serving layers in distributed architectures.
Low-latency API strategy where Spark is insufficient (e.g., leveraging optimized services or caching).
Caching strategies to accelerate reads and reduce compute cost.
Data partitioning, file size tuning, and optimization strategies for large-scale pipelines.
Experience handling multi-terabyte structured time-series workloads.
Ability to distill architectural significance from ambiguous business requirements.
Strong curiosity, questioning, and requirement-probing mindset.
Player-coach approach: hands-on technical depth + ability to guide
Bachelor's or Master's in Computer Science, Data Science, Engineering, Statistics, or a related field.
10+ years of experience in data engineering, ML engineering, or AI/ML architecture roles.
Deep expertise in Databricks on AWS, including:
PySpark / Spark SQL
Databricks Notebooks
Delta Lake
Unity Catalog
MLflow
Databricks Jobs & Workflows
Strong programming ability in Python (pandas, numpy, scikit-learn).
Demonstrated experience with large-scale, multi-terabyte data processing.
Strong understanding of ML algorithms, distributed systems, and data optimization.