Technical Architect - Data Platforms & Cloud Pipelines

Hands-on Technical Architect specializing in OT data (SCADA, historians, IoT, Maximo), delivering scalable streaming and batch ingestion. Deep expertise in PySpark performance tuning, time-series modeling, and OT semantics (tags, events, assets, work orders).
Delivery Engineering
- Standards: repo structure, CI/CD for notebooks/jobs/DLT, data contracts, unit tests for transforms, and backfill/runbook/rollback procedures (a transform unit-test sketch follows below).
- Reference implementations: golden pipelines (templates), KPI marts, and example notebooks.
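A minimal sketch of the transform unit-test standard, assuming transforms are written as pure DataFrame functions and tested with pytest against a local SparkSession; clean_sensor_readings and its (tag, ts, value) schema are illustrative, not a real pipeline function:

    import pytest
    from pyspark.sql import SparkSession, functions as F

    def clean_sensor_readings(df):
        # Hypothetical transform under test: enforce the (tag, ts, value)
        # contract by dropping null tags and typing the timestamp column.
        return (df.filter(F.col("tag").isNotNull())
                  .withColumn("ts", F.col("ts").cast("timestamp")))

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

    def test_clean_drops_null_tags(spark):
        df = spark.createDataFrame(
            [("T-100", "2024-01-01 00:00:00", 1.5),
             (None,    "2024-01-01 00:01:00", 2.0)],
            ["tag", "ts", "value"])
        out = clean_sensor_readings(df)
        assert out.count() == 1                       # null-tag row dropped
        assert dict(out.dtypes)["ts"] == "timestamp"  # schema contract holds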
- Define bronze/silver/gold zones on Delta Lake with open table formats and versioned schemas (a minimal layering sketch follows below).
- Publish semantic/consumption layers for Power BI/Fabric.
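A minimal bronze/silver/gold sketch on Delta Lake; the paths, table names, and (tag, ts, value) schema are illustrative assumptions, with spark predefined as in a Databricks notebook:

    from pyspark.sql import functions as F

    # Bronze: raw historian payloads appended as-is, stamped with load metadata.
    raw = spark.read.json("/landing/historian/")
    (raw.withColumn("_ingested_at", F.current_timestamp())
        .write.format("delta").mode("append").saveAsTable("bronze.historian_raw"))

    # Silver: typed, deduplicated, conformed to a versioned (tag, ts, value) schema.
    silver = (spark.read.table("bronze.historian_raw")
              .select("tag",
                      F.col("ts").cast("timestamp").alias("ts"),
                      F.col("value").cast("double").alias("value"))
              .dropDuplicates(["tag", "ts"]))
    silver.write.format("delta").mode("overwrite").saveAsTable("silver.historian_readings")

    # Gold: hourly aggregate published to the semantic layer for Power BI/Fabric.
    gold = (silver.groupBy("tag", F.window("ts", "1 hour").alias("hour"))
            .agg(F.avg("value").alias("avg_value")))
    gold.write.format("delta").mode("overwrite").saveAsTable("gold.tag_hourly_avg")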
- Build incremental pipelines with streaming MERGE, watermarking, exactly-once upserts, and dedup on (tag, ts, src_seq); see the ingestion sketch after this list.
- Implement Auto Loader for historian/IoT drop zones; DLT expectations for data quality (DQ), as in the DLT sketch below; Workflows for orchestration and SLAs.
- Implement observability: DQ metrics, lineage, run health, and cost dashboards (jobs, clusters, SQL).
- Gold KPIs: availability, reliability, MTBF/MTTR, and downtime attribution, with reproducible SQL specs (KPI query sketch below).
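A minimal sketch of the incremental pattern: Auto Loader off a drop zone, a watermark bounding dedup state on (tag, ts, src_seq), and a per-micro-batch MERGE so replays upsert idempotently. Paths and table names are assumptions; spark is the notebook session:

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    def upsert_batch(batch_df, batch_id):
        # Dedup within the micro-batch, then MERGE so reprocessing is idempotent.
        latest = batch_df.dropDuplicates(["tag", "ts", "src_seq"])
        (DeltaTable.forName(spark, "silver.historian_readings").alias("t")
            .merge(latest.alias("s"), "t.tag = s.tag AND t.ts = s.ts")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())

    stream = (spark.readStream.format("cloudFiles")          # Auto Loader
              .option("cloudFiles.format", "json")
              .load("/dropzone/historian/")
              .withColumn("ts", F.col("ts").cast("timestamp"))
              .withWatermark("ts", "2 hours")                # bounds dedup state
              .dropDuplicates(["tag", "ts", "src_seq"]))

    (stream.writeStream
        .foreachBatch(upsert_batch)
        .option("checkpointLocation", "/chk/historian_silver")
        .trigger(availableNow=True)
        .start())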
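A corresponding DLT sketch: expectations gate the same feed declaratively, dropping rows that break hard constraints and only recording metrics for soft ones. Constraint names and the value range are assumptions:

    import dlt

    @dlt.table(comment="Historian readings with declarative quality gates")
    @dlt.expect_or_drop("valid_tag", "tag IS NOT NULL")
    @dlt.expect_or_drop("valid_ts", "ts IS NOT NULL")
    @dlt.expect("value_in_range", "value BETWEEN -1e6 AND 1e6")  # tracked, not dropped
    def historian_clean():
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load("/dropzone/historian/"))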
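One way to keep the KPI specs reproducible is to version them as plain SQL. The gold.downtime_events schema below (one row per failure event, with uptime_hours since the prior failure and repair_hours to restore) is an assumption:

    kpi_sql = """
      SELECT asset_id,
             COUNT(*)                                 AS failure_count,
             SUM(uptime_hours) / NULLIF(COUNT(*), 0)  AS mtbf_hours,  -- mean time between failures
             SUM(repair_hours) / NULLIF(COUNT(*), 0)  AS mttr_hours,  -- mean time to repair
             SUM(uptime_hours) /
               NULLIF(SUM(uptime_hours) + SUM(repair_hours), 0) AS availability
      FROM gold.downtime_events
      WHERE event_date >= date_sub(current_date(), 365)
      GROUP BY asset_id
    """
    spark.sql(kpi_sql).show()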
- Row/column-level security and attribute-based policies; data classification and masking for sensitive plant data (a UC policy sketch follows below).
- Managed access via UC volumes, external locations, and credential passthrough; audit lineage and permissions.
- Data lifecycle: retention tiers (hot/warm/cold), VACUUM policies, and OPTIMIZE/ZORDER maintenance (maintenance sketch below).
- Optimize joins via broadcast hints, bucketing, checkpoint pruning, and state store tuning.
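A sketch of row filtering and column masking with Unity Catalog SQL functions; the catalog/schema path, group names, and plant_id column are assumptions:

    # Row filter: non-admins see only their plant's rows.
    spark.sql("""
      CREATE OR REPLACE FUNCTION main.security.plant_row_filter(plant_id STRING)
      RETURN is_account_group_member('ot_admins') OR plant_id = 'PLANT_01'
    """)
    spark.sql("""
      ALTER TABLE silver.historian_readings
      SET ROW FILTER main.security.plant_row_filter ON (plant_id)
    """)

    # Column mask: raw readings visible only to plant engineers.
    spark.sql("""
      CREATE OR REPLACE FUNCTION main.security.mask_reading(value DOUBLE)
      RETURN CASE WHEN is_account_group_member('plant_engineers')
                  THEN value ELSE NULL END
    """)
    spark.sql("""
      ALTER TABLE silver.historian_readings
      ALTER COLUMN value SET MASK main.security.mask_reading
    """)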
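And a maintenance/tuning sketch covering the OPTIMIZE/ZORDER and VACUUM cadence plus a broadcast-hinted join; the 7-day retention window and the silver.asset_dim dimension keyed by tag are assumptions:

    from pyspark.sql import functions as F

    # Compact small files and co-locate hot filter columns for faster scans.
    spark.sql("OPTIMIZE silver.historian_readings ZORDER BY (tag, ts)")

    # Reclaim files older than the retention window (168 hours = 7 days).
    spark.sql("VACUUM silver.historian_readings RETAIN 168 HOURS")

    # Broadcast the small asset dimension to avoid shuffling the large fact side.
    readings = spark.read.table("silver.historian_readings")
    assets = spark.read.table("silver.asset_dim")
    enriched = readings.join(F.broadcast(assets), "tag")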