Position: Data/Scala/Spark Engineering Specialist
Location: NYC, NY
Duration: 12 Months
Face-to-face interview required
We're migrating complex on-prem regulatory reporting pipelines from a legacy ETL + Autosys + SQL + Teradata stack to a modern Databricks + Snowflake platform on Azure. The role is hands-on: design, implement, test, and reconcile production pipelines feeding regulatory reports under strict parity requirements.
Must-have
Scala / Spark — production experience writing Spark applications in Scala (not just notebooks); comfortable with the DataFrame API, joins, window functions, partitioning, and performance tuning
Databricks — Serverless compute, Unity Catalog, Asset Bundles, Databricks CLI
SQL fluency — comfortable writing and analyzing complex SQL scripts, and extracting requirements from them
Snowflake — schema design, performance, Spark-Snowflake connector
Azure — ADLS, networking basics, secrets/identity (Entra ID / managed identities)
Orchestration — Airflow (DAG authoring, sensors, retries, SLAs)
CI/CD — Artifactory, GitHub Actions pipelines: build, sharded test matrices, artifact promotion through dev → QA → UAT → prod
Testing — experience with TDD, unit testing (ScalaTest, AnyFlatSpec), and BDD (Concordion or equivalent)
Data quality & reconciliation — building automated parity checks against legacy outputs, drift detection, row-level reconciliation tooling
Large-scale migrations — proven track record migrating legacy ETL (Autosys/Informatica/etc.) to cloud data platforms, including dependency mapping and cutover planning
Modern data engineering practices — medallion architecture (Bronze/Silver/Gold), idempotent pipelines, schema evolution, lineage, observability
Nice-to-have
Financial services / regulatory reporting domain
Python (Databricks utilities, tooling)
Spec-driven development workflows (specs → plans → tasks → implementation)
Gradle (composite builds) and JVM tooling