Position: Data/Scala/Spark Engineering Specialist
Location: NYC, NY
Duration: 12 Months
We're migrating complex on-prem regulatory reporting pipelines from a legacy ETL + Autosys + SQL + Teradata stack to a modern Databricks + Snowflake platform on Azure. The role is hands-on: design, implement, test, and reconcile production pipelines feeding regulatory reports under strict parity requirements.
Must-have
Scala / Spark: production experience writing Spark applications in Scala (not just notebooks); comfortable with the DataFrame API, joins, window functions, partitioning, and performance tuning
Databricks: Serverless compute, Unity Catalog, Asset Bundles, Databricks CLI
SQL fluency: comfortable writing and analyzing complex SQL scripts and extracting requirements from them
Snowflake: schema design, performance, Spark-Snowflake connector
Azure: ADLS, networking basics, secrets/identity (Entra ID / managed identities)
Orchestration: Airflow (DAG authoring, sensors, retries, SLAs)
CI/CD: Artifactory, GitHub Actions pipelines (build, sharded test matrices, artifact promotion through dev -> QA -> UAT -> prod)
Testing: experience with TDD, writing unit tests (ScalaTest, AnyFlatSpec), and BDD (Concordion or equivalent)
Data quality & reconciliation: building automated parity checks against legacy outputs, drift detection, row-level reconciliation tooling
Large-scale migrations: proven track record migrating legacy ETL (Autosys/Informatica/etc.) to cloud data platforms, including dependency mapping and cutover planning
Modern data engineering practices: medallion architecture (Bronze/Silver/Gold), idempotent pipelines, schema evolution, lineage, observability
Nice-to-have
Financial services / regulatory reporting domain
Python (Databricks utilities, tooling)
Spec-driven development workflows (specs -> plans -> tasks -> implementation)
Gradle (composite builds) and JVM tooling