Title: Senior / Principal Data Engineer
Location: New York
Duration: 12+ months
Relevant Experience: 10+ years
Data Ingestion & Modeling
- Ingest and model data from diverse sources including APIs, files/SFTP, streaming platforms, and relational databases.
- Implement layered data architectures (raw / clean / serving) using PySpark, SQL, dbt, and Python.
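A minimal PySpark sketch of the raw-to-clean promotion described above; the bucket paths, table, and column names are hypothetical placeholders, not a prescribed implementation.

```python
# Hypothetical raw -> clean promotion for an "orders" feed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_raw_to_clean").getOrCreate()

# Raw layer: land source data as-is, adding only ingestion metadata.
raw = (
    spark.read.json("s3://example-bucket/raw/orders/")  # hypothetical path
    .withColumn("_ingested_at", F.current_timestamp())
)

# Clean layer: enforce types, deduplicate, and drop unusable records.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
)

clean.write.mode("overwrite").format("delta").save("s3://example-bucket/clean/orders/")
```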
Pipeline Design & Orchestration
- Design, build, and operate production data pipelines using Prefect or Airflow, including scheduling, retries, parameterization, SLAs, and alerting (see the sketch after this list).
- Maintain clear, well-documented runbooks and operational procedures.
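To illustrate the orchestration duties above, here is a hedged Prefect sketch showing retries, parameterization, and a cron schedule; the flow name, dataset names, and extract/load bodies are placeholders. Alerting would typically be wired through Prefect automations or notification blocks rather than in the flow itself.

```python
from prefect import flow, task


@task(retries=3, retry_delay_seconds=60)  # automatic retry on transient failure
def extract(source: str) -> list[dict]:
    # Placeholder for an API / SFTP / database pull.
    return [{"source": source, "value": 42}]


@task
def load(records: list[dict], target: str) -> None:
    # Placeholder for a warehouse write.
    print(f"loaded {len(records)} records into {target}")


@flow(name="orders-ingest")
def orders_ingest(source: str = "orders_api", target: str = "clean.orders"):
    load(extract(source), target)


if __name__ == "__main__":
    # Serve the flow on a daily 06:00 schedule.
    orders_ingest.serve(name="orders-ingest-daily", cron="0 6 * * *")
```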
Cloud Data Platforms
- Build on cloud data platforms using object storage (S3, ADLS, GCS) and Spark-based compute (Databricks or equivalent); see the sketch after this list.
- Manage jobs, secrets, access controls, and environment configurations across dev/test/prod.
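A minimal sketch of environment-scoped configuration across dev/test/prod; the variable and bucket names are hypothetical. In practice, secrets would come from a managed store (Databricks secret scopes, AWS Secrets Manager, Azure Key Vault) rather than plain environment variables.

```python
import os

ENV = os.environ.get("DEPLOY_ENV", "dev")  # one of: dev | test | prod

# Hypothetical per-environment settings resolved at startup.
CONFIG = {
    "dev":  {"bucket": "s3://example-dev/",  "warehouse": "dev_wh"},
    "test": {"bucket": "s3://example-test/", "warehouse": "test_wh"},
    "prod": {"bucket": "s3://example-prod/", "warehouse": "prod_wh"},
}[ENV]

# Secret injected at runtime by the platform's secret manager.
DB_PASSWORD = os.environ["DB_PASSWORD"]
```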
Data Services & APIs
- Publish governed data services and manage their lifecycle using Azure API Management (APIM).
- Implement authentication and authorization, policies, versioning, quotas, and monitoring for data APIs.
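As a hedged illustration of the kind of versioned data service that would sit behind Azure API Management: in this sketch the endpoint, header name, and key check are placeholders, since APIM policies would normally enforce subscription keys, quotas, and rate limits in front of the service.

```python
from fastapi import FastAPI, Header, HTTPException

app = FastAPI(title="orders-data-service")


@app.get("/v1/orders/{order_id}")  # version carried in the path
def get_order(order_id: str, x_api_key: str = Header(...)):
    # Placeholder check; real authn/authz lives in APIM policies / the
    # identity provider, not hard-coded keys.
    if x_api_key != "expected-key":
        raise HTTPException(status_code=401, detail="invalid API key")
    # Placeholder lookup against the serving layer.
    return {"order_id": order_id, "status": "shipped"}
```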
Data Quality, Governance & Observability
- Enforce data quality and governance through data contracts, automated tests, validations, lineage, and observability tooling (see the sketch after this list).
- Implement proactive monitoring and alerting to ensure data reliability and trust.
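A minimal sketch of automated validations on a clean-layer table; the table path, columns, and zero-tolerance thresholds are hypothetical. Failing loudly lets orchestration-level alerting (above) fire.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("delta").load("s3://example-bucket/clean/orders/")

total = df.count()
null_ids = df.filter(F.col("order_id").isNull()).count()
dupes = total - df.dropDuplicates(["order_id"]).count()

# Surface contract violations as hard failures.
assert total > 0, "clean.orders is empty"
assert null_ids == 0, f"{null_ids} rows with null order_id"
assert dupes == 0, f"{dupes} duplicate order_id values"
```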
Performance, Cost & Reliability
- Optimize performance and cost through partitioning, clustering, query tuning, job sizing, and workload management (see the sketch after this list).
- Continuously improve platform reliability and reduce operational risk.
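A hedged sketch of two of the levers above, partitioning and file compaction; the table, partition column, and the OPTIMIZE/ZORDER commands assume a Databricks/Delta Lake environment.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("delta").load("s3://example-bucket/clean/orders/")

# Partition by a low-cardinality column that matches common filters.
(
    df.repartition("order_date")
    .write.mode("overwrite")
    .partitionBy("order_date")
    .format("delta")
    .save("s3://example-bucket/serving/orders/")
)

# Compact small files and co-locate a frequent filter key (Databricks SQL).
spark.sql(
    "OPTIMIZE delta.`s3://example-bucket/serving/orders/` ZORDER BY (customer_id)"
)
```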
Security & Compliance
- Uphold security and compliance standards, including PII handling, encryption, masking, and access controls.
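A minimal sketch of column-level PII protection in PySpark, hashing a stable join key and masking a free-text field; the column names are hypothetical, and production handling would follow the governance policy in force.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("delta").load("s3://example-bucket/clean/customers/")

masked = (
    df.withColumn("email_hash", F.sha2(F.col("email"), 256))  # one-way hash keeps joinability
    .withColumn("phone", F.regexp_replace("phone", r"\d(?=\d{4})", "*"))  # keep last 4 digits
    .drop("email")  # raw PII never reaches the serving layer
)

masked.write.mode("overwrite").format("delta").save("s3://example-bucket/serving/customers/")
```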
Collaboration & Enablement
- Collaborate with analytics, AI engineering, and business stakeholders to translate requirements into scalable, production-ready datasets.
- Enable AI and LLM use cases by packaging datasets and metadata for downstream consumption, integrating with Model Context Protocol (MCP) where appropriate.
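As a hedged illustration of packaging datasets and metadata for AI/LLM consumption, the sketch below writes a simple dataset manifest that an MCP server could expose as a resource; the manifest shape is illustrative, not a formal MCP schema.

```python
import json

# Hypothetical manifest describing a serving-layer dataset for downstream
# AI tooling: name, ownership, schema, lineage, and PII status.
manifest = {
    "name": "serving.orders",
    "description": "Cleaned, deduplicated order events, refreshed daily at 06:00 UTC.",
    "owner": "data-engineering",
    "columns": [
        {"name": "order_id", "type": "string", "description": "Unique order key"},
        {"name": "order_ts", "type": "timestamp", "description": "Order placement time (UTC)"},
        {"name": "amount", "type": "decimal(18,2)", "description": "Order total"},
    ],
    "lineage": ["raw.orders_api"],
    "pii": False,
}

with open("orders_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```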
Platform Improvement
- Improve developer productivity by automating routine tasks, reducing technical debt, and maintaining clear, up-to-date documentation.
Must-Have Skills:
· Python, SQL, and Spark (PySpark)
· Databricks
· Prefect or Airflow
· dbt
· Snowflake
· Azure, AWS, or Google Cloud Platform