Note: We are looking for a Senior Data Engineer with strong expertise in Databricks, Apache Airflow (including DAG development), Snowflake, Python, SQL, ETL/ELT frameworks, CI/CD, and cloud platforms (AWS/Azure/Google Cloud Platform), and with 13+ years of overall IT experience.
Job Description:
Role Summary
We are seeking an experienced Data Engineer to design, build, and optimize scalable, high-performance data pipelines using Databricks, Apache Airflow, Snowflake, Python, and SQL.
The role involves end-to-end ownership of data ingestion, transformation, orchestration, and optimization across cloud-based data platforms, enabling analytics, reporting, and downstream data science use cases.
Key Responsibilities
Data Engineering & Pipeline Development
- Design, develop, and maintain batch and streaming data pipelines using Databricks (PySpark) and Snowflake.
- Build ETL/ELT frameworks to ingest data from multiple sources (RDBMS, APIs, flat files, cloud storage).
- Implement data transformation logic in Python and SQL for scalable, high-volume datasets.
- Develop metadata-driven, reusable pipelines following enterprise data engineering best practices (a minimal example follows this list).
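For illustration, here is a minimal sketch of the kind of batch pipeline this role covers, assuming a Databricks/PySpark environment. The storage paths, column names, and table layout are hypothetical:

```python
# Sketch of a batch ingestion step: read raw files from cloud storage,
# apply Python/SQL-style transformations, and persist a partitioned table.
from pyspark.sql import SparkSession, functions as F

# On Databricks a `spark` session is provided; getOrCreate() reuses it.
spark = SparkSession.builder.appName("orders_ingest").getOrCreate()

# Ingest raw CSV drops from cloud storage (path is illustrative).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-bucket/landing/orders/")
)

# Standardize types and derive a partition column; drop duplicate keys.
cleaned = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .dropDuplicates(["order_id"])
)

# Persist as Delta, partitioned by date, for downstream consumers.
(
    cleaned.write
    .format("delta")
    .mode("append")
    .partitionBy("order_date")
    .save("s3://example-bucket/bronze/orders/")
)
```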
Workflow Orchestration
- Create and manage complex workflows using Apache Airflow.
- Implement scheduling, dependency management, retries, alerts, and failure handling.
- Integrate Airflow with Databricks jobs, Snowflake tasks, and cloud services (a minimal DAG sketch follows this list).
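A minimal DAG sketch showing scheduling, retries, alerting, and a Databricks job trigger. The job ID, connection ID, and failure callback are hypothetical; DatabricksRunNowOperator ships with the apache-airflow-providers-databricks package (Airflow 2.x shown):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator


def notify_on_failure(context):
    # Placeholder alerting hook; a real pipeline might page or post to chat.
    print(f"Task failed: {context['task_instance'].task_id}")


default_args = {
    "owner": "data-engineering",
    "retries": 2,                              # automatic retries on failure
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_on_failure,  # failure handling / alerts
}

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
    default_args=default_args,
) as dag:
    # Trigger an existing Databricks job by ID; dependencies between tasks
    # would be expressed with >> chaining as more tasks are added.
    run_ingest = DatabricksRunNowOperator(
        task_id="run_databricks_ingest",
        databricks_conn_id="databricks_default",
        job_id=123,  # hypothetical Databricks job ID
    )
```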
Databricks & Lakehouse Architecture
- Work on Databricks Lakehouse architecture, including the Bronze/Silver/Gold (medallion) layers.
- Optimize Spark jobs through partitioning, caching, and broadcast joins (see the tuning sketch after this list).
- Manage Databricks jobs, clusters, notebooks, and workspace configurations.
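A sketch of the tuning moves named above: broadcasting a small dimension table into a join and repartitioning before a wide write. Table paths and column names are illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning_demo").getOrCreate()

facts = spark.read.format("delta").load("/mnt/silver/orders")
dims = spark.read.format("delta").load("/mnt/silver/customers")  # small table

# Broadcast join: ship the small side to every executor and avoid
# shuffling the large fact table.
joined = facts.join(F.broadcast(dims), on="customer_id", how="left")

# Cache when the same intermediate result feeds several downstream actions.
joined.cache()

# Repartition on the write key so output files align with query patterns.
(
    joined.repartition("order_date")
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("/mnt/gold/orders_enriched")
)
```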
Snowflake Development
- Design and optimize Snowflake schemas, tables, views, and warehouses.
- Implement Snowflake SQL transformations, performance tuning, and cost optimization.
- Work with Snowflake features such as Time Travel, Cloning, Tasks, and Streams where applicable (sketched below).
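A sketch of exercising the Snowflake features named above (Streams, Tasks, Time Travel) through the snowflake-connector-python package. The connection parameters, warehouse, and table names are hypothetical:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical credentials
    user="example_user",
    password="example_password",
    warehouse="TRANSFORM_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# Capture change data on a source table with a stream.
cur.execute("CREATE STREAM IF NOT EXISTS orders_stream ON TABLE orders")

# A task that loads new changes on a schedule, only when the stream has data.
cur.execute("""
    CREATE OR REPLACE TASK merge_orders
      WAREHOUSE = TRANSFORM_WH
      SCHEDULE = '15 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS
      INSERT INTO orders_silver SELECT * FROM orders_stream
""")
cur.execute("ALTER TASK merge_orders RESUME")

# Time Travel: query the table as it looked one hour ago.
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())
```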
Data Quality, Governance & Security
- Implement data quality checks, validation frameworks, and reconciliation logic (see the sketch after this list).
- Ensure adherence to data governance, security, and compliance requirements.
- Collaborate with governance teams on metadata, lineage, and access controls.
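A minimal sketch of the validation and reconciliation logic this covers: null checks on a key column and a source-vs-target row-count reconciliation. Paths, column names, and the tolerance are illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

source = spark.read.format("delta").load("/mnt/bronze/orders")
target = spark.read.format("delta").load("/mnt/silver/orders")

# Check 1: the primary-key column must never be null.
null_keys = source.filter(F.col("order_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"{null_keys} rows have a null order_id")

# Check 2: reconcile row counts between layers. Deduplication may drop
# some rows, so allow target <= source but flag losses beyond a tolerance.
src_count, tgt_count = source.count(), target.count()
if tgt_count < src_count * 0.99:  # hypothetical 1% tolerance
    raise ValueError(
        f"Reconciliation failed: {src_count} source vs {tgt_count} target rows"
    )
```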
CI/CD & Operations
- Implement CI/CD pipelines for data code using Git-based version control systems (a unit-test sketch follows this list).
- Support production deployments, monitoring, and incident resolution.
- Work closely with DevOps, Architecture, and Analytics teams.
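A sketch of the unit-test layer a CI/CD pipeline for data code would run on every Git push, using pytest with a local Spark session. The transformation under test is a hypothetical pure function, which keeps tests fast and cluster-free:

```python
import pytest
from pyspark.sql import SparkSession, functions as F


def add_order_date(df):
    # Hypothetical transformation under test: derive a date column.
    return df.withColumn("order_date", F.to_date("order_ts"))


@pytest.fixture(scope="module")
def spark():
    # Local single-threaded session; no cluster needed in CI.
    return SparkSession.builder.master("local[1]").appName("ci_tests").getOrCreate()


def test_add_order_date(spark):
    df = spark.createDataFrame(
        [("1", "2024-01-15 10:30:00")], ["order_id", "order_ts"]
    )
    out = add_order_date(df)
    assert "order_date" in out.columns
    assert out.first()["order_date"].isoformat() == "2024-01-15"
```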