Job Details
This role supports data engineering, machine learning, and analytics initiatives within an organization that relies on large-scale data processing.
Duties include:
- Designing and developing scalable data pipelines
- Implementing ETL/ELT workflows
- Optimizing Spark jobs
- Integrating with Azure Data Factory
- Automating deployments
- Collaborating with cross-functional teams
- Ensuring data quality, governance, and security
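To illustrate the kind of data-validation work the duties above describe, here is a minimal, hypothetical sketch of a row-level quality check. The function name, field names, and record shape are illustrative assumptions, not part of the role's actual codebase; in practice this logic would typically run inside a Spark or Databricks pipeline.

```python
# Hypothetical sketch of a data validation / quality check step.
# Records are assumed to be simple dicts; real pipelines would use
# Spark DataFrames, but the splitting logic is the same idea.

def validate_records(records, required_fields):
    """Split records into (valid, rejected) based on required fields."""
    valid, rejected = [], []
    for rec in records:
        # A record fails if any required field is missing or empty.
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        (rejected if missing else valid).append(rec)
    return valid, rejected

rows = [
    {"id": 1, "amount": 9.5},
    {"id": 2, "amount": None},  # fails the check: amount is missing
]
good, bad = validate_records(rows, ["id", "amount"])
```

Rejected rows would normally be routed to a quarantine table for review rather than silently dropped, preserving data lineage for governance purposes.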
CANDIDATE SKILLS AND QUALIFICATIONS
Minimum Requirements:

| Years | Required/Preferred | Experience |
|---|---|---|
| 4 | Required | Implement ETL/ELT workflows for both structured and unstructured data |
| 4 | Required | Automate deployments using CI/CD tools |
| 4 | Required | Collaborate with cross-functional teams including data scientists, analysts, and stakeholders |
| 4 | Required | Design and maintain data models, schemas, and database structures to support analytical and operational use cases |
| 4 | Required | Evaluate and implement appropriate data storage solutions, including data lakes (Azure Data Lake Storage) and data warehouses |
| 4 | Required | Implement data validation and quality checks to ensure accuracy and consistency |
| 4 | Required | Contribute to data governance initiatives, including metadata management, data lineage, and data cataloging |
| 4 | Required | Implement data security measures, including encryption, access controls, and auditing; ensure compliance with regulations and best practices |
| 4 | Required | Proficiency in Python and R programming languages |
| 4 | Required | Strong SQL querying and data manipulation skills |
| 4 | Required | Experience with Azure cloud platform |
| 4 | Required | Experience with DevOps, CI/CD pipelines, and version control systems |
| 4 | Required | Working in agile, multicultural environments |
| 4 | Required | Strong troubleshooting and debugging capabilities |
| 3 | Required | Design and develop scalable data pipelines using Apache Spark on Databricks |
| 3 | Required | Optimize Spark jobs for performance and cost-efficiency |
| 3 | Required | Integrate Databricks solutions with cloud services (Azure Data Factory) |
| 3 | Required | Ensure data quality, governance, and security using Unity Catalog or Delta Lake |
| 3 | Required | Deep understanding of Apache Spark architecture, RDDs, DataFrames, and Spark SQL |
| 3 | Required | Hands-on experience with Databricks notebooks, clusters, jobs, and Delta Lake |
| 1 | Preferred | Knowledge of ML libraries (MLflow, Scikit-learn, TensorFlow) |
| 1 | Preferred | Databricks Certified Associate Developer for Apache Spark |
| 1 | Preferred | Azure Data Engineer Associate |