Overview
On Site
Full Time
Skills
Extract
Transform
Load
ELT
Real-time
Batch Processing
Data Integration
Data Quality
Data Validation
Collaboration
Data Governance
Regulatory Compliance
Technical Writing
Data Architecture
Data Warehouse
PySpark
Optimization
ADF
Data Lake
Storage
Python
Pandas
NumPy
SQL
Unix
Linux
Shell Scripting
Java
Scala
Apache Kafka
Analytics
Git
Continuous Integration
Continuous Delivery
DevOps
GitHub
Build Tools
Apache Maven
Gradle
Orchestration
Apache Airflow
Workflow
TWS
Problem Solving
Conflict Resolution
Analytical Skill
Debugging
Agile
Scrum
Machine Learning (ML)
scikit-learn
TensorFlow
PyTorch
Vector Databases
Apache Spark
Streaming
Prompt Engineering
Semantics
Docker
Kubernetes
Grafana
Databricks
Artificial Intelligence
Microsoft Azure
Computer Science
Job Details
SOFTWARE DATA ENGINEER
About the Role:
We are seeking a Software Data Engineer. The ideal candidate will be responsible for designing and maintaining modern, scalable data solutions on Azure using Databricks. This includes building data pipelines, ETL/ELT workflows, and architectures such as Data Lakes, Warehouses, and Lakehouses for both real-time and batch processing. The role involves integrating large datasets from diverse sources, implementing Delta Lake, and preparing data for machine learning through feature stores.
Key Responsibilities:
Design, develop, and optimize scalable data pipelines and ETL/ELT workflows using Databricks on Azure
Build and maintain modern data architectures (Data Lake, Data Warehouse, Lakehouse) for real-time streaming and batch processing on Azure
Implement data integration solutions for large-scale datasets across diverse data sources using Delta Lake and other data formats
Create feature stores and data preparation workflows for machine learning applications on Azure
Develop and maintain data quality frameworks and implement data validation checks
Collaborate with data scientists, ML engineers, analysts, and business stakeholders to deliver high-quality, production-ready data solutions
Monitor, troubleshoot, and optimize data workflows for performance, cost efficiency, and reliability
Implement data governance, security, and compliance standards across all data processes
Create and maintain comprehensive technical documentation for data pipelines and architectures
Required Qualifications:
Data Architecture: Deep understanding of Data Lake, Data Warehouse, and Lakehouse concepts with hands-on implementation experience
Databricks & Spark: 3+ years of hands-on experience with Databricks on Azure, Apache Spark (PySpark/Spark SQL), Delta Lake optimization
Azure Platform: 3+ years working with Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics, Azure ML Studio, Azure Databricks
Programming: Strong proficiency in Python (including pandas, NumPy), SQL, and Unix/Linux shell scripting; experience with Java or Scala is a plus
Streaming: 3+ years of experience with Apache Kafka or Azure Event Hubs, and Azure Stream Analytics
DevOps: Hands-on experience with Git, CI/CD pipelines (Azure DevOps, GitHub Actions), and build tools (Maven, Gradle)
Orchestration: Working knowledge of workflow schedulers (Apache Airflow, Azure Data Factory, Databricks Workflows, TWS)
Problem-solving: Strong analytical and debugging skills with ability to work in agile/scrum environments
Preferred Qualifications:
Experience with ML frameworks and libraries (scikit-learn, TensorFlow, PyTorch) for data preparation and feature engineering on Azure
Experience with vector databases (Azure AI Search, Pinecone, Weaviate, Milvus) and RAG (Retrieval Augmented Generation) architectures
Experience with modern data transformation tools (DBT, Spark Structured Streaming on Databricks)
Understanding of LLM applications, prompt engineering, and AI agent frameworks (Azure OpenAI Service, Semantic Kernel)
Familiarity with containerization (Docker, Azure Kubernetes Service)
Experience with monitoring and observability tools (Azure Monitor, Application Insights, Datadog, Grafana)
Certifications in Databricks, Azure Data Engineer Associate, Azure AI Engineer, or Azure Solutions Architect
Educational Background:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Additional Information:
Expected start date: December 1st
Duration of engagement: 3 months, with potential extension.
Work location: Alpharetta, GA or Plano, TX.
Work location model: Hybrid, 3 days in office.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.