Data Engineer

Overview

On Site
Full Time
Part Time
Accepts corp to corp applications
Contract - Independent
Contract - W2

Skills

Data Flow
Real-time
Decision-making
Data Integration
Big Data
Google Cloud Platform
Google Cloud
Unstructured Data
Analytics
Reporting
Management
Optimization
Data Governance
Access Control
Agile
Continuous Improvement
Python
Scala
Data Processing
SQL
Database Design
MySQL
PostgreSQL
Microsoft SQL Server
Apache Spark
Apache Kafka
Apache Hadoop
Cloud Computing
Amazon Web Services
Amazon S3
Amazon Redshift
Electronic Health Record (EHR)
Microsoft Azure
ADF
Databricks
Extract, Transform, Load (ETL)
ELT
Data Modeling
Data Warehouse
NoSQL
Database
MongoDB
Apache Cassandra
Amazon DynamoDB
Docker
Kubernetes
Continuous Integration
Continuous Delivery
Apache NiFi
Workflow
Orchestration
Machine Learning (ML)
Data Quality
Analytical Skill
Problem Solving
Conflict Resolution
SANS
Communication
Collaboration

Job Details

Job Title: Data Engineer

Locations: Dallas, TX / Plano, TX / Austin, TX / Houston, TX / Richardson, TX
Experience Required: 6-10 Years



About the Role

We're seeking a skilled and passionate Data Engineer to design, build, and optimize data pipelines and architectures that enable efficient data processing and analytics.

You'll work closely with data scientists, analysts, and software engineers to ensure data flows are reliable, scalable, and secure, supporting real-time insights and enterprise-grade decision-making.

This role is ideal for someone with hands-on experience in data integration, ETL pipelines, big data platforms, and cloud data solutions (AWS, Azure, or Google Cloud Platform).



Key Responsibilities

  • Design, develop, and maintain ETL/ELT pipelines to process large volumes of structured and unstructured data.
  • Build and optimize data models and data warehouses/lakes for analytics and reporting.
  • Integrate data from multiple sources using tools such as Apache Spark, Kafka, or Airflow.
  • Implement and manage data pipelines on cloud platforms like AWS (Glue, Redshift, S3) or Azure (Data Factory, Synapse).
  • Ensure data quality, integrity, and performance through validation, monitoring, and optimization.
  • Collaborate with stakeholders to define data requirements and deliver robust data solutions.
  • Apply data governance, access control, and security best practices.
  • Troubleshoot performance issues and optimize query execution across large datasets.
  • Work in an Agile environment and contribute to continuous improvement and automation initiatives.



Primary Skills (Must Have)

  • Strong experience in Python or Scala for data processing.
  • Proficiency with SQL and database design (MySQL, PostgreSQL, SQL Server).
  • Hands-on experience with Apache Spark, Kafka, or Hadoop ecosystems.
  • Cloud experience with AWS (Glue, S3, Redshift, EMR) or Azure (ADF, Synapse, Databricks).
  • Expertise in ETL/ELT design, data modeling, and pipeline orchestration.
  • Understanding of data warehousing, data lakes, and distributed data systems.



Secondary Skills (Good to Have)

  • Familiarity with NoSQL databases (MongoDB, Cassandra, DynamoDB).
  • Experience with containerization (Docker, Kubernetes) and CI/CD pipelines.
  • Exposure to Airflow, NiFi, or dbt for workflow orchestration.
  • Knowledge of machine learning data pipelines or data quality frameworks.
  • Excellent analytical and problem-solving skills.
  • Strong communication and collaboration abilities.


About Purple Drive Technologies LLC