Lead Data Engineer

Overview

Remote

Depends on Experience

Contract - W2

Skills

SQL

Python

Big Data

Job Details

Job Title: Lead Data Engineer
Location: Remote, but must be able to report to the office (Huntsville, TX) with advance notice.

POSITION REQUIREMENTS:
We are seeking a highly skilled and experienced professional to lead the design, implementation, and management of end-to-end enterprise-grade data solutions. This role involves expertise in building and optimizing data warehouses, data lakes, and lakehouse platforms, with a strong emphasis on data engineering, data science, and machine learning. You will work closely with cross-functional teams to create scalable and robust architectures that support advanced analytics and machine learning use cases while adhering to industry standards and best practices.

Education: Bachelor's degree in Computer Science, Data Science, Engineering, or a related field.
Experience: Minimum 10 years in data engineering, data architecture, or a similar role, with at least 3 years in a lead capacity.

Responsibilities Include:
* Architect, design, and manage the entire data lifecycle, from data ingestion, transformation, storage, and processing to advanced analytics and machine learning databases and large-scale processing systems.
* Implement robust data governance frameworks, including metadata management, lineage tracking, security, compliance, and business glossary development.
* Identify, design, and implement internal process improvements, including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes.
* Ensure high data quality and reliability through automated data validation and testing, and provide high-quality, clean, and usable data from data sets in varying states of disorder.
* Develop and enforce architecture standards, patterns, and reference models for large-scale data platforms.
* Architect and implement Lambda and Kappa architectures for real-time and batch data processing workflows, supported by strong data modeling.
* Identify and implement the most appropriate data management system, and enable integration with external tools for ingestion, compilation, analytics, and visualization.

REQUIRED SKILLS:
* Proficient in SQL, Python, and big data processing frameworks (e.g., Spark, Flink).
* Strong experience with cloud platforms (AWS, Azure, Google Cloud Platform) and related data services.
* Hands-on experience with data warehousing tools (e.g., Snowflake, Redshift, BigQuery), Databricks running on multiple cloud platforms (AWS, Azure and Google Cloud Platform) and data lake technologies (e.g., S3, ADLS, HDFS).
* Expertise in containerization and orchestration tools like Docker and Kubernetes.
* Knowledge of MLOps frameworks and tools (e.g., MLflow, Kubeflow, Airflow).
* Experience with real-time streaming architectures (e.g., Kafka, Kinesis).
* Familiarity with Lambda and Kappa architectures for data processing.
* Ability to enable integration with external tools for ingestion, compilation, analytics, and visualization.

PREFERRED SKILLS:
* Certifications in cloud platforms or data-related technologies.
* Familiarity with graph databases, NoSQL, or time-series databases.
* Knowledge of data privacy regulations (e.g., GDPR, CCPA) and compliance requirements.
* Experience in implementing and managing business glossaries, data governance rules, metadata lineage, and ensuring data quality.
* Extensive experience with the AWS cloud platform and the Databricks Lakehouse.



About Digerati Systems Inc
