Job Details
Job Title: Lead Data Engineer
Location: May work remotely, but must be able to report to the office (Huntsville, TX) with advance notice.
POSITION REQUIREMENTS:
We are seeking a highly skilled and experienced professional to lead the design, implementation, and management of end-to-end enterprise-grade data solutions. This role involves expertise in building and optimizing data warehouses, data lakes, and lakehouse platforms, with a strong emphasis on data engineering, data science, and machine learning. You will work closely with cross-functional teams to create scalable and robust architectures that support advanced analytics and machine learning use cases while adhering to industry standards and best practices.
Education: Bachelor's degree in Computer Science, Data Science, Engineering, or a related field.
Experience: Minimum 10 years in data engineering, data architecture, or a similar role, with at least 3 years in a lead capacity.
Responsibilities Include:
* Architect, design, and manage the entire data lifecycle from data ingestion, transformation, storage, and processing to advanced analytics and machine learning databases and large-scale processing systems.
* Implement robust data governance frameworks, including metadata management, lineage tracking, security, compliance, and business glossary development.
* Identify, design, and implement internal process improvements, including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes.
* Ensure high data quality and reliability through automated data validation and testing, and provide high-quality, clean, and usable data from data sets in varying states of disorder.
* Develop and enforce architecture standards, patterns, and reference models for large-scale data platforms.
* Architect and implement Lambda and Kappa architectures for real-time and batch data processing workflows, supported by strong data modeling.
* Identify and implement the most appropriate data management system, and enable integration capabilities for external tools to perform ingestion, compilation, analytics, and visualization.
REQUIRED SKILLS:
* Proficient in SQL, Python, and big data processing frameworks (e.g., Spark, Flink).
* Strong experience with cloud platforms (AWS, Azure, Google Cloud Platform) and related data services.
* Hands-on experience with data warehousing tools (e.g., Snowflake, Redshift, BigQuery), Databricks running on multiple cloud platforms (AWS, Azure, and Google Cloud Platform), and data lake technologies (e.g., S3, ADLS, HDFS).
* Expertise in containerization and orchestration tools like Docker and Kubernetes.
* Knowledge of MLOps frameworks and tools (e.g., MLflow, Kubeflow, Airflow).
* Experience with real-time streaming architectures (e.g., Kafka, Kinesis).
* Familiarity with Lambda and Kappa architectures for data processing.
* Ability to enable integration capabilities for external tools to perform ingestion, compilation, analytics, and visualization.
PREFERRED SKILLS:
* Certifications in cloud platforms or data-related technologies.
* Familiarity with graph databases, NoSQL, or time-series databases.
* Knowledge of data privacy regulations (e.g., GDPR, CCPA) and compliance requirements.
* Experience in implementing and managing business glossaries, data governance rules, metadata lineage, and ensuring data quality.
* Extensive experience with the AWS cloud platform and Databricks Lakehouse.