Role: Data Engineer - Apache Airflow
Location: Corning, NY
Type: 100% on site
Duration: Long Term
Education and Experience:
· This position focuses on data pipelines and workflows
· Bachelor’s degree in computer science, information systems, data engineering, or a related field, or equivalent practical experience. An associate degree may be considered if the candidate has an additional 3-5 years of experience beyond the stated requirement.
· 2+ years of professional experience in data engineering, ETL development, or related work, or equivalent hands-on experience
· Experience or interest in scientific software, materials science, research environments, or technically complex domains is a plus
Scope of the position:
· Embed within a cross-functional Agile team, participating in sprint planning, stand-ups, backlog refinement, and technical discussions.
· Design, build, troubleshoot, and maintain ETL/ELT workflows that support application functionality, analytics, reporting, and scientific workflows.
· Develop and manage data pipelines using Apache Airflow, ensuring reliable orchestration, scheduling, monitoring, and recovery of data processes.
· Work with stakeholders including software developers, scientists, and engineers to understand data sources, workflow requirements, and downstream data needs.
· Extract, transform, validate, and load data across systems, including relational databases such as PostgreSQL and Oracle.
· Write, optimize, and maintain complex SQL queries, scripts, and transformation logic to support operational and analytical use cases.
· Troubleshoot data quality issues, ETL failures, pipeline bottlenecks, and schema inconsistencies; identify root causes and implement durable solutions.
· Support database exploration, data validation, and troubleshooting using tools such as DBeaver and related database utilities.
· Evaluate and help adopt new data tools and technologies, including lightweight analytics and transformation solutions (e.g. DuckDB) where appropriate.
· Collaborate with engineering teams to support reliable integration between data pipelines, applications, APIs, and downstream consumers.
· Assist with schema evolution, data modeling, migration planning, and data consistency across systems.
· Document pipeline logic, data dependencies, transformation rules, and operational procedures to support maintainability and team knowledge sharing.
· Help improve data engineering standards, observability, testing practices, and operational reliability across the team.
· Regularly interact with scientists and engineers to understand research and technical workflows; experience in scientific or research environments is a strong plus.
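For orientation, the extract-transform-validate-load duties above can be sketched in miniature. This is a hedged illustration only: it uses Python's built-in SQLite in place of the PostgreSQL/Oracle systems named above, and every table and column name (`readings`, `sample_id`, `temp_c`) is hypothetical.

```python
import sqlite3

# Hypothetical source system (the role's real sources are PostgreSQL/Oracle)
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE readings (sample_id TEXT, temp_c REAL)")
src.executemany("INSERT INTO readings VALUES (?, ?)",
                [("S1", 21.5), ("S2", None), ("S3", 19.0)])

# Extract
rows = src.execute("SELECT sample_id, temp_c FROM readings").fetchall()

# Transform + validate: drop rows with missing measurements, convert C -> F
clean = [(sid, t * 9 / 5 + 32) for sid, t in rows if t is not None]

# Load into a hypothetical target database
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE readings_f (sample_id TEXT, temp_f REAL)")
tgt.executemany("INSERT INTO readings_f VALUES (?, ?)", clean)
tgt.commit()

loaded = tgt.execute("SELECT COUNT(*) FROM readings_f").fetchone()[0]
```

In practice each of these steps would run as a monitored, recoverable task inside an orchestrated pipeline rather than a single script.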
Technical Skills – 2+ years (or commensurate experience):
· Experience designing, building, and troubleshooting ETL/ELT pipelines
· Hands-on experience with workflow orchestration tools, preferably Apache Airflow
· Strong experience writing and optimizing SQL
· Experience working with relational databases, especially PostgreSQL and Oracle
· Ability to develop and maintain data transformations, validation steps, and pipeline logic across multiple systems
· Experience with database tools such as DBeaver or similar for query development, exploration, and troubleshooting
· Familiarity with modern data processing and analytical tools such as DuckDB or interest in evaluating emerging data technologies
· Understanding of data modeling, schema design, data integrity, and performance tuning
· Experience troubleshooting pipeline failures, performance issues, and inconsistent or incomplete datasets
· Familiarity with scripting or programming for pipeline development and automation; Python experience is strongly preferred
· Understanding of version control and collaborative development workflows
· Experience supporting production data systems with an emphasis on reliability, maintainability, and clear documentation
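As a point of reference for the Airflow orchestration skills listed above, a minimal DAG might look like the following. This is a sketch, not a production pipeline: the DAG id, task names, schedule, and task bodies are all hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull rows from a source system (hypothetical placeholder)
    pass


def load():
    # Write validated rows to the target database (hypothetical placeholder)
    pass


with DAG(
    dag_id="nightly_readings_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare ordering: extract runs before load
    extract_task >> load_task
```

Airflow's scheduler handles the daily triggering, retries, and monitoring that the responsibilities above describe; this fragment is workflow configuration and is shown for orientation only.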