Position Overview
The Senior Data Engineer is responsible for designing, developing, and maintaining scalable data infrastructure and pipelines that make clean, organized, secure, and timely data available to downstream users. This role supports analytics, reporting, business intelligence, and advanced data initiatives that inform child- and family-centered decision-making by senior leadership and the administration.
The Senior Data Engineer works closely with the client's leadership, the Illinois Department of Innovation and Technology (DoIT), data architects, analysts, software engineers, and other stakeholders to deliver reliable, compliant, and high-performing data systems.
Key Responsibilities
Data Architecture & Pipeline Development
- Design, develop, evaluate, and maintain scalable ETL/ELT data pipelines.
- Build and manage structured and unstructured data workflows to move Early Childhood program data from source systems to warehouses, data lakes, and analytics platforms.
- Develop scalable data ingestion and transformation processes to support increasing data volumes and complex analytics needs.
- Optimize data partitioning, indexing strategies, compression techniques, and distributed processing frameworks to improve storage and query performance (see the sketch following this list).
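For illustration, the partition-aware pipeline work described above might look like the following minimal PySpark sketch. The table names, lake paths, and columns (enrollments_raw, /lake/..., enrolled_on) are hypothetical placeholders, not actual program systems.

```python
# A minimal ETL sketch, assuming PySpark; all paths and column names
# below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("enrollment-etl").getOrCreate()

# Extract: read raw program data from a hypothetical source location.
raw = spark.read.parquet("/lake/raw/enrollments_raw")

# Transform: standardize a date column and derive a partition key.
clean = (
    raw.withColumn("enrolled_on", F.to_date("enrolled_on"))
       .withColumn("enrollment_year", F.year("enrolled_on"))
)

# Load: write partitioned, compressed Parquet so queries that filter
# by year scan only the relevant partitions.
(clean.write.mode("overwrite")
      .partitionBy("enrollment_year")
      .option("compression", "snappy")
      .parquet("/lake/curated/enrollments"))
```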
Cloud & Infrastructure Management
- Design and maintain cloud-based data platforms (e.g., AWS, Azure, Google Cloud, IBM Cloud).
- Manage data warehouses (e.g., BigQuery, Azure Synapse), data lakes (e.g., Amazon S3, Google Cloud Storage), lakehouses, and related infrastructure.
- Collaborate on infrastructure management, database administration, and modern data architecture approaches including data mesh and cloud-native solutions.
Data Quality, Governance & Compliance
- Ensure data accuracy, consistency, reliability, and performance.
- Enforce compliance with FERPA, HIPAA, GDPR, COPPA, and other state/federal data governance requirements.
- Develop and implement industry-standard data security practices including encryption, access controls, breach notification protocols, and audit readiness.
- Implement validation rules, anomaly detection, monitoring frameworks, and quality assurance automation (a brief sketch follows this list).
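As one hedged example of rule-based validation, the following pandas sketch flags rows that violate two simple quality rules; the schema (child_id, age_months) and the thresholds are illustrative assumptions, not policy.

```python
# A minimal validation sketch, assuming pandas; the schema and
# thresholds below are hypothetical.
import pandas as pd

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows that violate basic quality rules."""
    errors = []

    # Rule 1: the primary key must be present and unique.
    bad_keys = df[df["child_id"].isna() | df["child_id"].duplicated(keep=False)]
    if not bad_keys.empty:
        errors.append(bad_keys.assign(rule="child_id missing or duplicated"))

    # Rule 2: a simple range check doubles as a coarse anomaly flag.
    out_of_range = df[(df["age_months"] < 0) | (df["age_months"] > 72)]
    if not out_of_range.empty:
        errors.append(out_of_range.assign(rule="age_months out of range"))

    # An empty result means every rule passed.
    return pd.concat(errors) if errors else df.iloc[0:0]

sample = pd.DataFrame({"child_id": [1, 1, None], "age_months": [30, 30, 95]})
print(validate(sample))
```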
Tools & Technical Expertise
- Develop and optimize data processing using SQL for transformation and querying.
- Use Python for scripting, automation, and data engineering tasks.
- Leverage Databricks and distributed computing frameworks for scalable data processing.
- Utilize tools such as Airflow, dbt, Splunk, Tableau, and Power BI for orchestration, monitoring, validation, and visualization (see the orchestration sketch after this list).
- Implement CI/CD pipelines, version control strategies, automated testing, and pipeline orchestration best practices.
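To illustrate the orchestration side, here is a minimal sketch using Airflow's TaskFlow decorators (Airflow 2.4+ syntax); the DAG id, schedule, and task bodies are placeholders rather than a prescribed design.

```python
# A minimal Airflow DAG sketch; the schedule and task bodies are
# hypothetical placeholders.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def nightly_refresh():

    @task
    def extract() -> list:
        # Placeholder for pulling records from a source system.
        return [{"id": 1}, {"id": 2}]

    @task
    def load(rows: list) -> None:
        # Placeholder for writing validated rows to the warehouse.
        print(f"loaded {len(rows)} rows")

    # TaskFlow infers the extract -> load dependency from the call chain.
    load(extract())

nightly_refresh()
```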
Integration & Advanced Analytics
- Direct the integration of data from APIs, third-party systems, and internal platforms (see the sketch following this list).
- Support machine learning model deployment and real-time data processing environments.
- Enable advanced analytics and operationalize data science initiatives to improve decision-making, efficiency, and risk mitigation.
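As a sketch of the kind of API integration implied above, the following pulls paginated records from a hypothetical third-party endpoint; the URL, pagination parameters, and token handling are all assumptions for illustration.

```python
# A minimal paginated-ingestion sketch using requests; the endpoint and
# parameter names are hypothetical.
import requests

BASE_URL = "https://api.example.com/v1/records"  # placeholder endpoint

def fetch_all(token: str, page_size: int = 500) -> list:
    """Page through the API until an empty batch signals the end."""
    headers = {"Authorization": f"Bearer {token}"}
    records, page = [], 1
    while True:
        resp = requests.get(
            BASE_URL,
            headers=headers,
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()  # surface HTTP errors instead of bad data
        batch = resp.json()
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records
```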
Reporting & Stakeholder Communication
- Coordinate development of dashboards, reports, publications, briefings, and presentations.
- Ensure timely and accurate submission of mandated state and federal reports.
- Communicate complex technical findings to both technical and non-technical audiences.
- Promote interactive data exploration to improve transparency, accountability, and strategic decision-making.
Documentation & Process Improvement
- Develop and maintain detailed documentation of data pipelines, architecture, workflows, and procedures.
- Establish industry-standard best practices for data engineering, governance, automation, and reproducibility.
- Lead continuous improvement initiatives to enhance reliability, scalability, and operational efficiency.
Leadership & Collaboration
- Lead the Data Engineering section and coordinate collaboration within it.
- Partner with leadership, analysts, data scientists, and engineers to build scalable, trusted data systems.
- Provide technical oversight, mentorship, and strategic direction for data engineering initiatives.
Required Qualifications
- 5+ years of experience designing and maintaining scalable data pipelines and modern data infrastructure.
- Strong proficiency in SQL and Python.
- Experience with distributed data engineering tools such as Databricks or Spark.
- Experience with cloud platforms (AWS, Azure, Google Cloud, or similar).
- Deep understanding of data governance, security, compliance, and regulatory requirements.
- Experience optimizing storage, partitioning strategies, and query performance.
- Knowledge of ETL/ELT methodologies and orchestration tools.
Preferred Qualifications
- Experience working in State government or education data systems.
- Experience with orchestration tools such as Airflow or dbt.
- Experience supporting machine learning deployment in production.
- Experience with real-time data processing frameworks.