Data Engineering and Integration Lead
Long Term Contract
Responsibilities and Activities
- Lead a small team of technical experts to plan, execute and manage the integration of data across the enterprise’s data assets.
- Build data pipelines: Managed data pipelines consist of a series of stages through which data flows (for example, from data sources or acquisition endpoints to integration to consumption for specific use cases). These pipelines must be created, maintained and optimized as workloads move from development to production for specific use cases. Architecting, building, and maintaining data pipelines in collaboration with other technical staff will be one of the primary responsibilities of this role.
- Drive automation of processes: The lead will be responsible for driving the use of innovative and modern tools, techniques and architectures to partially or completely automate the most common, repeatable and tedious data preparation and integration tasks, in order to minimize manual, error-prone processes and improve productivity.
This will include but not be limited to:
- Learning and using modern data preparation and integration tools and techniques.
- Tracking data consumption patterns.
- Monitoring schema changes.
- Designing and recommending — or sometimes even automating — existing and future integration flows.
- Collaborate across business and IT: They will need strong collaboration skills in order to work with varied stakeholders within the organization. In particular, they will work closely with Data/IT architects, MDOT’s Chief Information Steward, business SMEs, and data analysts, to define data integration requirements for various IT projects and data analytics initiatives, and then lead the planning and development of solutions.
- Collaborate with key IT staff to design and model application data structures, storage, and integration in accordance with enterprise-wide architecture standards across legacy, web, cloud and COTS package environments.
- Develop standards, designs, data maps; collaborate with ETL developers to help them develop and modify functions, programs, routines and stored procedures to export, transform and load data.
- Educate and maintain an agency-wide perspective: The data engineering and integration lead should be curious and knowledgeable about new IT projects and data initiatives and how to address them. This includes applying their data and/or domain understanding to new data and integration requirements. They will also be responsible for proposing appropriate (and innovative) data ingestion, preparation, integration, and operationalization techniques that optimally address these data requirements, ensuring that agency-wide benefits are at the forefront of all proposed solutions. They will be required to train counterparts such as data architects, data analysts, information stewards, and other data consumers in these data pipelining and preparation techniques, making it easier for them to integrate and consume the data they need.
- Participate in ensuring compliance and governance: Build models, pipelines and integration flows that conform to policies, rules, and metadata provisioned through glossaries in governance tools or catalogs. Also responsible for ensuring that the data engineering and integration team, data users, and consumers build and use the data provisioned to them responsibly through data governance and compliance initiatives. Work with MDOT’s Chief Information Steward to ensure alignment, and participate in vetting and promoting data content created by business and by data analysts to a future curated data catalog for governed reuse.
- Be MDOT’s data integration expert: They will be considered a blend of “data guru” and “fixer.” This role will promote the available data consumption capabilities and expertise to IT and business areas, lead the creation of new capabilities where needed, and educate key staff to leverage these capabilities toward achieving MDOT’s agency-wide business goals.
- Track industry trends and recommend enterprise data integrations to accommodate users’ needs and business processes.
A successful candidate will have the education, expertise and skills described below.
Education and Training
- A bachelor's degree in computer science, statistics, applied mathematics, data management, information systems, information science or a related quantitative field is required.
- An advanced degree (MS) in computer science, statistics, applied mathematics, information science (MIS), data management, information systems, or a related quantitative field or equivalent work experience is preferred.
- The ideal candidate will have a combination of data integration and engineering expertise, IT skills, data governance skills, and analytics skills.
- At least six years of work experience in data architecture and integration design, data management disciplines, data warehousing, Big Data initiatives, development and implementation of integration pipelines, modeling and optimization, data quality, and/or other areas directly relevant to data engineering and integration responsibilities across IT and data analytics projects.
- At least three years of experience leading cross-functional teams and collaborating with business and technical stakeholders to initiate, plan, and execute enterprise-wide data architecture strategies, as part of a department-wide and/or multi-departmental data management and/or data analytics initiative.
- Strong experience documenting complex requirements, reasoning through ambiguous information, and engaging cross-functionally to propose elegant designs that require minimal rework. Must be able to distill complex thoughts into understandable documents and schematic diagrams that promote common understanding and goals. Must be experienced with thorough impact analysis of design changes, including documentation that supports each option with pros and cons.
- Foundational knowledge of Data Management practices –
- Strong experience with various Data Management architectures like Data Warehouse, Data Lake, Data Hub, Operational Data Stores, and the supporting processes like Data Integration, Governance, Metadata Management.
- Strong ability to design, build and manage data pipelines for data structures encompassing data transformation, data models, schemas, metadata and workload management.
- Strong experience working with and optimizing existing ETL processes and data integration and data preparation flows, and helping move them into production.
- Strong experience working with large, heterogeneous datasets to build and optimize data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies. These should include ETL/ELT, data replication/CDC, message-oriented data movement, and API design and access, as well as emerging data ingestion and integration technologies such as stream data integration, CEP and data virtualization.
- Experience working with data governance/data quality and data security teams and specifically information stewards and privacy and security officers in moving data pipelines into production with appropriate data quality, governance and security standards and certification. Ability to build quick prototypes and to translate prototypes into data products and services in a diverse ecosystem.
- Demonstrated success in working with large, heterogeneous datasets to extract business value using popular data preparation tools such as Trifacta, Paxata, Unifi, or Alteryx to reduce or even automate parts of the tedious data preparation tasks.
- Strong experience with popular database programming languages, including SQL and PL/SQL for relational databases, and certifications in NoSQL/Hadoop-oriented databases such as MongoDB and Cassandra for nonrelational databases.
- Strong experience working with SQL-on-Hadoop query languages and tools, including Hive, Impala, and Presto from an open-source perspective, and Informatica, Talend, Hortonworks DataFlow (HDF), and Dremio from a commercial vendor perspective.
- Knowledge of and experience with multiple data integration platforms, including Oracle Data Integrator, the Informatica platform, Talend, MS SQL Server, and IBM InfoSphere, and data warehouse MPP platforms such as Netezza, Teradata, etc.
- Strong experience with advanced analytics tools and object-oriented/functional scripting languages such as R, Python, Java, C++, Scala, and others.
- Strong experience working with both open-source and commercial message queuing technologies such as Kafka, JMS, Azure Service Bus, and Amazon Simple Queue Service; stream data integration technologies such as Apache NiFi, Apache Beam, Kafka Streams, and Amazon Kinesis; and stream analytics technologies such as KSQL, Apache Spark Streaming, Apache Samza, etc.
- Knowledge about various architectures, patterns and protocols such as SSID, SSIS, ODBC, JDBC, unified data management architecture (UDM), event-driven architecture, real-time data flows, non-relational repositories, data virtualization, cloud enablement, etc.
- Ability to automate pipeline development –
- Strong experience working with DataOps-enabling capabilities such as version control, automated builds, testing and release management, using tools like Git, Jenkins, Puppet, and Ansible.
- Ability to collaborate with technical and business personnel –
- Strong experience in working with data science teams in refining and optimizing data science and machine learning models and algorithms.
- Demonstrated success in working with both IT and business while integrating analytics and data science output into business processes and workflows.
- Basic understanding of data analytics and data science platforms –
- Basic experience working with popular data discovery, analytics and BI software tools like Tableau, Qlik, Power BI and others for semantic-layer-based data discovery.
- Basic understanding of popular open-source and commercial data science platforms such as Python, R, KNIME, Alteryx, and others is a strong plus but not required.
- Exposure to hybrid deployments: Cloud and On-premise –
- Demonstrated ability to work across multiple deployment environments (cloud, on-premises and hybrid) and multiple operating systems, and with containerization and orchestration technologies such as Docker, Kubernetes, AWS Elastic Container Service and others.
- Adept in agile methodologies and capable of applying DevOps and increasingly DataOps principles to data pipelines to improve the communication, integration, scalability, reuse and automation of data flows between data managers and consumers across an organization.
- Ability to regularly learn and adopt new technology.
- Experience working with Geospatial data and systems would be a plus, but not required.
- Transportation industry knowledge or previous experience working in the field would be a plus.
Interpersonal Skills and Characteristics
- Strong experience supporting and working with cross-functional teams in a dynamic business environment.
- Required to provide well-thought-out solutions, with clear written and spoken communication, to ensure a common understanding of AS-IS and TO-BE states.
- Required to be highly creative and collaborative. An ideal candidate would be expected to collaborate with both the business and IT teams to define the business problem, refine the requirements, and design and develop data deliverables accordingly. The successful candidate will also be required to have regular discussions with data SMEs on optimally defining and building data pipelines in nonproduction environments and deploying them to production.
- Required to have the accessibility and ability to interface with, and gain the respect of, stakeholders at all levels and roles within the company.
- Is a confident, energetic self-starter, with strong interpersonal skills.
- Has good judgment, a sense of urgency and has demonstrated commitment to high standards of ethics, regulatory compliance, customer service and business integrity.
Metrics for Evaluation
- Outlining approach and leading the data engineering and integration team toward on-time and on-scope delivery of data pipelines in alignment with business needs for IT projects, and/or data analytics efforts.
- Ongoing support of production service-level metrics across production data pipelines.
- Ability to collaborate with business teams to uncover new requirements for integrated data for upcoming use case delivery, gaining business skills from, and passing data management skills to, citizen integrators and power users within various business teams.
- Ability to network and be recognized as a data expert across a variety of business and IT areas.
- Effectiveness in leading and educating on data issues as measured by the delivery of dedicated internal sessions to various audiences, technical and non-technical.
Travel/Telecommuting: Remote work through 12/31/2020; beyond that is TBD – in-person may be required, which would entail working in downtown Lansing, MI.