Job Details
Spark/Glue Developer
| Project Name | Shared Data Platform (SDP) |
| --- | --- |
| Client | State of Maryland |
| Agency | Maryland Benefits |
| Location | 100% on-site, Mon-Fri; Linthicum Heights, MD 21090 |
| Interview Type | In-Person |
| Contract Duration | 1 year, with 9 one-year renewal options |
| Tentative Start Date | 10/01/2025 |
| Deadline | 09/22/2025 |
Project Overview:
Innosoft is the prime contractor for MD Benefits (formerly MD THINK), supporting the management, design, development, testing, and implementation of this strategic Information Technology (IT) program. Maryland Benefits is seeking an agile development team with the required skill sets to build and maintain the Maryland Benefits infrastructure and platform, develop applications, manage data repositories, deliver reports and dashboards, and support activities related to network services and system operations.
The Shared Data Platform (SDP) is designed as a cloud-based, data-centric infrastructure to support scalable, flexible, and integrated data operations. It empowers self-service and accelerates data-driven decision-making across the enterprise. Key strategic goals include establishing a mature data infrastructure that balances analytics and business intelligence, enabling iterative learning and actionable insights, and fostering a "Data Center of Excellence" (DCoE) to govern and enhance data-driven processes. The SDP also prioritizes the delivery of trusted information and the streamlined onboarding of State Agencies to the Maryland Benefits platform through standardized procedures, ultimately driving operational efficiency and measurable business value. The "Data Platform - Automation" team plays a critical role in achieving these goals by supporting the "Data Platform - Engineering" team.
Duties/Responsibilities:
The Spark/Glue Developer shall utilize IT equipment and languages (third- and fourth-generation or current state-of-the-art) to develop and prepare diagrammatic plans that solve business, management, communications, and strategic problems. This individual shall design detailed programs, flowcharts, and diagrams showing the mathematical computations and sequences of machine operations necessary to copy and process data and print results, and shall verify the accuracy and completeness of programs and systems by preparing representative sample data and performing testing by means of cycle and system processing.
- Design, develop, and optimize ETL/ELT pipelines using AWS Glue (PySpark) to ingest, transform, and load data from various sources (RDBMS, APIs, files) into an S3-based data lake and Redshift/Snowflake (a minimal job sketch follows this list).
- Implement complex business logic, data cleansing, and enrichment using PySpark.
- Handle both batch and streaming data processing use cases using AWS services such as Kinesis, Glue Streaming, or Spark Structured Streaming.
- Contribute to the implementation and optimization of the Lake House pattern, integrating data lakes (S3, Lake Formation) with data warehouses (Redshift, Snowflake).
- Leverage Glue Catalog, Lake Formation, and Athena to enable data discovery and secure data sharing.
- Work with data architects, data analysts, and DevOps teams to design end-to-end data workflows.
- Collaborate on building reusable code modules and on job orchestration using Step Functions, Airflow, or AWS Glue Workflows (see the orchestration sketch after this list).
- Optimize Spark/Glue jobs for performance and cost.
- Implement and maintain data quality checks, logging, and error-handling mechanisms (see the data-quality sketch after this list).
- Ensure compliance with data governance, privacy, and security policies.
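By way of illustration, a minimal AWS Glue (PySpark) batch job of the kind described in the first bullet might look like the sketch below. This is not SDP code: the database, table, bucket, and column names (`sdp_raw`, `benefits_cases`, `case_id`, `status`, `s3://example-sdp-curated/...`) are hypothetical placeholders.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a cataloged source table (database and table names are hypothetical)
source = glue_context.create_dynamic_frame.from_catalog(
    database="sdp_raw", table_name="benefits_cases"
)

# Example cleansing/business logic: dedupe on a key and drop null statuses
df = source.toDF()
df = df.dropDuplicates(["case_id"]).filter(df["status"].isNotNull())

# Write partitioned Parquet to a curated S3 zone (bucket is hypothetical)
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glue_context, "curated"),
    connection_type="s3",
    connection_options={
        "path": "s3://example-sdp-curated/benefits_cases/",
        "partitionKeys": ["status"],
    },
    format="parquet",
)

job.commit()
```

Writing Parquet partitioned by a low-cardinality column is a common Glue/Athena optimization, since it prunes scans and reduces query cost.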
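For orchestration via Step Functions, a pipeline run might be triggered from Python with boto3, as in the following sketch; the state machine ARN and input payload are assumptions for illustration only.

```python
import json

import boto3

# Kick off an ETL state machine run (ARN and input are hypothetical)
sfn = boto3.client("stepfunctions")
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:sdp-nightly-etl",
    input=json.dumps({"run_date": "2025-10-01"}),
)
print(response["executionArn"])
```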
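A data-quality gate, as called for above, can be a small PySpark check that fails fast so the orchestrator can alert or retry. The path, key column, and 1% duplicate threshold below are illustrative assumptions, not SDP policy.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sdp-dq-checks").getOrCreate()

# Path, key column, and thresholds are hypothetical
df = spark.read.parquet("s3://example-sdp-curated/benefits_cases/")

total = df.count()
null_keys = df.filter(F.col("case_id").isNull()).count()
duplicates = total - df.dropDuplicates(["case_id"]).count()

# Raise loudly so the orchestrator (Step Functions/Airflow) can alert or retry
if total == 0 or null_keys > 0 or duplicates > total * 0.01:
    raise ValueError(
        f"DQ gate failed: rows={total}, null case_id={null_keys}, dupes={duplicates}"
    )
```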
Requirements
Education:
This position requires a Bachelor's degree from an accredited college or university with a major in computer science, information systems, engineering, business, or a related scientific or technical discipline, or three (3) years of equivalent experience in a related field. (Note: A Master's degree is preferred.)
General Experience:
- The proposed candidate must have at least eight (8) years of programming experience in software development or maintenance.
- Eight-plus (8+) years of experience in data engineering, with strong expertise in Apache Spark and PySpark.
- Hands-on experience with AWS Glue (v2 or v3), AWS S3, AWS Lake Formation, and Athena.
- Strong knowledge of SQL, partitioning strategies, and data modeling for analytical systems.
- Experience with CI/CD practices, Git, and infrastructure-as-code (Terraform, CloudFormation).
- Experience with Redshift, Snowflake, or similar cloud data warehouses.
- Familiarity with Airflow, Step Functions, or other orchestration tools.
- Knowledge of data governance frameworks, IAM permissions, and Lake Formation fine-grained access controls.
- Exposure to Glue Studio, Glue DataBrew, or AWS DataZone.
Specialized Experience:
- The proposed candidate must have at least five (5) years of experience in IT systems analysis and programming.
- Strong analytical and problem-solving skills; ability to work independently and in cross-functional agile teams; clear communication and documentation abilities.
- Key qualifications include strong project management skills, effective communication, and the capacity to identify and resolve issues while adapting to changing circumstances. Significant experience collaborating with cross-functional teams, product owners, and stakeholders is essential.
- Preferred qualifications include a technical background and experience in data and AI/ML projects. Functional experience with local or government projects in Data Services (MDM/R360/EDMS), Child Support, Integrated Eligibility, Child Welfare, Adult Protective Services, Juvenile Justice, or Health and Human Services is also preferred.