Location: 100% Remote
Years’ Experience: 5+ years of professional experience
Education: Bachelor’s degree in an IT-related field
Clearance: Applicants must be able to obtain and maintain a Secret security clearance. United States citizenship is required to be eligible for this type of clearance.
Required Certifications:
· CompTIA Security+
Key Skills:
· 5+ years of IT experience focusing on enterprise data architecture and management, including data flow charts, diagrams, and other technical documentation.
· Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required.
· Python development experience required.
· Experience with ETL and ELT tools such as SSIS, Pentaho, and/or Data Migration Service, and the ability to incorporate Python as required.
· Advanced-level SQL experience (joins, aggregation, window functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization); see the illustrative sketch after this list.
· Proficiency using Git for version control, including repository management, branching, merging, and pull requests.
· Active CompTIA Security+ certification preferred. If selected, candidates must obtain CompTIA Security+ certification before beginning work on the program.
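For illustration, a minimal sketch of the kind of SQL described above (a Common Table Expression feeding a window function), run through PySpark. The orders table, its columns, and the session setup are hypothetical placeholders, not part of the actual program:

```python
# Illustrative only: table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-skills-sketch").getOrCreate()

# Tiny stand-in dataset so the query below actually runs.
spark.createDataFrame(
    [("c1", "2024-01-01", 120.0), ("c1", "2024-01-02", 80.0),
     ("c2", "2024-01-01", 200.0)],
    ["customer_id", "order_date", "amount"],
).createOrReplaceTempView("orders")

ranked = spark.sql("""
    WITH daily_totals AS (               -- Common Table Expression
        SELECT customer_id,
               order_date,
               SUM(amount) AS daily_amount
        FROM orders
        GROUP BY customer_id, order_date
    )
    SELECT customer_id,
           order_date,
           daily_amount,
           -- Window function: rank each customer's days by spend
           RANK() OVER (PARTITION BY customer_id
                        ORDER BY daily_amount DESC) AS spend_rank
    FROM daily_totals
""")
ranked.show()
```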
Responsibilities:
· Plan, create, and maintain data architectures, ensuring alignment with business requirements.
· Acquire data, design dataset-creation processes, and store data in optimized formats.
· Identify problems and inefficiencies and apply solutions.
· Identify tasks where manual participation can be eliminated through automation.
· Identify and optimize data bottlenecks, leveraging automation where possible.
· Create and manage data lifecycle policies (retention, backup/restore, etc.).
· Apply in-depth knowledge to create, maintain, and manage ETL/ELT pipelines.
· Create, maintain, and manage data transformations.
· Maintain/update documentation.
· Create, maintain, and manage data pipeline schedules.
· Monitor data pipelines.
· Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality.
· Support AI/ML teams with optimizing feature engineering code.
· Apply expertise in Spark, Python, Databricks, data lakes, and SQL.
· Create, maintain, and manage Spark Structured Streaming jobs, including the newer Delta Live Tables and/or dbt (see the illustrative sketch after this list).
· Research existing data in the data lake to determine the best sources for data.
· Create, manage, and maintain ksqlDB and Kafka Streams queries/code.
· Perform data-driven testing for data quality.
· Maintain and update Python-based data processing scripts executed on AWS Lambda.
· Write unit tests for all Spark, Python data-processing, and Lambda code.
· Maintain and optimize the PCIS Reporting Database data lake (performance tuning, etc.).
· Streamline data processing, including formalizing how to handle late data, how to define windows, and how window definitions impact data freshness.
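For illustration, a minimal sketch, assuming a Databricks/Delta-enabled Spark environment, of the kind of Structured Streaming job listed above. The S3 paths, schema, and checkpoint locations are hypothetical placeholders:

```python
# Illustrative sketch only: paths and schema are hypothetical, and the
# "delta" format assumes Databricks or the delta-spark package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Read a stream of JSON events from cloud storage (placeholder path).
events = (spark.readStream
          .format("json")
          .schema("event_id STRING, event_time TIMESTAMP, payload STRING")
          .load("s3://example-bucket/raw/events/"))

# Light transformation: drop malformed rows before landing in Delta.
clean = events.filter(col("event_id").isNotNull())

# Write to a Delta table; the checkpoint enables restartable,
# exactly-once processing.
query = (clean.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
         .outputMode("append")
         .start("s3://example-bucket/delta/events/"))

query.awaitTermination()
```

In Delta Live Tables the same flow would be declared as a managed pipeline rather than hand-wired as above; the hand-managed version is shown only to make the moving parts explicit.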
Qualifications:
· 5+ years of IT experience focusing on enterprise data architecture and management.
· Must have an active Secret security clearance.
· Bachelor’s degree required.
· CompTIA Security+ certification preferred. If selected, candidates must obtain CompTIA Security+ certification before beginning work on the program.
· Experience in Conceptual/Logical/Physical Data Modeling and expertise in Relational and Dimensional Data Modeling.
· Experience with Databricks and Python Development, Structured Streaming, Delta Lake concepts, and Delta Live Tables required.
o Additional experience with Spark, Spark SQL, Spark DataFrames and Datasets, and PySpark.
o Delta Lake concepts such as time travel, schema evolution, and optimization.
o Structured Streaming and Delta Live Tables with Databricks a bonus.
· Knowledge of Python (Python 3.x) for CI/CD pipelines required.
o Familiarity with pytest and unittest a bonus.
· Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation/support.
o Advanced-level understanding of streaming data pipelines and how they differ from batch systems.
o Ability to formalize how to handle late data, define windows, and reason about how window definitions impact data freshness (see the illustrative sketch at the end of this section).
o Advanced understanding of ETL and ELT and of ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.
o Understanding of concepts and implementation strategies for different incremental data loads, such as tumbling window, sliding window, high watermark, etc.
o Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks a bonus.
· Advanced-level SQL experience (joins, aggregation, window functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization).
· Indexing and partitioning strategy experience.
· Ability to debug, troubleshoot, design, and implement solutions to complex technical issues.
· Experience with large-scale, high-performance enterprise big data application deployment and solutions.
· Understanding of how to create DAGs (directed acyclic graphs) to define workflows.
· Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc., a bonus but not required.
· Architecture experience in an AWS environment a bonus.
o Familiarity working with Kinesis and/or Lambda, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale, a bonus.
o Experience with Docker, Jenkins, and CloudWatch.
o Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines.
o Experience working with AWS Lambdas for configuration and optimization.
o Experience working with DynamoDB to query and write data.
o Experience with S3.
· Experience working with JSON and defining JSON Schemas a bonus.
· Experience setting up and managing Confluent/Kafka topics and ensuring Kafka performance a bonus.
o Familiarity with Schema Registry and message formats such as Avro, ORC, etc.
o Understanding of how to manage ksqlDB SQL files and migrations, and of Kafka Streams.
· Ability to thrive in a team-based environment.
· Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior management.
· Proficiency using Git for version control, including repository management, branching, merging, and pull requests.
o Repository setup and management.
o Branching strategies (feature, develop, main).
o Merging and resolving conflicts.
o Creating and reviewing pull requests.
o Commit best practices (clear messages, atomic commits).
o Tagging and release management.
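For illustration, a minimal sketch, assuming PySpark, of the windowing and late-data concepts referenced above: a tumbling window plus an event-time watermark that bounds how late events may arrive, which in turn determines when a window’s results are final. Paths and column names are hypothetical placeholders:

```python
# Illustrative sketch only: source path, schema, and sink are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, count

spark = SparkSession.builder.appName("late-data-sketch").getOrCreate()

events = (spark.readStream
          .format("json")
          .schema("event_id STRING, event_time TIMESTAMP")
          .load("s3://example-bucket/raw/events/"))

# Watermark: events more than 10 minutes late are dropped, which caps
# streaming state and fixes the point at which a window becomes final.
counts = (events
          .withWatermark("event_time", "10 minutes")
          # Tumbling window: fixed, non-overlapping 5-minute buckets.
          .groupBy(window("event_time", "5 minutes"))
          .agg(count("event_id").alias("events_per_window")))

query = (counts.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/window-counts/")
         .outputMode("append")
         .start("s3://example-bucket/delta/window-counts/"))

query.awaitTermination()
```

The watermark is the freshness lever: shortening it finalizes windows sooner but discards more late events, while lengthening it admits later events at the cost of more state and slower results.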