Overview
On Site
Full Time
Skills
Durable Skills
Python
Terraform
FOCUS
Apache Kafka
Amazon S3
Storage
Orchestration
Data Lake
JSON
YAML
ELT
SQL
Version Control
Git
Data Engineering
Cloud Computing
Data Warehouse
Amazon Web Services
Amazon Redshift
Snowflake Schema
Extract
Transform
Load
Data Quality
Testing
Data Validation
Performance Tuning
Apache HTTP Server
Management
Slack
JIRA
Analytics
Documentation
Adaptability
Streaming
Problem Solving
Conflict Resolution
Analytical Skill
Communication
Collaboration
Job Details
Essential Skills
Python Proficiency: Must be proficient in Python, including class design, comfortable working in Terraform, and experienced in using Git for code reviews and collaborative development.
Kafka Expertise: Strong experience with Apache Kafka in streaming data pipelines, including setting up consumers and S3 sinks for topics. The more hands-on Kafka experience, especially with S3 integration, the better.
AWS Services: General experience with Amazon MSK (Managed Streaming for Apache Kafka), Glue for ETL processes, S3 for storage, ECS for container orchestration, and related services to build scalable data infrastructure.
Data Lake Architectures: Strong understanding of working with semi-structured data and modern data lake architectures, including handling raw data in formats like JSON and YAML.
Data Pipeline Development: Proven ability to design and implement ETL/ELT processes for data ingestion, transformation, and loading, including incremental loads, data quality checks, and integration with streaming sources.
SQL Mastery: Advanced SQL skills for querying, transforming, and optimizing data across data warehouses and lakes (e.g., Redshift, Snowflake, or similar).
Version Control: Comfort with Git for collaborative development, branching, and merging data engineering projects.
Desirable Skills
Cloud Data Platforms: Experience with cloud data warehouses and lakes (e.g., Amazon Redshift, Snowflake, or Google BigQuery) to support data pipeline deployment and management.
Data Quality and Testing: Ability to implement data validation strategies, tests (e.g., uniqueness, referential integrity), and monitoring in pipelines to ensure reliability.
Performance Optimization: Skills in optimizing data pipelines and queries for large datasets, including partitioning and indexing. Experience with table formats like Apache Iceberg for scalable table management is a nice-to-have.
Collaboration Tools: Proficiency with tools like Slack or Jira for coordinating with Analytics Engineers (AEs) and tracking progress.
Documentation: Capability to create clear documentation for data pipelines, schemas, and integration processes to support team handoff.
Additional Considerations
Adaptability: Ability to quickly learn existing data infrastructure and adapt pipelines to incorporate new streaming sources, data lakes, or raw assets.
Problem-Solving: Strong analytical skills to troubleshoot pipeline challenges, such as data inconsistencies in semi-structured formats, and propose effective solutions.
Communication: Comfort working with AEs to understand requirements and provide updates, ensuring smooth collaboration.
TSG is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
#LI-KY1
73137