Data Engineer

Overview

On Site

Full Time

Skills

Durable Skills

Python

Terraform

FOCUS

Apache Kafka

Amazon S3

Storage

Orchestration

Data Lake

JSON

YAML

ELT

SQL

Version Control

Git

Data Engineering

Cloud Computing

Data Warehouse

Amazon Web Services

Amazon Redshift

Snowflake Schema

Extract

Transform

Load

Data Quality

Testing

Data Validation

Performance Tuning

Apache HTTP Server

Management

Slack

JIRA

Analytics

Documentation

Adaptability

Streaming

Problem Solving

Conflict Resolution

Analytical Skill

Communication

Collaboration

Job Details

Essential Skills
Python Proficiency: Must be proficient in Python, including class design, and comfortable working in Terraform and using Git for code reviews and collaborative development.
Kafka Expertise: Strong experience with Apache Kafka, including setting up consumers and S3 sinks for topics, with a focus on streaming data pipelines; the more hands-on Kafka experience, especially with S3 integration, the better.
AWS Services: General experience with Amazon MSK (Managed Streaming for Apache Kafka), Glue for ETL processes, S3 for storage, ECS for container orchestration, and related services to build scalable data infrastructure.
Data Lake Architectures: Strong understanding of working with semi-structured data and modern data lake architectures, including handling raw data in formats like JSON and YAML.
Data Pipeline Development: Proven ability to design and implement ETL/ELT processes for data ingestion, transformation, and loading, including incremental loads, data quality checks, and integration with streaming sources.
SQL Mastery: Advanced SQL skills for querying, transforming, and optimizing data across data warehouses and lakes (e.g., Redshift, Snowflake, or similar).
Version Control: Comfort with Git for collaborative development, branching, and merging data engineering projects.
Desirable Skills
Cloud Data Platforms: Experience with cloud data warehouses and lakes (e.g., AWS Redshift, Snowflake, or Google BigQuery) to support data pipeline deployment and management.
Data Quality and Testing: Ability to implement data validation strategies, tests (e.g., uniqueness, referential integrity), and monitoring in pipelines to ensure reliability.
Performance Optimization: Skills in optimizing data pipelines and queries for large datasets, including partitioning, indexing, and leveraging formats like Apache Iceberg for scalable table management (nice to have).
Collaboration Tools: Proficiency with tools like Slack or Jira for coordinating with Analytics Engineers (AEs) and tracking progress.
Documentation: Capability to create clear documentation for data pipelines, schemas, and integration processes to support team handoff.
Additional Considerations
Adaptability: Ability to quickly learn existing data infrastructure and adapt pipelines to incorporate new streaming sources, data lakes, or raw assets.
Problem-Solving: Strong analytical skills to troubleshoot pipeline challenges, such as data inconsistencies in semi-structured formats, and propose effective solutions.
Communication: Comfort working with AEs to understand requirements and provide updates, ensuring smooth collaboration.

TSG is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

73137
