Data Engineer | Houston

Overview

On Site
$50 - $55
Contract - W2
Contract - 12 Month(s)

Skills

Amazon S3
Analytics
Apache Hive
Apache Parquet
Apache Spark
Business Intelligence
Caching
Cloud Computing
Cloudera
Clustering
Collaboration
CSV (file format)
Continuous Delivery
Continuous Integration
Data Extraction
Data Quality
Data Validation
Documentation
ELT
File Formats
GitHub
GitLab
Kubernetes
Macros
Management
Microsoft Power BI
Migration
Orchestration
Promotions
Python
RBAC
Reference Data
SQL
Snowflake Schema
Storage
Streaming
Time Series
Version Control
Warehouse
Acceptance Testing

Job Details

This role delivers the Palantir Foundry exit on a modern Snowflake stack: building reliable, performant, and testable ELT pipelines; recreating Foundry transformations and rule-based event logic; and ensuring historical data extraction, reconciliation, and cutover readiness.

Years of Experience:

  • 7+ years overall; 3+ years hands-on with Snowflake.

Key Responsibilities:

  • Extract historical datasets from Palantir (dataset export, Parquet) to S3/ADLS and load into Snowflake; implement checksum and reconciliation controls (see the reconciliation sketch after this list).
  • Rebuild Foundry transformations as dbt models and/or Snowflake SQL; implement curated schemas and incremental patterns using Streams and Tasks.
  • Implement the batch event/rules engine that evaluates time-series plus reference data on a schedule (e.g., every 30 to 60 minutes) and produces auditable event tables.
  • Configure orchestration in Airflow on AKS and, where appropriate, Snowflake Tasks; set up monitoring and alerting, and document operational runbooks.
  • Optimize warehouses, queries, clustering, and caching; manage cost with Resource Monitors and usage telemetry.
  • Author automated tests (dbt tests, Great Expectations or equivalent), validate parity versus legacy outputs, and support UAT and cutover.
  • Collaborate with BI/analytics teams (Sigma, Power BI) on dataset contracts, performance, and security requirements.
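
A minimal sketch of the kind of reconciliation control described in the first responsibility above, assuming a Parquet export staged locally and a hypothetical target table ANALYTICS.CURATED.EVENTS_HIST; the connection parameters, table name, and checked column are illustrative assumptions, not the project's actual configuration.

```python
# Minimal parity check between a Foundry Parquet export and its Snowflake target.
# All names (paths, table, columns, connection settings) are illustrative assumptions.
import pandas as pd
import snowflake.connector

PARQUET_PATH = "exports/events_hist.parquet"    # hypothetical export location
TARGET_TABLE = "ANALYTICS.CURATED.EVENTS_HIST"  # hypothetical target table

# Source-side metrics from the exported file.
src = pd.read_parquet(PARQUET_PATH)
src_rows = len(src)
src_amount_sum = float(src["amount"].sum())     # assumes a numeric AMOUNT column

# Target-side metrics from Snowflake.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="RECON_WH", database="ANALYTICS", schema="CURATED",
)
try:
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*), SUM(AMOUNT) FROM {TARGET_TABLE}")
    tgt_rows, tgt_amount_sum = cur.fetchone()
finally:
    conn.close()

# Row-count and aggregate parity; tolerances would come from the cutover plan.
assert src_rows == tgt_rows, f"row count mismatch: {src_rows} vs {tgt_rows}"
assert abs(src_amount_sum - float(tgt_amount_sum)) < 0.01, "aggregate parity failed"
print("reconciliation passed")
```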

Required Qualifications:

  • Strong Snowflake SQL and Python for ELT, utilities, and data validation.
  • Production experience with dbt (models, tests, macros, documentation, lineage).
  • Orchestration with Airflow (preferably on AKS/Kubernetes) and use of Snowflake Tasks/Streams for incremental processing (a minimal DAG sketch follows this list).
  • Proficiency with cloud object storage (S3/ADLS), file formats (Parquet/CSV), and bulk/incremental load patterns (Snowpipe, External Tables).
  • Version control and CI/CD with GitHub/GitLab; environment promotion and release hygiene.
  • Data quality and reconciliation fundamentals, including checksums, row/aggregate parity, and schema integrity tests.
  • Performance and cost tuning using query profiles, micro-partitioning behavior, and warehouse sizing policies.
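
As a rough illustration of the orchestration pattern named above, here is a minimal Airflow DAG that runs dbt and then a parity check on a schedule; the DAG id, cadence, dbt project path, and check function are assumptions for this sketch rather than the project's real setup.

```python
# Illustrative Airflow DAG: dbt build followed by a parity check.
# DAG id, schedule, paths, and the check logic are assumptions for this sketch.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def run_parity_check():
    # Placeholder for a row/aggregate parity check against legacy outputs
    # (see the reconciliation sketch under Key Responsibilities).
    print("parity check would run here")


with DAG(
    dag_id="snowflake_elt_hourly",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",              # placeholder cadence
    catchup=False,
) as dag:
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command="cd /opt/dbt/project && dbt build --target prod",
    )
    parity_check = PythonOperator(
        task_id="parity_check",
        python_callable=run_parity_check,
    )

    dbt_build >> parity_check
```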

Preferred Qualifications:

  • Experience migrating from legacy platforms (Palantir Foundry, Cloudera/Hive/Spark) and familiarity with Trino/Starburst federation patterns.
  • Time-series data handling and rules/pattern detection; exposure to Snowpark or UDFs for complex transforms.
  • Familiarity with consumption patterns in Sigma and Power BI (Import, DirectQuery, composite models, RLS/OLS considerations).
  • Security and governance in Snowflake (RBAC, masking, row/column policies), tagging, and cost allocation.
  • Exposure to containerized workloads on AKS, lightweight apps for surfacing data (e.g., Streamlit; a minimal sketch follows below), and basic observability practices.
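
For the last point, a minimal sketch of a Streamlit app surfacing a Snowflake query result; the table, query, and connection parameters are placeholders, assuming credentials are supplied to snowflake-connector-python.

```python
# Minimal Streamlit app surfacing a Snowflake dataset; table and connection
# parameters are placeholders, not the project's real configuration.
import pandas as pd
import snowflake.connector
import streamlit as st

st.title("Event Monitor (illustrative)")

@st.cache_data(ttl=600)  # cache query results for 10 minutes
def load_events() -> pd.DataFrame:
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="***",
        warehouse="BI_WH", database="ANALYTICS", schema="CURATED",
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM EVENTS ORDER BY EVENT_TS DESC LIMIT 500")
        return cur.fetch_pandas_all()
    finally:
        conn.close()

st.dataframe(load_events())
```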

About Value Spectrum Technologies LLC