Data Engineer | Houston

Overview

On Site
$50 - $55
Contract - W2
Contract - 12 Month(s)

Skills

Amazon S3
Analytics
Apache Hive
Apache Parquet
Apache Spark
Business Intelligence
Caching
Cloud Computing
Cloudera
Clustering
Collaboration
CSV (file format)
Continuous Delivery
Continuous Integration
Data Extraction
Data Quality
Data Validation
Documentation
ELT
File Formats
GitHub
GitLab
Kubernetes
Macros
Management
Microsoft Power BI
Migration
Orchestration
Promotions
Python
RBAC
Reference Data
SQL
Snowflake Schema
Storage
Streaming
Time Series
Version Control
Warehouse
Acceptance Testing

Job Details

This role delivers the Palantir Foundry exit on a modern Snowflake stack: building reliable, performant, and testable ELT pipelines; recreating Foundry transformations and rule-based event logic; and ensuring historical data extraction, reconciliation, and cutover readiness.

Years of Experience:

  • 7+ years overall; 3+ years hands-on with Snowflake.

Key Responsibilities:

  • Extract historical datasets from Palantir (dataset export, Parquet) to S3/ADLS and load into Snowflake; implement checksum and reconciliation controls (see the reconciliation sketch after this list).
  • Rebuild Foundry transformations as dbt models and/or Snowflake SQL; implement curated schemas and incremental patterns using Streams and Tasks.
  • Implement the batch event/rules engine that evaluates time-series plus reference data on a schedule (e.g., every 30 to 60 minutes) and produces auditable event tables.
  • Configure orchestration in Airflow on AKS and, where appropriate, Snowflake Tasks; set up monitoring and alerting, and document operational runbooks.
  • Optimize warehouses, queries, clustering, and caching; manage cost with Resource Monitors and usage telemetry.
  • Author automated tests (dbt tests, Great Expectations or equivalent), validate parity versus legacy outputs, and support UAT and cutover.
  • Collaborate with BI/analytics teams (Sigma, Power BI) on dataset contracts, performance, and security requirements.
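
A minimal sketch of the kind of reconciliation control described in the first responsibility above, assuming a Parquet export staged locally and a hypothetical target table ANALYTICS.CURATED.EVENTS_HIST; the connection parameters, table name, and checked column are illustrative assumptions, not the project's actual configuration.

```python
# Minimal parity check between a Foundry Parquet export and its Snowflake target.
# All names (paths, table, columns, connection settings) are illustrative assumptions.
import pandas as pd
import snowflake.connector

PARQUET_PATH = "exports/events_hist.parquet"    # hypothetical export location
TARGET_TABLE = "ANALYTICS.CURATED.EVENTS_HIST"  # hypothetical target table

# Source-side metrics from the exported file.
src = pd.read_parquet(PARQUET_PATH)
src_rows = len(src)
src_amount_sum = float(src["amount"].sum())     # assumes a numeric AMOUNT column

# Target-side metrics from Snowflake.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="RECON_WH", database="ANALYTICS", schema="CURATED",
)
try:
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*), SUM(AMOUNT) FROM {TARGET_TABLE}")
    tgt_rows, tgt_amount_sum = cur.fetchone()
finally:
    conn.close()

# Row-count and aggregate parity; tolerances would come from the cutover plan.
assert src_rows == tgt_rows, f"row count mismatch: {src_rows} vs {tgt_rows}"
assert abs(src_amount_sum - float(tgt_amount_sum)) < 0.01, "aggregate parity failed"
print("reconciliation passed")
```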

Required Qualifications:

  • Strong Snowflake SQL and Python for ELT, utilities, and data validation.
  • Production experience with dbt (models, tests, macros, documentation, lineage).
  • Orchestration with Airflow (preferably on AKS/Kubernetes) and use of Snowflake Tasks/Streams for incremental processing (a minimal DAG sketch follows this list).
  • Proficiency with cloud object storage (S3/ADLS), file formats (Parquet/CSV), and bulk/incremental load patterns (Snowpipe, External Tables).
  • Version control and CI/CD with GitHub/GitLab; environment promotion and release hygiene.
  • Data quality and reconciliation fundamentals, including checksums, row/aggregate parity, and schema integrity tests.
  • Performance and cost tuning using query profiles, micro-partitioning behavior, and warehouse sizing policies.
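
As a rough illustration of the orchestration pattern named above, here is a minimal Airflow DAG that runs dbt and then a parity check on a schedule; the DAG id, cadence, dbt project path, and check function are assumptions for this sketch rather than the project's real setup.

```python
# Illustrative Airflow DAG: dbt build followed by a parity check.
# DAG id, schedule, paths, and the check logic are assumptions for this sketch.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def run_parity_check():
    # Placeholder for a row/aggregate parity check against legacy outputs
    # (see the reconciliation sketch under Key Responsibilities).
    print("parity check would run here")


with DAG(
    dag_id="snowflake_elt_hourly",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",              # placeholder cadence
    catchup=False,
) as dag:
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command="cd /opt/dbt/project && dbt build --target prod",
    )
    parity_check = PythonOperator(
        task_id="parity_check",
        python_callable=run_parity_check,
    )

    dbt_build >> parity_check
```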

Preferred Qualifications:

  • Experience migrating from legacy platforms (Palantir Foundry, Cloudera/Hive/Spark) and familiarity with Trino/Starburst federation patterns.
  • Time-series data handling and rules/pattern detection; exposure to Snowpark or UDFs for complex transforms.
  • Familiarity with consumption patterns in Sigma and Power BI (Import, DirectQuery, composite models, RLS/OLS considerations).
  • Security and governance in Snowflake (RBAC, masking, row/column policies), tagging, and cost allocation.
  • Exposure to containerized workloads on AKS, lightweight apps for surfacing data (e.g., Streamlit; a minimal sketch follows below), and basic observability practices.
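
For the last point, a minimal sketch of a Streamlit app surfacing a Snowflake query result; the table, query, and connection parameters are placeholders, assuming credentials are supplied to snowflake-connector-python.

```python
# Minimal Streamlit app surfacing a Snowflake dataset; table and connection
# parameters are placeholders, not the project's real configuration.
import pandas as pd
import snowflake.connector
import streamlit as st

st.title("Event Monitor (illustrative)")

@st.cache_data(ttl=600)  # cache query results for 10 minutes
def load_events() -> pd.DataFrame:
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="***",
        warehouse="BI_WH", database="ANALYTICS", schema="CURATED",
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM EVENTS ORDER BY EVENT_TS DESC LIMIT 500")
        return cur.fetch_pandas_all()
    finally:
        conn.close()

st.dataframe(load_events())
```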

About Value Spectrum Technologies LLC