Overview
Remote
Accepts corp to corp applications
Contract - W2
Skills
AWS
ETL
Python
SQL
AWS Lambda
SageMaker
AWS Glue
Job Details
As a Data Engineer, you will design and maintain the data infrastructure that powers Wi-AI Core, our real-time wireless sensing platform that uses Wi-Fi signals to detect and characterize people, objects, and threats in real-world environments.
You will architect scalable AWS-based data systems that collect, synchronize, and transform multi-modal data streams (Wi-Fi CSI, video, telemetry) from hundreds of distributed sensing devices. Your work will ensure our machine learning and research teams have clean, versioned, and reliable datasets for training and continuous model improvement.
Duties and Responsibilities
Cloud & Data Infrastructure (AWS Stack)
- Design and maintain event-driven data pipelines using AWS Step Functions, EventBridge, Lambda, and ECS/Fargate.
- Develop scalable ingestion workflows to move device data (CSI, video, metadata) into S3 with proper partitioning, validation, and schema management (a simplified example of such a workflow is sketched after this list).
- Implement ETL and transformation jobs using AWS Glue or containerized Python pipelines.
- Integrate SageMaker Pipelines or Step Functions to automate model-training and evaluation workflows.
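To give a flavor of the ingestion work described above, here is a minimal sketch of an event-driven Lambda handler that validates a device record and lands it under a partitioned S3 prefix. The bucket name, key layout, and payload fields are illustrative assumptions, not details of the actual Wi-AI pipeline.

    # Illustrative only: a minimal event-driven ingestion handler.
    # Bucket name, key layout, and payload fields are assumptions for this sketch.
    import datetime
    import json

    import boto3

    s3 = boto3.client("s3")
    RAW_BUCKET = "wi-ai-raw-data"  # hypothetical bucket name

    REQUIRED_FIELDS = {"device_id", "captured_at", "modality", "payload"}

    def handler(event, context):
        """Triggered by EventBridge; validates and lands one device record in S3."""
        record = event.get("detail", event)  # EventBridge wraps the payload in "detail"
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            # Route incomplete records to a quarantine prefix for later inspection.
            key = f"quarantine/{datetime.datetime.utcnow():%Y/%m/%d}/{context.aws_request_id}.json"
        else:
            ts = datetime.datetime.fromisoformat(record["captured_at"])
            # Partition by modality and capture date so downstream Glue/Athena queries can prune scans.
            key = (
                f"raw/modality={record['modality']}/dt={ts:%Y-%m-%d}/"
                f"{record['device_id']}-{ts:%H%M%S}.json"
            )
        s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=json.dumps(record).encode("utf-8"))
        return {"status": "quarantined" if missing else "stored", "key": key}

In practice, a handler like this would sit behind EventBridge or SQS and feed downstream Glue jobs that compact the raw JSON into columnar formats.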
Data Modeling & Versioning
- Define and enforce consistent data schemas, validation logic, and metadata standards for Wi-AI sensor data (see the schema-enforcement sketch after this list).
- Manage dataset versioning and lineage using Glue Data Catalog, DVC, or equivalent versioning frameworks.
- Build data summarization and indexing layers to efficiently query multi-terabyte datasets.
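As an illustration of the schema-enforcement work above, the sketch below uses pyarrow to cast a batch of records onto an explicit schema before writing a partitioned Parquet dataset; the field names and dataset layout are assumptions made for this example.

    # Illustrative only: enforcing a canonical schema when writing CSI summaries to Parquet.
    # Field names and dataset layout are assumptions for this sketch.
    import pyarrow as pa
    import pyarrow.parquet as pq

    CSI_SCHEMA = pa.schema([
        ("device_id", pa.string()),
        ("captured_at", pa.timestamp("us", tz="UTC")),
        ("channel", pa.int16()),
        ("csi_amplitude", pa.list_(pa.float32())),
    ])

    def write_batch(records: list[dict], root_path: str) -> None:
        """Casts a batch of records onto the canonical schema and writes a partitioned Parquet dataset.

        Raises an Arrow conversion error if a record is missing a field or has an
        incompatible type, so schema drift is caught at write time rather than at training time.
        """
        table = pa.Table.from_pylist(records, schema=CSI_SCHEMA)
        pq.write_to_dataset(table, root_path=root_path, partition_cols=["device_id"])

Catching schema drift at write time helps keep versioned datasets (for example, those tracked with DVC or the Glue Data Catalog) consistent across device and firmware revisions.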
Data Quality & Observability
- Implement automated quality checks, anomaly detection, and completeness tracking across data streams (see the metrics sketch after this list).
- Build dashboards and metrics to monitor data ingestion, latency, and device health (e.g., via CloudWatch or Grafana).
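As a small example of the observability work above, the sketch below publishes an ingestion-completeness metric to CloudWatch; the namespace, metric name, and dimension names are assumptions for this example.

    # Illustrative only: publishing an ingestion-completeness metric to CloudWatch.
    # Namespace, metric, and dimension names are assumptions for this sketch.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def report_completeness(device_id: str, expected: int, received: int) -> None:
        """Emits the fraction of expected records that actually arrived for one device."""
        completeness = received / expected if expected else 0.0
        cloudwatch.put_metric_data(
            Namespace="WiAI/DataPipeline",
            MetricData=[{
                "MetricName": "IngestionCompleteness",
                "Dimensions": [{"Name": "DeviceId", "Value": device_id}],
                "Value": completeness,
                "Unit": "None",
            }],
        )

A metric like this can then back CloudWatch alarms or Grafana dashboards for per-device health and latency tracking.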
Collaboration & Integration
- Work closely with ML engineers and scientists to design reproducible dataset interfaces and feature pipelines.
- Support field and RF teams by ensuring new devices, environments, and firmware revisions integrate seamlessly into the data ecosystem.
- Contribute to internal tools that make it easy to search, visualize, and retrieve relevant subsets of multi-modal data.
Performance & Scalability
- Optimize data transfer, caching, and compute scheduling for high-throughput processing jobs.
- Apply best practices in serverless architecture, cost optimization, and security (IAM, KMS, VPC design).
Must Haves
- Bachelor's or Master's degree in Computer Science, Data Engineering, or Electrical Engineering.
- 3+ years of experience designing and maintaining data pipelines or cloud infrastructure on AWS.
- Deep experience with AWS services including S3, Lambda, ECS/Fargate, Step Functions, Glue, CloudWatch, and IAM.
- Strong Python development skills, including experience with Pandas, NumPy, or PySpark.
- Proficiency in SQL and experience with schema design for large-scale analytical datasets (Parquet, Arrow, HDF5, etc.).
- Understanding of event-driven architectures, serverless workflows, and data quality validation.
- Familiarity with CI/CD and infrastructure-as-code tools (Terraform, CloudFormation, or CDK).
Nice to Haves
- Experience with multi-modal sensor data (e.g., video + time-series or IoT telemetry).
- Background in RF, signal processing, or embedded sensing systems.
- Familiarity with SageMaker, MLflow, or Weights & Biases for model-training pipelines.
- Knowledge of data orchestration frameworks (Prefect, Dagster, or Airflow) for hybrid pipelines.
- Experience with monitoring and observability stacks (Grafana, Prometheus, or OpenTelemetry).
Location: Bridgeville, PA
(Hybrid role: occasional on-site collaboration at our research and test facilities.)