Data Pipeline Engineer (ML)

Overview

On Site
$40 - $56/hr
Full Time

Skills

ELT
ETL (Extract, Transform, Load)
Machine Learning Operations (MLOps)
Machine Learning (ML)
Python

Job Details

Our client is scaling production ML systems and needs a hands-on engineer to help build, maintain, and run essential ML data pipelines. You'll own high-throughput data ingestion and transformation workflows (including image- and array-type modalities), enforce rigorous data quality standards, and partner with research and platform teams to keep models fed with reliable, versioned datasets.

  • Design, build, and operate reliable ML data pipelines for batch and/or streaming use cases across cloud environments.
  • Develop robust ETL/ELT processes (ingest, validate, cleanse, transform, and publish) with clear SLAs and monitoring.
  • Implement data quality gates (schema checks, null/outlier handling, drift and bias signals) and data versioning for reproducibility.
  • Optimize pipelines for distributed computing and large modalities (e.g., images, multi-dimensional arrays).
  • Automate repetitive workflows with CI/CD and infrastructure-as-code; document, test, and harden for production.
  • Collaborate with ML, Data Science, and Platform teams to align datasets, features, and model training needs.

Minimum Qualifications:

5+ years building and operating data pipelines in production.

  • Cloud: Hands-on with AWS, Azure, or Google Cloud Platform services for storage, compute, orchestration, and security.
  • Programming: Strong proficiency in Python and common data/ML libraries (pandas, NumPy, etc.).
  • Distributed compute: Experience with at least one of Spark, Dask, or Ray.
  • Modalities: Experience handling image-type and array-type data at scale.
  • Automation: Proven ability to automate repetitive tasks (shell/Python scripting, CI/CD).
  • Data Quality: Implemented validation, cleansing, and transformation frameworks in production.
  • Data Versioning: Familiar with tools/practices such as DVC, LakeFS, or similar.
  • Languages: Fluent in English or Farsi.
Strongly Preferred:

  • SQL expertise (writing performant queries; optimizing on large datasets).
  • Data warehousing/lakehouse concepts and tools (e.g., Snowflake/BigQuery/Redshift; Delta/Lakehouse patterns).
  • Data virtualization/federation exposure (e.g., Presto/Trino) and semantic/metadata layers.
  • Orchestration (Airflow, Dagster, Prefect) and observability/monitoring for data pipelines.
  • MLOps practices (feature stores, experiment tracking, lineage, artifacts).
  • Containers & IaC (Docker; Terraform/CloudFormation) and CI/CD for data/ML workflows.
  • Testing for data/ETL (unit/integration tests, great_expectations or similar).
Soft Skills:

  • Executes independently and creatively; comfortable owning outcomes in ambiguous environments.
  • Proactive communicator who collaborates cross-functionally with DS/ML/Platform stakeholders.

Location: Seattle, WA

Duration: 1+ year

Pay: $56/hr

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About OSI Engineering, Inc.