ML Data Infrastructure Engineer

  • Posted 20 days ago | Updated 20 days ago

Overview

Remote
$50 - $55
Contract - W2
Contract - 12 Month(s)

Skills

CI/CD
GitOps
BigQuery
Spark
Beam
Flink

Job Details

Role: ML Data Infrastructure Engineer

Location: Sunnyvale, CA (open to 100% remote)

Duration: 12 Months

Candidates should have strong experience with deep learning frameworks and model serving technologies, which most of the resumes received so far lack.

JD:

8+ years of software engineering experience, with 3+ years in ML serving/infrastructure

Strong expertise in container orchestration (Kubernetes) and cloud platforms

Experience with model serving technologies (TensorFlow Serving, Triton, KServe)

Deep knowledge of distributed systems and microservices architecture

Proficiency in Python and experience with high-performance serving

Strong background in monitoring and observability tools

Experience with CI/CD pipelines and GitOps workflows

Key Responsibilities:

  • Design and implement scalable data processing pipelines for ML training and validation
  • Build and maintain feature stores with support for both batch and real-time features
  • Develop data quality monitoring, validation, and testing frameworks
  • Create systems for dataset versioning, lineage tracking, and reproducibility
  • Implement automated data documentation and discovery tools
  • Design efficient data storage and access patterns for ML workloads
  • Partner with data scientists to optimize data preparation workflows

Technical Requirements:

  • 7+ years of software engineering experience, with 3+ years in data infrastructure
  • Strong expertise in Google Cloud Platform's data and ML infrastructure:
    • BigQuery for data warehousing
    • Dataflow for data processing
    • Cloud Storage for data lakes
    • Vertex AI Feature Store
    • Cloud Composer (managed Airflow)
    • Dataproc for Spark workloads
  • Deep expertise in data processing frameworks (Spark, Beam, Flink)
  • Experience with feature stores (Feast, Tecton) and data versioning tools
  • Proficiency in Python and SQL
  • Experience with data quality and testing frameworks
  • Knowledge of data pipeline orchestration (Airflow, Dagster)

Nice to Have:

  • Experience with streaming systems (Kafka, Kinesis, Pub/Sub, Dataflow)
  • Experience with Google Cloud Platform-specific security and IAM best practices
  • Knowledge of Cloud Logging and Cloud Monitoring for data pipelines
  • Familiarity with Cloud Build and Cloud Deploy for CI/CD
  • Knowledge of ML metadata management systems
  • Familiarity with data governance and security requirements
  • Experience with dbt or similar data transformation tools

About DVARN