Hi there,
We are looking for an experienced MLOps Developer with deep expertise in Python, Pandas, PySpark, PyArrow, and Hadoop to join our team focused on optimizing feature engineering pipelines and maintaining machine learning models in production. This role is centered on operational excellence, not model development, and involves working with existing frameworks and platform teams to ensure scalable, reliable, and performant ML workflows.
W2 candidates local to Pittsburgh, PA only.
Key Responsibilities:
  Optimize and maintain large-scale feature engineering jobs using PySpark, Pandas, and PyArrow on Hadoop-based infrastructure.
  Refactor and modularize ML codebases to improve reusability, maintainability, and performance.
  Collaborate with platform teams to manage compute capacity, resource allocation, and system updates.
  Integrate with the existing Model Serving Framework to support testing, deployment, and rollback of ML workflows.
  Monitor and troubleshoot production ML pipelines, ensuring high reliability, low latency, and cost efficiency.
  Contribute to the internal Model Serving Framework by sharing insights, proposing and implementing improvements, and documenting best practices.
  (Nice to have) Experience implementing near-real-time ML pipelines using Kafka and Spark Streaming for low-latency use cases, and experience with AWS and the SageMaker MLOps ecosystem.
Required Qualifications:
  5+ years of experience in software engineering, data engineering, or MLOps roles.
  Expert-level proficiency in Python, with strong experience in Pandas, PySpark, and PyArrow.
  Expert-level proficiency in the Hadoop ecosystem, distributed computing, and performance tuning.
  Experience with CI/CD tools and best practices in ML environments.
  Experience with monitoring tools and techniques for ML pipeline health and performance.
  Strong collaboration skills, especially in cross-functional environments involving platform and data science teams.
Preferred Qualifications:
  Experience contributing to internal MLOps frameworks or platforms.
  Familiarity with SLURM clusters or other distributed job schedulers.
  Exposure to Kafka, Spark Streaming, or other real-time data processing tools.
  Knowledge of model lifecycle management, including versioning, deployment, and drift detection.