Python Data Engineer

Overview

Remote
Depends on Experience
Contract - W2

Skills

Python
Kubernetes
Data Engineering
Data Scientists
Functional Programming
Object-Oriented
Object-Oriented Design
Pandas
Pytest
Scikit-Learn
Unit Testing
Apache Airflow
Dask
NumPy
data collection
data pipelines
design patterns
distributed computing
machine learning
packaging
relational databases
research
software architecture
software engineering

Job Details

Responsibilities:

Work directly with business domain experts and data scientists to develop high-quality, reliable, scalable machine learning systems

Design and implement frameworks and tools to streamline the machine learning process

Automate manual data collection and processing tasks to improve efficiency

Leverage software architecture and design patterns to develop fault-tolerant microservices

Convert research-based machine learning models into production-ready software

Implement processes that uphold coding standards, code quality, documentation, and test coverage

Qualifications

The successful candidate will meet the following qualifications:

7+ years of programming experience in Python

Expertise in developing and maintaining data pipelines

Experience in testing, packaging, and deploying machine learning models

Experience with software engineering practices such as design principles and patterns, unit testing, refactoring, CI/CD, and version control

Expertise in object-oriented design principles and functional programming principles

Experience with common Python data engineering packages, including Pandas, NumPy, PyArrow, Pytest, Scikit-Learn, and Boto3

Experience with storage technologies, including SQL relational databases and object storage such as AWS S3

Experience in implementing distributed computing systems

Experience in designing modular, reusable software components

Experience in developing API endpoints and microservices

Knowledge of MLOps principles

Knowledge of ML platform technologies, including Apache Airflow, Kubernetes, Dask, Ray, and MLflow