Site Reliability Engineer / Coding Required / Argo or Monitoring

Overview

On Site
130k - 160k
Full Time

Skills

Machine Learning Operations (ML Ops)
e-commerce
IaaS
Machine Learning (ML)
Health insurance
Software development
Transformation
Management
Amazon Web Services
Kubernetes
Training
Grafana
Software deployment
Cloud computing
Servers
Storage
Middleware
Network
Design
Automation
Orchestration
Collaboration
SAP BASIS

Job Details

Job Description
This Fortune 500 company in the Chicago area, a top 10 North American e-commerce player focused on industrial supplies, underwent a digital transformation around 2018 under a new CTO. That transformation enabled growth during the pandemic, and the company's competitive, challenging environment has helped it retain tech talent.

This SRE/Cloud Infrastructure Engineer role involves managing AWS-hosted Kubernetes platforms engineered for machine learning workloads like training, experimentation, and serving. Responsibilities include ensuring a robust and scalable infrastructure for advanced ML workloads, implementing and managing monitoring tools (Grafana, Loki, Prometheus, Thanos), and maintaining continuous deployment using GitOps practices with ArgoCD and Flux.

The engineer will build, test, configure, tune, and support the Kubernetes infrastructure in the cloud, encompassing servers, storage, middleware, network, and client technologies. They will design and implement automation solutions across multiple platforms, recommending improvements for automated tools and identifying opportunities for increased orchestration adoption. The individual will work in a large, complex 24/7 e-commerce environment, gaining experience with various on-premises and cloud-based applications, as part of the Machine Learning Operations team supporting the ML platform.

Required Skills & Experience
  • 5+ years of professional experience
  • In-depth Kubernetes experience
  • ArgoCD
  • Monitoring tools like Grafana or Prometheus
Desired Skills & Experience
  • Experience supporting ML platforms
  • At least 2 years supporting GitOps
  • Flux
What You Will Be Doing
Daily Responsibilities
  • 70% Hands On
  • 30% Team Collaboration
The Offer
  • Bonus eligible
You will receive the following benefits:
  • Medical Insurance
  • Dental Benefits
  • Vision Benefits
  • Paid Time Off (PTO)
  • 401(k)

Applicants must be currently authorized to work in the US on a full-time basis now and in the future.

About Motion Recruitment Partners, LLC