Infrastructure Manager: III (Senior)

  • Columbus, OH
  • Posted 8 hours ago | Updated 8 hours ago

Overview

On Site
USD 58.00 - 63.00 per hour
Contract - W2
Contract - Independent

Skills

Coaching
Machine Learning Operations (ML Ops)
Production Support
FOCUS
Incident Management
Root Cause Analysis
Continuous Integration
Continuous Delivery
MEAN Stack
Reliability Engineering
Reporting
Team Leadership
Onboarding
Recruiting
Partnership
High Availability
Python
Scripting
Bash
Debugging
Amazon Web Services
Amazon S3
Docker
Kubernetes
Offshoring
DevSecOps
Large Language Models (LLMs)
Vertex
Financial Services
Terraform
Collaboration
Microsoft Azure
DevOps
Management
ServiceNow
Change Control
Machine Learning (ML)
Leadership
Conflict Resolution
Problem Solving
Mentorship
Finance
Accounting
Marketing
Legal
Customer Support
Online Training
Artificial Intelligence
Insurance
.NET

Job Details

Description

MLOps Production Support & DevSecOps Manager

Contract-to-Hire | Hybrid Role | Columbus, OH

Are you a technically engaged leader who embraces both managerial oversight and hands-on development? We are looking for someone who excels at rolling up their sleeves to tackle technical challenges while coaching and mentoring a skilled team of engineers.

About The Role:

As our MLOps Production Support & DevSecOps Manager, you will drive the reliability and security of machine learning systems across production environments. Your mission will focus on managing incidents, embedding DevSecOps practices, ensuring system reliability, and leading a team of engineers toward success, all while staying close to the technical challenges that drive meaningful improvements.

We're in search of someone who thrives amid complex machine-learning operational challenges, enjoys incident management at 2 a.m. as much as leading with metrics, and has a passion for fostering team collaboration.

Key Responsibilities:
  • Incident Command & Site Reliability Engineering (SRE): Lead incident response for critical production issues in ML/LLM pipelines and batch operations. Drive root-cause analysis and publish actionable post-mortems that prevent recurrence (Source: US Demand for Skilled Talent Q1 2025).
  • DevSecOps Automation: Develop and amend CI/CD pipelines, Helm charts, and Python utilities. Embed DevSecOps practices like vulnerability scans and automated rollback logic.
  • Reliability Governance: Track metrics like MTTR (Mean Time to Resolution) to improve system reliability and report metrics-driven trends to leadership (Source: US Demand for Skilled Talent Q1 2025).
  • Team Leadership: Coach and mentor an 8-10 member engineering team (on-site and offshore), fostering an SRE mindset while actively participating in onboarding and recruitment activities.
  • Enterprise Partnerships: Collaborate with cross-functional teams like Solution Engineering, Platform Enablement, and Architecture to harden AWS deployments, strengthen HA/DR environments, and resolve security vulnerabilities.


Requirements

Requirements - Must-Have Qualifications:
  • Proven hands-on expertise responding to machine learning (ML) production incidents, including debugging Python code under pressure.
  • Proficiency in Python scripting and Bash, enabling you to debug pipelines and develop quick fixes.
  • Solid understanding of, and practical experience with, AWS services (IAM roles, EKS, S3, CloudWatch, etc.) (Source: US Demand for Skilled Talent Q1 2025).
  • Strong knowledge of Docker and Kubernetes for container deployment and management.
  • Demonstrated success in converting severe incidents into permanent improvements with examples to share.
  • Experience leading, mentoring, and managing onshore/offshore blended teams.
  • Familiarity with DevSecOps methodologies, including static scans (Snyk/Trivy), container runtime controls, and SBOM generation.

Desirable Skills - Nice to Have:
  • Hands-on familiarity with Large Language Model (LLM) operations (e.g., Bedrock, Vertex AI).
  • Experience in financial services or other regulated industries.
  • Expertise in Terraform and Helm chart authoring.

How We Operate:
  • On-Call: This is a hands-on leadership role. You will actively participate in P1/P2 incident response to maintain 24/7 production health.
  • Culture: Every incident is seen as an investment in improvement. We build systems to prevent recurrence.
  • Collaboration Tools: Utilize Azure DevOps for work management and triage, alongside ServiceNow for change control.
  • Location: This role follows a hybrid model; team members are on-site in Columbus three days a week.

Why Join Us?

If you thrive on making machine learning systems secure and reliable while fostering growth in your team, this is your opportunity to lead meaningful improvements in a cutting-edge space.

Apply now to accelerate your career with a hands-on leadership role that combines technical problem-solving with team mentorship.

Technology Doesn't Change the World, People Do.

Robert Half is the world's first and largest specialized talent solutions firm that connects highly qualified job seekers to opportunities at great companies. We offer contract, temporary and permanent placement solutions for finance and accounting, technology, marketing and creative, legal, and administrative and customer support roles.

Robert Half works to put you in the best position to succeed. We provide access to top jobs, competitive compensation and benefits, and free online training. Stay on top of every opportunity - whenever you choose - even on the go. Download the Robert Half app and get 1-tap apply, notifications of AI-matched jobs, and much more.

All applicants applying for U.S. job openings must be legally authorized to work in the United States. Benefits are available to contract/temporary professionals, including medical, vision, dental, and life and disability insurance. Hired contract/temporary professionals are also eligible to enroll in our company 401(k) plan. Visit roberthalf.gobenefits.net for more information.

© 2025 Robert Half. An Equal Opportunity Employer. M/F/Disability/Veterans. By clicking "Apply Now," you're agreeing to Robert Half's Terms of Use.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About Robert Half