Lead DevOps/MLOps Engineer

Remote in Reston, VA, US • Posted 2 hours ago • Updated 2 hours ago
Full Time
On-site
USD 120,000.00 - 160,000.00 per year

Job Details

Skills

  • Machine Learning Operations (ML Ops)
  • FOCUS
  • Dashboard
  • Migration
  • Machine Learning (ML)
  • Image Management
  • Promotions
  • Orchestration
  • Amazon Web Services
  • Terraform
  • Docker
  • Kubernetes
  • Continuous Integration
  • Continuous Delivery
  • Amazon SageMaker
  • DevOps
  • Workflow
  • Value Engineering
  • GPU

Summary

We're looking for a strong DevOps engineer who can help scale and operationalize our infrastructure as the platform grows. This is not a pure platform-architecture role; the focus is on CI/CD, infrastructure automation, deployment reliability, observability, and scaling GPU-oriented workloads.
What You'll Own
  • Improve CI/CD pipelines, deployment workflows, and release reliability
  • Standardize infrastructure and deployment patterns across environments
  • Improve observability through logging, metrics, tracing, dashboards, and rollout monitoring
  • Partner closely with backend engineering on:
    • deployment strategies
    • infrastructure automation
    • environment consistency
    • migration workflows
    • possible Kubernetes migration efforts
  • Support ML-oriented infrastructure as a secondary responsibility:
    • SageMaker workloads
    • Ray clusters
    • GPU scaling patterns
    • distributed batch execution
    • autoscaling behavior
    • runtime/image management
    • artifact delivery/versioning
The Kind of Problems You'll Work On
  • Deployment safety and rollback strategies
  • Infrastructure consistency across environments
  • Release automation and environment promotion flows
  • Autoscaling and runtime stability
  • GPU workload orchestration and scaling efficiency
  • Operational tooling that reduces friction for engineering teams
Stack
  • AWS
  • Terraform
  • Docker
  • Kubernetes
  • CI/CD systems
  • SageMaker
  • Ray
  • GPU compute infrastructure
You'll Probably Do Well Here If
  • You've operated production infrastructure at meaningful scale
  • You're strong in practical DevOps execution and operational reliability
  • You care about automation, observability, and deployment safety
  • You're comfortable improving developer workflows and infrastructure tooling
  • You've worked with distributed systems or GPU-oriented workloads before
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 91123694
  • Position Id: 9b039338d52d6aa35b127fe3d1b02981

Similar Jobs

Tysons, Virginia

Today

Full-time

McLean, Virginia

Today

Full-time

Chantilly, Virginia

Today

Full-time

USD 107,900.00 - 195,050.00 per year

Hybrid in Arlington, Virginia

Today

Full-time

USD 62,000.00 - 141,000.00 per year
