DevOps Engineer

Overview

On Site
$100,000 - $140,000
Full Time

Skills

Python
Kubernetes

Job Details

Location: Cambridge, MA (In-person role)
Type: Full-time

We are a U.S.-based Generative AI startup on a mission to automate custom software creation and unlock the next industrial revolution. Our platform autonomously generates enterprise-grade software using thousands of cooperative AI agents working in concert.

This role is ideal for someone with ~5 years of DevOps/Infrastructure experience who wants to shape the future of AI-powered infrastructure.


About the Role

As a DevOps Engineer, you'll design and maintain the infrastructure that powers our AI agent ecosystem. You'll build scalable, resilient systems that support both modern applications and AI workloads. This role sits at the intersection of DevOps and emerging AI infrastructure, offering the opportunity to create systems enabling thousands of AI agents to collaborate seamlessly.

You'll work directly with our engineering teams to ensure our platform scales to enterprise customers with high performance and reliability.


What You'll Do

  • Architect and manage Kubernetes clusters for AI and application workloads

  • Build CI/CD pipelines to enable rapid, reliable deployments

  • Automate infrastructure tasks with Python and Terraform

  • Develop Helm charts for application deployments

  • Implement monitoring, alerting, and observability (Prometheus, Grafana, ELK)

  • Optimize cloud infrastructure (AWS, Google Cloud Platform, or Azure) for performance and cost

  • Ensure 99.9%+ uptime for production services

  • Collaborate with developers to enhance productivity and deployment speed

  • Support AI infrastructure: orchestration, MLOps integration, GPU optimization


Required Skills & Experience

  • 5+ years in DevOps/Infrastructure engineering

  • Strong Python skills for scripting and automation

  • Deep Kubernetes expertise (deployment, scaling, troubleshooting)

  • Helm for application packaging and management

  • CI/CD pipeline design and maintenance (GitHub Actions, GitLab CI, Jenkins, etc.)

  • Terraform and Infrastructure as Code experience

  • Linux administration and Docker/containerization expertise

  • Cloud platform experience (AWS, Azure, or Google Cloud Platform)

  • Monitoring/observability systems (Prometheus, Grafana, ELK)

  • Knowledge of microservices architecture and distributed systems


Nice-to-Have

  • Kubernetes certifications (CKA, CKAD)

  • Experience with MLOps tools (MLflow, Kubeflow, Ray)

  • GPU orchestration and optimization

  • GitOps (ArgoCD, Flux)

  • Multi-region, highly available deployments

  • Background in security/compliance (SOC 2, HIPAA)

  • Contributions to open-source projects
