DevOps Engineer
Location: Cambridge, MA (In-person role)
Type: Full-time
We are a U.S.-based Generative AI startup on a mission to automate custom software creation and unlock the next industrial revolution. Our AI-powered platform autonomously generates enterprise-grade software, driven by thousands of cooperative AI agents working in concert.
This role is ideal for someone with ~5 years of DevOps/Infrastructure experience who wants to shape the future of AI-powered infrastructure.
About the Role
As a DevOps Engineer, you'll design and maintain the infrastructure that powers our AI agent ecosystem. You'll build scalable, resilient systems that support both modern applications and AI workloads. This role sits at the intersection of DevOps and emerging AI infrastructure, offering the opportunity to build systems that let thousands of AI agents collaborate seamlessly.
You'll work directly with our engineering teams to ensure our platform scales to enterprise customers with high performance and reliability.
What You'll Do
Architect and manage Kubernetes clusters for AI and application workloads
Build CI/CD pipelines to enable rapid, reliable deployments
Automate infrastructure tasks with Python and Terraform
Develop Helm charts for application deployments
Implement monitoring, alerting, and observability (Prometheus, Grafana, ELK)
Optimize cloud infrastructure (AWS, Google Cloud Platform, or Azure) for performance and cost
Ensure 99.9%+ uptime for production services
Collaborate with developers to enhance productivity and deployment speed
Support AI infrastructure: orchestration, MLOps integration, GPU optimization
Required Skills & Experience
5+ years in DevOps/Infrastructure engineering
Strong Python skills for scripting and automation
Deep Kubernetes expertise (deployment, scaling, troubleshooting)
Helm for application packaging and management
CI/CD pipeline design and maintenance (GitHub Actions, GitLab CI, Jenkins, etc.)
Terraform and Infrastructure as Code experience
Linux administration and Docker/containerization expertise
Cloud platform experience (AWS, Azure, or Google Cloud Platform)
Monitoring/observability systems (Prometheus, Grafana, ELK)
Knowledge of microservices architecture and distributed systems
Nice-to-Have
Kubernetes certifications (CKA, CKAD)
Experience with MLOps tools (MLflow, Kubeflow, Ray)
GPU orchestration and optimization
GitOps (ArgoCD, Flux)
Multi-region, highly available deployments
Background in security/compliance (SOC 2, HIPAA)
Contributions to open-source projects