Job Details
Location: Cambridge, MA (In-person role)
Type: Full-time
We are a U.S.-based Generative AI startup on a mission to automate custom software creation and unlock the next industrial revolution. Our platform autonomously generates enterprise-grade software using thousands of cooperative AI agents working in concert.
This role is ideal for someone with ~5 years of DevOps/Infrastructure experience who wants to shape the future of AI-powered infrastructure.
About the Role
As a DevOps Engineer, you'll design and maintain the infrastructure that powers our AI agent ecosystem. You'll build scalable, resilient systems that support both modern applications and AI workloads. This role sits at the intersection of DevOps and emerging AI infrastructure, offering the opportunity to create systems that enable thousands of AI agents to collaborate seamlessly.
You'll work directly with our engineering teams to ensure our platform scales to enterprise customers with high performance and reliability.
What You'll Do
- Architect and manage Kubernetes clusters for AI and application workloads
- Build CI/CD pipelines to enable rapid, reliable deployments
- Automate infrastructure tasks with Python and Terraform
- Develop Helm charts for application deployments
- Implement monitoring, alerting, and observability (Prometheus, Grafana, ELK)
- Optimize cloud infrastructure (AWS, Google Cloud Platform, or Azure) for performance and cost
- Ensure 99.9%+ uptime for production services
- Collaborate with developers to enhance productivity and deployment speed
- Support AI infrastructure: orchestration, MLOps integration, GPU optimization
Requirements
- 5+ years in DevOps/Infrastructure engineering
- Strong Python skills for scripting and automation
- Deep Kubernetes expertise (deployment, scaling, troubleshooting)
- Helm for application packaging and management
- CI/CD pipeline design and maintenance (GitHub Actions, GitLab CI, Jenkins, etc.)
- Terraform and Infrastructure as Code experience
- Linux administration and Docker/containerization expertise
- Cloud platform experience (AWS, Azure, or Google Cloud Platform)
- Monitoring/observability systems (Prometheus, Grafana, ELK)
- Knowledge of microservices architecture and distributed systems
Nice to Have
- Kubernetes certifications (CKA, CKAD)
- Experience with MLOps tools (MLflow, Kubeflow, Ray)
- GPU orchestration and optimization
- GitOps (ArgoCD, Flux)
- Multi-region, highly available deployments
- Background in security/compliance (SOC2, HIPAA)
- Contributions to open-source projects