Job Title: Senior DevOps / Site Reliability Engineer (Terraform)
Location: New York, NY
Role Overview
We are seeking a highly skilled DevOps / Site Reliability Engineer (SRE) to join our team in New York. The ideal candidate will have strong hands-on experience with Terraform, cloud infrastructure automation, CI/CD pipelines, and Kubernetes, and a proven ability to design and maintain highly scalable, reliable, and secure production systems.
This role requires a strong engineering mindset, a deep understanding of infrastructure as code (IaC), and the ability to support mission-critical applications in a fast-paced environment.
Key Responsibilities
Design, implement, and manage cloud infrastructure using Terraform (Infrastructure as Code)
Build and maintain scalable and resilient CI/CD pipelines
Deploy and manage containerized applications using Kubernetes
Ensure high availability, performance, and reliability of production systems
Implement monitoring, logging, and alerting solutions
Improve system reliability through automation and performance tuning
Partner with development teams to improve deployment processes and developer productivity
Implement security best practices within infrastructure and pipelines
Troubleshoot and resolve production incidents and perform root cause analysis
Required Skills & Experience
10+ years of experience in DevOps / SRE roles
Strong hands-on experience with Terraform
Experience with cloud platforms (AWS preferred; Azure or Google Cloud Platform acceptable)
Strong experience with Kubernetes and Docker
Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, or similar)
Experience with monitoring tools (Prometheus, Grafana, CloudWatch, Datadog, etc.)
Strong scripting skills (Python, Bash, or similar)
Experience working in high-availability, distributed systems environments
Strong troubleshooting and performance optimization skills