Role: Sr. DevOps Engineer
Location: Palo Alto, CA 94301 (Hybrid 2 days/week)
Duration: 3+ month C2H
About the Role
We are looking for experienced DevOps Engineers to support production operations for OpusClip. This is a hands-on role requiring immediate contribution with minimal onboarding. Candidates must be self-sufficient and comfortable working in a complex and fast-paced environment.
Key Responsibilities
1. IaC, Secrets, IAM & Security
Own the security foundation: service account lifecycle, secrets management, encryption key management, workload identity, and compliance tooling. Drive Infrastructure as Code adoption and establish GitOps workflows for access control changes.
2. Networking, DNS & Edge Infrastructure
Own cloud networking, firewall rules, static IPs, NAT, DNS, and a fleet of proxy and bastion hosts. Manage load balancing, SSL certificates, CDN, and synthetic monitoring. Identify and eliminate unused network resources.
3. Databases, Storage & Messaging
Own relational databases, in-memory caches, object storage, and event streaming infrastructure. Ensure backup/restore procedures, capacity planning, and data layer reliability for mission-critical workloads.
4. Kubernetes, Compute, CI/CD & Observability
Own large-scale Kubernetes clusters, standalone compute fleets, CI/CD pipelines, artifact management, and the full observability stack (monitors, dashboards, SLOs). Drive cost optimization and monitoring hygiene.
What You'll Do
- Operate and improve production cloud infrastructure supporting a high-scale video platform
- Drive Infrastructure as Code: codify existing resources, establish GitOps workflows, and implement policy-as-code
- Participate in the team's regular on-call rotation
- Triage and resolve infrastructure incidents: capacity, OOM, network, database saturation, node failures, certificate expiry, CI/CD failures
- Optimize costs: identify unused resources, right-size compute, and manage vendor spend
- Collaborate with engineering teams on shared concerns such as API reliability and deploy pipelines
Required:
- 3+ years managing production cloud infrastructure (major cloud provider - Google Cloud Platform, AWS, or Azure)
- Strong experience with Kubernetes at scale (node pool management, autoscaling, troubleshooting)
- Proficiency with Infrastructure as Code (Terraform or equivalent)
- Experience with observability platforms (Datadog, Grafana, or similar - monitors, dashboards, SLOs, APM)
- Solid understanding of networking (VPCs, firewall rules, NAT, DNS, load balancing, TLS)
- Experience with CI/CD pipelines and container registries
- Comfort with on-call responsibilities and incident response
Preferred:
- Experience with managed relational databases (PostgreSQL or similar) at scale
- Familiarity with workflow orchestration systems
- Experience with in-memory data stores (Redis or similar) in production
- Background in security and compliance (least-privilege IAM, secrets rotation, audit logging)
- Experience with policy-as-code frameworks
- Familiarity with cloud cost optimization tooling
- Temporal workflow operations
- AlloyDB or PostgreSQL experience
- Python or Go scripting
- Experience with worker/queue based async service architectures
- Mandarin proficiency is a HUGE plus