Database Site Reliability Engineer (DB SRE) | Google Cloud | CloudSQL | Spanner
About the Role
We're looking for a Database Site Reliability Engineer (DB SRE) with hands-on experience in Google Cloud Platform (GCP), particularly with CloudSQL and Spanner, to join our infrastructure team. You'll be responsible for ensuring the reliability, scalability, and performance of the cloud-native database systems that power our mission-critical applications.
Key Responsibilities
- Design, implement, and maintain highly available and scalable database infrastructure using CloudSQL and Spanner on Google Cloud Platform.
- Monitor database performance, availability, and reliability using modern observability tools.
- Automate operational tasks such as backups, failovers, schema migrations, and patching.
- Collaborate with engineering teams to optimize queries, schema design, and data access patterns.
- Implement disaster recovery strategies and ensure data integrity across environments.
- Develop and maintain infrastructure-as-code (IaC) for database provisioning and configuration.
- Participate in on-call rotations and incident response for database-related issues.
- Drive continuous improvement in database reliability, cost efficiency, and performance.
Required Skills & Qualifications
- 8+ years of experience as an SRE, DevOps Engineer, or Database Engineer.
- Strong experience with Google Cloud Platform (GCP) services, especially CloudSQL (PostgreSQL/MySQL) and Spanner.
- Proficiency in SQL, database performance tuning, and query optimization.
- Experience with monitoring tools like Prometheus, Grafana, Stackdriver, or equivalent.
- Familiarity with Terraform, Ansible, or other IaC tools.
- Solid understanding of SRE principles, including SLIs/SLOs, incident management, and postmortems.
- Strong scripting skills in Python, Bash, or Go.
Preferred Qualifications
- Experience with multi-region deployments and high availability architectures.
- Knowledge of Kubernetes, Cloud Run, or GKE for database-related workloads.
- Exposure to CI/CD pipelines and GitOps workflows.
DB SRE, Site Reliability Engineer, CloudSQL, Spanner, Google Cloud Platform, GCP, Database Reliability, PostgreSQL, MySQL, Terraform, Infrastructure as Code, Monitoring, Observability, Prometheus, Grafana, Stackdriver, Python, Bash, Go, High Availability, Disaster Recovery, CI/CD, GitOps, Kubernetes, GKE, Cloud Run