Job Details
Job Title: DevOps Engineer (Google Cloud Platform & Vertex AI)
Experience: 7+ Years
Job Overview
We are seeking a highly skilled DevOps Engineer with expertise in Google Cloud Platform (GCP) and Vertex AI to support scalable, secure, and automated ML/AI pipelines. The ideal candidate will collaborate with data scientists, ML engineers, and cloud architects to design and implement CI/CD pipelines, infrastructure automation, and monitoring solutions for AI/ML workloads.
Key Responsibilities
Design, implement, and maintain CI/CD pipelines for ML models and cloud-native applications on Google Cloud Platform.
Automate infrastructure provisioning using Terraform / Deployment Manager / Ansible.
Deploy, monitor, and optimize Vertex AI pipelines for training, testing, and serving ML models.
Implement MLOps best practices for reproducibility, scalability, and governance.
Work with Cloud Build, Cloud Run, Kubernetes (GKE), and Artifact Registry for automated deployments.
Ensure security, compliance, and cost optimization across Google Cloud Platform projects.
Collaborate with Data Engineers & ML Engineers to productionize AI/ML solutions.
Build and maintain monitoring & logging solutions (Cloud Monitoring and Cloud Logging, formerly Stackdriver; Prometheus; Grafana).
Troubleshoot infrastructure and deployment issues in distributed ML workflows.
Required Skills
7+ years of experience in DevOps / Cloud Engineering.
Strong expertise in Google Cloud Platform (GCP): IAM, Networking, Compute, Storage, Pub/Sub.
Hands-on experience with Vertex AI (training, pipelines, endpoints, Feature Store).
Proficiency in Terraform, Ansible, or Deployment Manager.
Strong background with Docker, Kubernetes (GKE), and Helm.
Experience with CI/CD tools: Cloud Build, Jenkins, GitLab CI/CD, or Argo CD.
Scripting experience with Python / Bash / Go.
Knowledge of MLOps frameworks (Kubeflow, MLflow, TFX) is a plus.
Familiarity with logging, monitoring, and alerting tools.
Preferred Qualifications
Google Cloud Professional Cloud DevOps Engineer or Professional Cloud Architect certification.
Prior experience supporting the ML model lifecycle from experimentation to production.
Understanding of data pipelines using BigQuery, Dataflow, or Dataproc.
Knowledge of security best practices in AI/ML environments.