Position Summary
The Principal Platform Scripting & Infrastructure Engineer is a senior, hands-on engineer responsible for designing and building the automation layer that powers the H100 platform.
That layer includes complex bootstrap scripts, infrastructure provisioning pipelines, Kustomize overlays, Terraform modules, and operational tooling.
You will own the scripting and automation backbone that enables repeatable, auditable infrastructure delivery across Google Cloud Platform, GKE, and the CI/CD runner fleet.
Key Responsibilities
- Design and maintain large bootstrap automation scripts (Bash, PowerShell; 70KB+) for end-to-end provisioning of Google Cloud Platform projects, GKE clusters, and the Config Connector (KCC) platform
- Build and maintain Terraform modules for Google Cloud Platform infrastructure: bootstrap project, GKE clusters, NCC spokes, Palo Alto NGFW, Wiz integration
- Develop Kustomize overlays and KCC manifests for GitOps-driven infrastructure delivery via Config Sync
- Create and maintain preflight validation scripts for infrastructure readiness checks (GKE, networking, IAM, Config Sync)
- Build CI/CD pipeline automation: GitHub Actions workflows for Terraform plan/apply and KCC manifest validation (yamllint, kustomize build, OPA checks)
- Develop operational scripts for emergency cleanup, immutable resource fixes, secret rotation, and disaster recovery procedures
- Automate image builds for self-hosted GitHub Actions runners targeting Linux, Windows, Android, and iOS
- Build and maintain JFrog Artifactory integration scripts for artifact lifecycle management
- Create monitoring and alerting automation: dashboards-as-code, alert policy management, SLO configuration
- Develop infrastructure testing and validation frameworks to ensure changes are safe before production deployment
- Document all automation with clear README files, inline comments, and runbooks
- Collaborate with the Principal Cloud Engineer on architecture decisions and implementation
Required Qualifications & Skills
- 10+ years in infrastructure engineering with deep scripting and automation focus
- Expert-level Bash scripting (complex multi-stage provisioning scripts, error handling, idempotency)
- Strong PowerShell skills for Windows-based automation and runner image management
- Advanced Python for infrastructure tooling, API integrations, and data processing
- Strong Terraform (HCL) skills: module development, state management, provider configuration, CI/CD integration
- Kubernetes manifests: Kustomize, Helm charts, KCC resources, RBAC, Config Sync
- Command-line proficiency: Google Cloud SDK (gcloud, gsutil), kubectl, kustomize, terraform CLI
- GitHub Actions workflow authoring: composite actions, reusable workflows, matrix strategies, self-hosted runners
- Experience with GKE cluster lifecycle: provisioning, upgrades, node pool management, Config Sync setup
- Familiarity with Packer or similar tools for building custom VM/container images
- Strong Linux systems administration (Ubuntu/Debian, systemd, networking, storage)
Preferred / Nice-to-Have
- Experience with KCC (Config Connector) as infrastructure-as-code for Google Cloud Platform resources
- Go programming for CLI tooling or Kubernetes operators
- Experience with ARC (Actions Runner Controller) deployment and configuration on GKE
- Familiarity with OCI artifact registries and Config Sync OCI source type
- Experience with External Secrets Operator (ESO) and Google Cloud Platform Secret Manager integration
Technology Stack
Scripting: Bash (primary), PowerShell, Python, Go
IaC: Terraform, Kustomize, Helm, KCC manifests (YAML)
Cloud: Google Cloud Platform (GKE, Compute, VPC, IAM, KMS, Secret Manager, Artifact Registry, NCC)
CI/CD: GitHub Actions, Config Sync, JFrog Artifactory, OCI registries
Containers: Docker, GKE, ARC, ESO, containerd
Testing: OPA/Rego, yamllint, kustomize build validation, Terraform plan checks
OS: Linux (Ubuntu/Debian), Windows Server, macOS
Experience
- 10+ years in infrastructure engineering and scripting; 5+ years in cloud automation; 3+ years with Google Cloud Platform and GKE