Role Summary
We are seeking a Principal DevOps Architect to design and implement a unified automation plane across our hybrid-cloud estate. You will lead the creation of multi-cloud Landing Zones and build the automation layer that connects on-premises VMware/OpenShift environments with public cloud providers (AWS, Azure, and Google Cloud Platform). Your mission is to eliminate manual toil through SRE-driven automation, ensuring our infrastructure is as code-driven, observable, and resilient as the applications it hosts.
Key Responsibilities
1. Hybrid & Multi-Cloud Landing Zones
- Unified Governance: Architect and deploy automated Cloud Landing Zones with integrated security guardrails, VPC/VNet topologies, and IAM synchronization across multiple regions.
- On-Premises Integration: Design "Cloud-Adjacent" architectures that allow on-premises workloads to burst into the cloud, or to run in the cloud while keeping low-latency connections to on-premises storage and databases.
- Infrastructure as Code (IaC): Develop modular, version-controlled Terraform or OpenTofu modules (and custom providers where needed) to manage resources across AWS, Azure, and private data centers.
2. Engineering & Automation
- Python for Tooling: Build custom automation frameworks and "Glue Code" in Python to bridge gaps between vendor APIs and internal developer portals.
- GitOps & CI/CD: Implement GitOps workflows (using Argo CD or Flux) so that the desired state declared in Git, the "Source of Truth", is automatically reconciled with the live environment.
- State Management: Design robust remote state management for multi-cloud IaC to prevent configuration drift and ensure environment parity.
3. SRE & Operational Excellence
- Toil Elimination: Identify repetitive manual tasks in the migration lifecycle and replace them with automated self-service workflows.
- Observability Architecture: Design a unified monitoring plane (Prometheus, Grafana, Datadog) that provides "Single Pane of Glass" visibility across hybrid environments.
- Error Budgeting: Partner with engineering teams to define SLIs, SLOs, and error budgets, and implement automated "Circuit Breakers" that halt or roll back failing deployments when the error budget is at risk.
Required Skills & Qualifications
- Experience: 12+ years in Infrastructure/DevOps, including 5+ years in a Lead Architect capacity.
- IaC Mastery: Expert-level Terraform (module development, state management, Terragrunt).
- Scripting: Proficiency in Python for complex automation and Bash for systems-level tasks.
- Cloud Platforms: Deep hands-on experience with at least two of AWS, Azure, and Google Cloud Platform.
- On-Prem/Hybrid: Experience automating VMware, Nutanix, or bare-metal environments (Ansible/Packer).
- Containerization: Advanced Kubernetes experience (EKS/AKS/OpenShift), including service mesh (Istio/Linkerd).
- VCS: Mastery of Git (branching strategies, hooks, and CI/CD integration).