Job Details
Role summary:
The Cloud Architect & Subject Matter Expert (SME) is a critical senior technical leader responsible for overseeing and governing the design, operational stability, and modernization of our hybrid and multi-cloud environments (Google Cloud Platform, Azure, OCI). This role ensures 24x7 operational excellence for critical healthcare services, drives proactive Level 2 and Level 3 Incident/Problem Management, and guarantees strict adherence to cost optimization, automation, and stringent compliance standards such as HIPAA and SOC2. The Cloud Architect serves as the highest-level technical escalation point and provides essential mentorship to the Cloud Operations engineering teams.
Key Responsibilities:
Serve as the L3/L4 technical escalation point for complex, high-severity (P1/P2) incidents impacting cloud infrastructure and critical healthcare applications.
Lead deep-dive Root Cause Analysis (RCA) for major recurring incidents, implementing strategic preventative measures to eliminate repeat failures.
Directly manage the cloud operations team's day-to-day activities for Incident, Problem, and Change Management, ensuring strict adherence to ITIL processes and SLAs.
Define, document, and govern the cloud operations architecture for high availability (HA), disaster recovery (DR), performance, and security across all cloud tenants (Google Cloud Platform, Azure, OCI).
Design and manage the operational lifecycle of container orchestration platforms, including Google Kubernetes Engine (GKE) clusters.
Ensure all cloud deployments and operational practices are rigorously aligned with HIPAA and SOC2 compliance and security policies.
Oversee and drive vulnerability management, security patching, and compliance audit readiness (including vendor coordination).
Establish and maintain comprehensive operational standards, Standard Operating Procedures (SOPs), runbooks, and robust governance models for the Cloud Operations team.
Accelerate automation initiatives using Infrastructure as Code (IaC) and configuration management (Terraform, Ansible, PowerShell) to reduce manual toil and improve operational consistency.
Implement and mature observability and monitoring strategies using relevant tooling to ensure proactive identification of performance degradation and capacity issues.
Develop and govern the architecture for scalable, secure multi-cloud landing zones and policy frameworks (e.g., Azure Policy, Google Cloud Platform Organization Policy).
Define and champion FinOps strategies and cost optimization best practices across all cloud resources, driving transparency and accountability for cloud spending.
Act as the SME and key liaison for resolving complex cross-functional issues with DB, Application, Security, and Network teams.
Guide and mentor Cloud Administrators and platform SMEs, fostering a culture of technical excellence and continuous improvement.
Qualifications & Technical Skills:
Experience: 10-15 years of progressive experience in infrastructure architecture, engineering, and high-volume, critical cloud operations support.
Multi-Cloud Platforms: Deep and proven expertise in designing, implementing, and operating solutions within the Google Cloud Platform (GCP) (preferred), Microsoft Azure, and Oracle Cloud Infrastructure (OCI) ecosystems.
Containerization: Expert knowledge of container technologies, including deployment, networking, security, and operational management of Google Kubernetes Engine (GKE) and related services (e.g., Anthos, Cloud Run).
Healthcare Compliance: Demonstrated expertise in maintaining HIPAA compliance and security within a public multi-cloud environment.