Need locals or someone who can relocate to Chicago from day one
Role: Cloud Architect Azure Platform
Experience: 12+ Years of experience (Hands on)
Location: Chicago (Mandatory: Work from Clients office 4 days a week)
Required Skills & Experience
- CloudNative Architecture: Design and operate resilient, scalable Azure cloudnative platforms aligned to enterprise standards and RUN SLAs
- DevSecOps & GitOps: Implement secure CI/CD and GitOps pipelines with builtin security, policy enforcement, and automated controls
- Cloud Landing Zone & Policy Management: Operate and govern Azure Landing Zones using Azure Policy, RBAC, guardrails, and compliance automation
- Platform & COE Tooling: Build and support reusable COE accelerators, golden paths, templates, and automation frameworks
- AIOps & Observability: Enable proactive monitoring, logging, alerting, and AIOpsdriven insights for platform reliability and incident reduction
- FinOps: Embed cost governance, tagging, budgets, and optimization practices into platform operations
- Cloud Architecture (RUNfocused): Translate clientapproved architectures into operable, supportable, and compliant Azure platforms
- Containers & Kubernetes: Design, deploy, and operate container platforms using Kubernetes, AKS, Docker, and Helm
- Infrastructure as Code: Provision and manage Azure infrastructure using Terraform and automated pipelines
- API & Integration Platforms: Design and support secure APIs and integrations using Azure API Management (APIM)
- Event & Streaming Platforms: Support cloudnative messaging and streaming solutions using Kafka and managed services
- Scripting & Automation: Develop operational automation using Python and platform SDKs
- Agile & ITSM Alignment: Operate within Agile delivery models while supporting ITSM, incident, change, and problem management processes
Certifications:
- Microsoft Certified: Azure Solutions Architect Expert (AZ305) - Required
- Microsoft Certified: Azure Administrator Associate (AZ104) Required
- AZ400 (DevOps Engineer Expert)
- AZ500 (Azure Security Engineer Associate)
- ITIL 4 Foundation
- Terraform Associate
Responsibilities:
Azure Platform RUN Ownership
- Act as L3/L4 escalation point for Azure platform incidents across IaaS, PaaS, landing zones, and Terraformbased deployments.
- Lead rootcause analysis (RCA) for P1/P2 incidents and drive permanent fixes through automation and design improvements.
- Ensure platform services meet availability, reliability, and performance SLAs.
Landing Zone & Governance Operations
- Operate and govern Azure Landing Zones, including RBAC models, Azure Policy, network/security baselines, and compliance monitoring.
- Detect and remediate configuration drift using policyascode and IaC controls.
- Maintain operational RACI alignment across Platform, Security, FinOps, and Network teams.
InfrastructureasCode & Automation
- Design, maintain, and review Terraform modules, CI/CD pipelines, and reusable golden paths used in RUN operations.
- Ensure provisioning, changes, and decommissioning follow approved automated pipelines.
- Perform seniorlevel IaC and pipeline conformance reviews.
Service Requests & Change Governance
- Provide architectural oversight for service requests, enhancements, and onboarding of new Azure services.
- Support cloud change governance processes and validate LowLevel Designs (LLDs) for operational readiness.
- Ensure changes are safe, auditable, and compliant within the managed services model.
Security & Compliance Support
- Implement and operate Azure security controls (Azure Policy, RBAC, Conditional Access, Key Vault).
- Support security incidents, audit evidence requests, and remediation of compliance findings in coordination with Security teams.
FinOps & Continuous Improvement
- Partner with FinOps teams to enforce cost guardrails, tagging standards, and optimization actions.
- Drive continuous service improvement through automation, reliability engineering, and cost efficiency initiatives.
Contineous Service Improvement
- AutomationLed Optimization: Continuously reduce manual operational effort by automating Azure platform tasks using Python, Azure SDKs, and REST APIs
- SelfHealing Operations: Implement Agentic AI driven remediation workflows to autodetect, diagnose, and resolve recurring platform issues
- Proactive Incident Reduction: Leverage AIOps and AIassisted analytics to identify patterns, predict failures, and prevent incidents before impact
- IaC Drift & Compliance Improvement: Use automation to detect and remediate Terraform drift, configuration noncompliance, and policy violations
- Operational Observability Enhancement: Improve platform reliability through continuous tuning of logging, metrics, alerts, and telemetry across Azure services
- Agentic Runbook Automation: Convert manual runbooks into agentdriven workflows for repeatable, zerotouch execution of common operational tasks
- Cost & Performance Optimization: Drive CSI through FinOps automation, including rightsizing, scheduling, and cost anomaly detection
- APIFirst Improvements: Enhance service responsiveness by integrating Azure services using SDKbased and eventdriven automation
- Intelligent Change Execution: Apply AIassisted impact analysis and guardrails to reduce changerelated incidents and improve change success rates
- Continuous Feedback Loop: Use operational data, AI insights, and platform KPIs to prioritize CSI backlog and deliver measurable improvements sprintoversprint