Job Title: Red Hat Advanced Cluster Management for Kubernetes Principal Consultant
Location: Remote
Duration: 6 Months
Description
Role Overview
As a Principal Consultant, you will lead the strategic implementation of Red Hat Advanced Cluster Management (RHACM) to transform platform operations for Truist Bank. You will serve as the primary architect and delivery lead, responsible for validating complex deployment topologies and establishing a gold standard for multi-cluster observability, automated governance, and resource optimization. This is a high-autonomy role that requires the ability to deliver requirements and project scope with limited oversight.
Key Responsibilities
1. Architecture Validation & Strategy
Design Authority: Review and finalize the ACM architecture, ensuring it supports diverse deployment topologies and critical Disaster Recovery (DR) requirements, including Active/Passive configurations.
Infrastructure Synergy: Optimize the co-location of infrastructure management and ArgoCD to ensure a seamless "single pane of glass" for the platform.
Performance Engineering: Define storage and performance specs required to support high-throughput multi-cluster observability and alerting frameworks.
2. Observability & Performance Management
Data-Driven Insights: Lead stakeholder sessions to define and build custom Grafana dashboards that provide actionable data on capacity, network traffic, and workload scaling.
Alerting Framework: Design and implement a performant alerting framework that filters noise and provides SRE teams with discrete, actionable notifications.
Right-Sizing Initiatives: Use Multi-cluster Observability (MCO) and autoscalers (HPA/VPA) to identify over-requested resources and automate application density optimization.
3. GitOps & Governance Automation
Configuration Drift Mitigation: Transition Day-2 operations to ArgoCD, ensuring all cluster configurations (RBAC, network policies, operator installs) are managed as code and automatically reverted if manual drift occurs.
Policy-as-Code: Establish a GitOps-based governance process. Create and roll out ACM Policy Sets to monitor cluster health and security compliance across the entire fleet.
Automation Integration: Integrate ACM Policies and Day-2 configurations into existing Ansible automation pipelines for full lifecycle orchestration.
Core Technical Requirements
Advanced Cluster Management (ACM): Deep expertise in implementing and configuring ACM Multi-cluster Observability (MCO), including managing hub and spoke (managed) clusters.
GitOps & Continuous Delivery: Proven experience using ArgoCD for Day-2 cluster configurations, operator installations, and automated drift mitigation.
Observability Stack: Expert-level capability in Grafana dashboard development and Prometheus Alertmanager for creating actionable, noise-reduced alerting frameworks.
Infrastructure Automation: Strong proficiency in Ansible for automating platform deployments and managing infrastructure-as-code (IaC) workflows.
Policy & Governance: Experience defining and deploying ACM Policies and Policy Sets to enforce security, compliance, and configuration consistency across multiple clusters.
Success Criteria
Independence: Able to translate high-level business goals into a documented, validated implementation plan without day-to-day technical direction.
Customer Centricity: Strong ability to interface with various personas (SRE, Platform, Stakeholders) to extract requirements and build tailored dashboard/alerting solutions.
Additional Requirements
US-based resource; background checks will be required.
Duration: 6 months, 40 hrs/wk