Job Title: Senior RabbitMQ Hands-On Engineer / SME
Location: Remote
Job Type: Contract
Role Summary
We need a Senior RabbitMQ Engineer (SME), located in the USA (eastern US preferred), to support a customer engagement. This is a hands-on, staff-augmentation role that combines architecture, technical leadership, and execution. The engineer will act as the single-threaded RabbitMQ authority: assessing the current platform, stabilizing it quickly, and guiding the customer toward a supported, resilient RabbitMQ posture in Azure (VMs and/or AKS). The engagement is expected to last 6 months.
Primary outcome: Stabilize RabbitMQ by April / early May and deliver a clear modernization recommendation (AKS vs VM-based vs SaaS) with a practical execution path.
Engagement Context
Responsibilities
1) Assessment & Stabilization (Immediate)
- Perform current-state review: topology, broker configuration, policies, queue types, client connection patterns, resource thresholds.
- Identify reliability/performance risks and execute prioritized remediation.
- Establish good operational standards: monitoring, alerting, runbooks, on-call readiness.
2) Architecture & Technical Direction
- Define target-state options and tradeoffs: Azure VMs vs AKS vs SaaS.
- Provide an upgrade strategy to a supported RabbitMQ version (sequencing, rollout, rollback).
- Recommend best practices for multi-tenant RabbitMQ (vhosts, permissions, policy boundaries).
3) DR / Resiliency Improvements
- Diagnose why DR isn't working; propose and implement a pragmatic recovery posture aligned to business requirements.
- Validate failover/recovery procedures through testing and documentation.
4) Platform Enablement & Standardization
- Improve maintainability of Ansible-based configuration and reduce bespoke patterns.
- Create/tune reusable gold-standard patterns for vhost provisioning, policies, and operational controls.
- Coach customer engineers; transfer knowledge and operational ownership.
Required Skills & Experience (Must-Have)
- 7-10+ years in distributed systems / messaging platforms; expert-level RabbitMQ in production.
- Strong experience with:
  - clustering and HA patterns (quorum queues / mirrored strategies where applicable)
  - performance tuning (memory watermarks, disk alarms, flow control, channel/connection behaviors)
  - upgrades and lifecycle management (zero/minimal downtime approaches, rollback planning)
  - incident triage and root cause analysis in high-throughput environments
- Azure operational experience (networking, VM patterns; AKS familiarity strongly preferred)
- Hands-on automation experience (Ansible or similar IaC/config management)
- Ability to operate as a technical lead: clear decision-making, documentation, stakeholder comms.
Preferred / Nice-to-Have
- Designing DR for messaging in cloud (active/passive and/or multi-region approaches)
- Experience integrating messaging with enterprise integration stacks (e.g., BizTalk patterns)
Deliverables
- Current-state assessment + prioritized stabilization plan
- Implemented stability improvements (config/tuning/operational guardrails)
- Supported version upgrade plan (and execution, if in-scope)
- DR gap analysis + implemented/tested recovery procedures
- AKS vs VM vs SaaS recommendation with risk/effort tradeoffs
- Standardized configuration approach for vhosts/policies + documentation/runbooks
Candidate Profile
- Player/coach: can architect and still get their hands dirty fast.
- Strong executive communication: can explain tradeoffs and risk in plain English.
- Bias for practical outcomes: stabilize first, modernize second, document always.