Cloud / Platform Operations Manager @ Atlanta / Roswell, GA (Onsite)

Overview

On Site

$50 - $60

Accepts corp to corp applications

Contract - W2

Contract - Independent

Contract - 6 Month(s)

Skills

cloud operations

AWS

Azure

multi-account

Datadog

Dynatrace

Splunk

Grafana

Terraform

Ansible

Job Details

Hi,

Hope you are doing well. This is Rajeev from FutureTech Consultants, LLC, and I have a job opening with our direct client.

Please have a look at the job description below and let me know if you are interested. Please share the latest copy of your resume.

Role: Cloud/Platform Operations Manager

Location: Atlanta / Roswell, GA (Onsite)

Duration: 6+ Months (Possible Extension)

Must have cloud operations experience and a Manager title, with the skill set below:

Hands-on experience with AWS and/or Azure (multi-account, multi-region operations).
Solid expertise with observability & monitoring tools (Datadog, Dynatrace, Splunk, Grafana, Prometheus, ELK/EFK).
Familiarity with Infrastructure-as-Code (Terraform, Ansible, GitOps).
Strong understanding of SRE principles (SLIs, SLOs, error budgets, incident management frameworks).

Job Description:

Rheem is seeking a Manager, Cloud Operations to lead, transform, and scale its digital operations landscape across CloudOps, SRE, NOC, Observability, AIOps, and MLOps.
This individual will serve as the single point of accountability for operational stability and innovation, managing offshore teams while working closely with Rheem's U.S. digital leadership.

This role is not a steady-state manager position. The successful candidate will:

Identify operational gaps.
Suggest and implement best practices and tools.
Introduce automation and innovation strategies.
Guide daily deliverables for offshore teams.
Demonstrate tangible business impact each quarter (improved uptime, reduced MTTR, cost savings, predictive alerting, etc.).
The Manager will report to the Director of Digital Operations and act as Rheem's Cloud Operations Leader in practice.

Key Responsibilities

Operations Strategy & Governance

Define the vision, strategy, and roadmap for CloudOps, Reliability, and Operational Excellence.
Establish KPIs and OKRs aligned with Rheem's business goals (availability, MTTR, cloud cost per device, customer churn reduction).
Deliver quarterly impact reports to business leadership showcasing operational improvements and ROI.

Cloud Operations & FinOps

Own multi-region cloud operations across AWS and Azure platforms.
Drive cost transparency and optimization via FinOps practices and dashboards.
Build capacity and resiliency models for predictable operations.
Conduct resiliency drills and game days to ensure high availability and compliance.

Site Reliability Engineering (SRE)

Establish SLIs, SLOs, and error budgets to measure reliability.
Build incident management playbooks and drive blameless postmortems.
Proactively improve reliability through automation, self-healing, and continuous testing.

Network Operations Center (NOC) Modernization

Transform NOC from alert-driven to predictive, AIOps-enabled operations.
Consolidate monitoring tools and reduce alert fatigue with intelligent correlation.
Ensure 24x7 support coverage through offshore team alignment and escalation management.

Observability & Telemetry

Build a unified observability stack (logs, metrics, traces, RUM) leveraging OpenTelemetry.
Enable business-oriented dashboards (device uptime, customer adoption, churn trends).
Ensure end-to-end visibility from connected devices → cloud microservices → customer-facing apps.

AIOps & MLOps [optional]

Deploy AIOps solutions for anomaly detection, predictive alerts, event correlation, and automated remediation.
Operationalize ML models: rollout, monitoring, drift detection, rollback strategies.
Showcase measurable value, e.g., warranty claim reduction, improved customer experience metrics.

Process Innovation & Automation

Audit current toolchain and processes; identify redundancies, gaps, and opportunities for automation.
Align with DevOps/SecOps to streamline release-to-operations handshakes.
Drive Infrastructure-as-Code for operations (Terraform, Ansible, GitOps).

Team Leadership & Offshore Management

Manage and mentor a distributed team (offshore + onsite), setting clear goals and accountability.
Define roles, responsibilities, and shift structures for 24x7 global coverage.
Build a culture of continuous improvement and operational excellence.

Compliance, Security & Risk

Ensure Rheem operations align with compliance standards (SOC2, ISO, HIPAA where applicable).
Own business continuity planning and disaster recovery testing.
Proactively identify operational risks and mitigate them before they impact the business.

Business Alignment & Change Leadership

Act as the voice of operations at business leadership tables.
Translate technical improvements into business outcomes (lower churn, improved uptime, faster installs, fewer complaints).
Champion a quarterly innovation agenda to showcase improvements in uptime, cost, and reliability.

Experience & Leadership

10+ years of experience in Cloud Operations, Site Reliability Engineering, or Digital Operations.
Proven track record of owning operational outcomes (uptime, MTTR, cost optimization, observability).
Experience managing offshore/global delivery teams with 24x7 coverage.
Strong leadership presence: able to act as a change agent, operate autonomously, and deliver measurable outcomes without day-to-day direction.

Cloud & Technical Expertise

Hands-on experience with AWS and/or Azure (multi-account, multi-region operations).
Solid expertise with observability & monitoring tools (Datadog, Dynatrace, Splunk, Grafana, Prometheus, ELK/EFK).
Familiarity with Infrastructure-as-Code (Terraform, Ansible, GitOps).
Strong understanding of SRE principles (SLIs, SLOs, error budgets, incident management frameworks).

Process & Governance

Demonstrated ability to design and implement operations frameworks (Ops playbooks, NOC modernization, incident command systems).
Knowledge of FinOps practices (cloud cost visibility, optimization, showback/chargeback).
Experience ensuring compliance with SOC2, ISO, HIPAA or equivalent standards.

Soft Skills

Excellent stakeholder communication skills: ability to link operational KPIs with business outcomes.
Strong team leadership and mentoring skills, especially across distributed teams.

Nice-to-Have

Exposure to AIOps platforms (Moogsoft, BigPanda, OpsRamp, ServiceNow AI modules).
Experience with MLOps tooling (MLflow, Kubeflow, SageMaker, Azure ML) for model deployment and monitoring.
Prior background in platform operations at a product/SaaS company (vs. pure IT Ops).
Experience leading automation-first initiatives (predictive alerts, self-healing infra, auto-remediation pipelines).
Hands-on experience with CI/CD-to-Ops handshakes and change-impact assessments.

Cloud certifications:

AWS Certified Solutions Architect / DevOps Engineer
Microsoft Certified: Azure Administrator / Solutions Architect
FinOps Certified Practitioner

Regards

Rajeev Mudakala

Sr. Talent Acquisition Specialist

FutureTech Consultants, LLC

5655 Peachtree Parkway, Suite 212, Peachtree Corners, GA 30092

Direct:

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
