Job Title: SRE Cloud Operations (Digital Banking Platform)
Location: Alpharetta, Atlanta GA || Onsite
Type: Full Time/W2 with Infinite Computer Solutions
Role Summary
As a Site Reliability Engineer (SRE) in Cloud Operations, you will provide operational ownership, reliability engineering, and cloud operations support for a Digital Banking platform running across Windows Server, IIS, and Microsoft Azure. This role focuses on ensuring availability, performance, security, and scalability for customer-facing digital banking workloads.
You will be part of a team delivering 24x7 production support across Windows Server (2016/2019/2022) and Azure (Prod/DR) environments, working closely with Application Development, DevOps, Infrastructure, Network, and Security teams to operate at scale using SRE principles.
Key Responsibilities
Reliability Engineering & Cloud Operations
Provide operational ownership for Digital Banking applications hosted on Windows Server / IIS across Azure and on-prem environments.
Apply SRE principles to improve service reliability, availability, and performance.
Define and execute operational best practices around stability, resiliency, and controlled change.
Support high-availability and disaster-recovery architectures across production and DR environments.
Monitoring, Observability & Incident Response (Core Focus)
Monitor platform and application health using Dynatrace and Splunk.
Perform advanced diagnostics using Windows Event Logs, PerfMon, and Azure Monitor.
Lead and participate in P1/P2 incident response, including bridge calls, real-time troubleshooting, and coordination across multiple teams.
Drive root cause analysis (RCA) and implement preventive and corrective actions.
Track and reduce operational toil through automation and engineering improvements.
Application & Platform Operations
Support application deployments, hotfixes, and production releases with a strong focus on safety and repeatability.
Manage the SSL/TLS certificate lifecycle, including renewals and configuration across IIS and load balancers.
Execute and coordinate OS and application patching using WSUS/SCCM and cloud tooling.
Support and optimize F5 / ADC load-balanced environments.
Security, Compliance & Governance
Enforce security and compliance controls including RBAC, least-privilege access, encryption in transit and at rest, Active Directory, GPOs, service accounts, and secrets management.
Support audits, risk reviews, and control evidence collection.
Automation, CI/CD & Engineering Enablement
Build and maintain automation using PowerShell, DSC, and Ansible.
Partner with DevOps and AppDev teams to support CI/CD pipelines (Azure DevOps, GitHub Actions, Jenkins) for Windows/IIS workloads.
Improve deployment reliability, rollback strategies, and operational guardrails.
Contribute to platform designs supporting blue/green, canary, and zero-downtime deployments where applicable.
Required Qualifications (Must-Have)
Strong hands-on experience administering Windows Server (2016/2019/2022) in production environments.
Strong hands-on experience with IIS, including site configuration, application pools, bindings, performance tuning, and troubleshooting.
Hands-on, production experience with Dynatrace for application and infrastructure monitoring.
Hands-on, production experience with Splunk for log analysis, queries, dashboards, and troubleshooting.
Experience diagnosing system and application issues using Windows Event Logs and PerfMon.
Experience supporting high-severity production incidents, including ownership during incident bridges.
Working knowledge of TCP/IP, HTTP/S, TLS, and integrations with load balancers, WAFs, and reverse proxies.
Experience managing deployments, patching, SSL/TLS certificates, and formal change management processes.
Strong PowerShell scripting and automation experience; exposure to DSC and/or Ansible.
Experience operating workloads in Azure, including production and DR environments.
Working knowledge of Active Directory, GPOs, service accounts, and PKI/certificate management.
Bachelor's degree in Computer Science, Information Technology, or equivalent practical experience.