Job Title: SRE Cloud Operations Engineer
Location: Alpharetta, GA
Type: Full Time/W2 with Infinite Computer Solutions
Onsite Role
Job Description:
As a Site Reliability Engineer (SRE) Cloud Operations, you will provide operational ownership, reliability engineering, and cloud operations support for a Digital Banking platform running across Windows Server, IIS, and Microsoft Azure. This role focuses on ensuring availability, performance, security, and scalability for customer-facing digital banking workloads.
You will be part of a team delivering 24x7 production support across Windows Server (2016/2019/2022) and Azure (Prod/DR) environments, working closely with Application Development, DevOps, Infrastructure, Network, and Security teams to operate at scale using SRE principles.
Experience Range: 4+ years
Key Responsibilities:
Reliability Engineering & Cloud Operations
- Provide operational ownership for Digital Banking applications hosted on Windows Server / IIS across Azure and on-prem environments.
- Apply SRE principles to improve service reliability, availability, and performance.
- Define and execute operational best practices around stability, resiliency, and controlled change.
- Support high-availability and disaster-recovery architectures across production and DR environments.
Monitoring, Observability & Incident Response (Core Focus)
- Monitor platform and application health using Dynatrace and Splunk.
- Perform advanced diagnostics using Windows Event Logs, PerfMon, and Azure Monitor.
- Lead and participate in P1/P2 incident response, including bridge calls, real-time troubleshooting, and coordination across multiple teams.
- Drive root cause analysis (RCA) and implement preventive and corrective actions.
- Track and reduce operational toil through automation and engineering improvements.
Application & Platform Operations
- Support application deployments, hotfixes, and production releases with a strong focus on safety and repeatability.
- Manage the SSL/TLS certificate lifecycle, including renewals and configuration across IIS and load balancers.
- Execute and coordinate OS and application patching using WSUS/SCCM and cloud tooling.
- Support and optimize F5 / ADC load-balanced environments.
Security, Compliance & Governance
- Enforce security and compliance controls, including RBAC, least-privilege access, and encryption in transit and at rest, across Active Directory, GPOs, service accounts, and secrets management.
- Support audits, risk reviews, and control evidence collection.
Automation, CI/CD & Engineering Enablement
- Build and maintain automation using PowerShell, DSC, and Ansible.
- Partner with DevOps and AppDev teams to support CI/CD pipelines (Azure DevOps, GitHub Actions, Jenkins) for Windows/IIS workloads.
- Improve deployment reliability, rollback strategies, and operational guardrails.
- Contribute to platform designs supporting blue/green, canary, and zero-downtime deployments where applicable.
Required Qualifications (Must-Have):
- Strong hands-on experience administering Windows Server (2016/2019/2022) in production environments.
- Strong hands-on experience with IIS, including site configuration, application pools, bindings, performance tuning, and troubleshooting.
- Hands-on, production experience with Dynatrace for application and infrastructure monitoring.
- Hands-on, production experience with Splunk for log analysis, queries, dashboards, and troubleshooting.
- Experience diagnosing system and application issues using Windows Event Logs and PerfMon.
- Experience supporting high-severity production incidents, including ownership during incident bridges.
- Working knowledge of TCP/IP, HTTP/S, TLS, and integrations with load balancers, WAFs, and reverse proxies.
- Experience managing deployments, patching, SSL/TLS certificates, and formal change management processes.
- Strong PowerShell scripting and automation experience; exposure to DSC and/or Ansible.
- Experience operating workloads in Azure, including production and DR environments.
- Working knowledge of Active Directory, GPOs, service accounts, and PKI/certificate management.
- Bachelor's degree in Computer Science, Information Technology, or equivalent practical experience.