Title- Lead Azure L2 Cloud Operations Engineer
Location- Dallas, TX- Onsite (5 Days/Week)
Type- Long Term Contract
Job Description-
AgreeYa Solutions is hiring an Azure L2 Cloud Operations Engineer to join a managed cloud operations team supporting a mission-critical, customer-facing Azure environment. This is a hands-on operational role embedded within an Agile delivery model, working side-by-side with the client's internal platform team.
The role sits at the L2 layer of a structured L1/L2/L3 support model. The engineer is responsible for reactive incident triage routed from the client's L1 service desk, proactive ticket initiation from observability signals, operational execution via Azure DevOps and controlled privileged access, and security posture uplift activities ahead of and following a customer-facing application go-live. This person operates under a SOC 2 Type 2 compliant model, executing on behalf of the client while the client retains accountability.
Coverage requirements shift at go-live: before go-live, coverage is US business hours with offshore coordination; after go-live, coverage is 24/7 for the customer-facing application. Candidates must be comfortable operating in both modes.
Responsibilities-
1. L2 Incident Triage & Operational Support-
Receive and action reactive L2 tickets routed from the client's L1 service desk, performing root cause analysis and resolution within defined SLA windows.
Initiate proactive L1/L2 tickets based on observability signals, anomaly detection, and threshold breaches, without waiting for end-user impact to be reported.
Participate in joint L2/L3 incident triage sessions with the client's internal technical team, contributing structured diagnosis and clear escalation decisions.
Pull in L3 resources (client-side or AgreeYa-side) for complex incidents requiring deep development, infrastructure, or security expertise.
Maintain accurate, timely ticket documentation in the client's ITSM tooling (ServiceNow), including resolution steps, root cause, and preventive actions.
Adhere to a controlled privileged access model, executing elevated-access tasks through approved PIM elevation flows rather than permanent grants.
2. Azure DevOps Operations-
Execute operational tasks within Azure DevOps, including pipeline monitoring, build/release triage, environment health checks, and deployment coordination.
Support infrastructure and application deployment activities in QA, Stage, and Production environments, following change control and approval gate processes.
Identify and flag CI/CD pipeline failures, IaC drift, or environment configuration deviations, escalating to the L3 architect as needed.
Assist with release coordination activities: managing deployment schedules, go/no-go checklists, rollback readiness, and post-deployment smoke tests.
Access and operate within the Azure tenant through a controlled VDI desktop model under client-managed permissions; use break-glass/bastion access only when approved.
3. Observability, Monitoring & Proactive Detection-
Monitor Azure Monitor, Log Analytics, and Application Insights dashboards for platform health, performance anomalies, and security signals.
Maintain and tune alert rules and notification thresholds across the observability stack to reduce noise and improve MTTD.
Support integration and operational use of ScienceLogic and ITSM tooling for alert-to-ticket workflows.
Contribute to the creation and maintenance of operational runbooks, threshold documentation, and monitoring dashboards, filling gaps in current readiness artifacts.
Participate in MTTD/MTTR benchmarking and reporting to drive continuous improvement in detection and resolution performance.
4. Security Posture & Tooling Enablement-
Support security posture uplift activities, including closing gaps identified in Defender for Cloud security score and assessment findings.
Assist with the operational use of Wiz, helping the client understand findings, prioritize remediations, and track closure of identified vulnerabilities.
Monitor and action security alerts from Microsoft Defender for Cloud, escalating critical findings to the Security & Observability Engineer or L3.
Follow Zero Trust operational principles, including least-privilege access, NSG and firewall rule hygiene, and private endpoint usage.
Support evidence collection for SOC 2 Type 2 compliance activities, including change control records, access logs, and incident documentation.
5. Operational Readiness & Agile Delivery-
Participate in daily standups, sprint planning, and retrospectives within the client's Agile/Scrum delivery cadence.
Collaborate directly with the client's internal platform team and application rollout squad, integrating into their working rhythm for go-live readiness activities.
Contribute to operational readiness artifacts, including runbooks, guardrails, escalation playbooks, and environment-specific threshold documentation.
Support the client's 'path to green' readiness initiative, executing yellow-to-green remediation tasks from the operational readiness matrix.
Coordinate with the Release/Change Coordination function and senior AgreeYa colleagues to ensure delivery activities align with client priorities and go-live timelines.
Required Qualifications-
3+ years of hands-on experience in Azure cloud operations, infrastructure support, or DevOps engineering roles.
Solid working knowledge of core Azure services relevant to operational support:
Compute: App Service, AKS, Azure Functions, Virtual Machines
Networking: VNets, NSGs, Private Endpoints, Azure Front Door, WAF
Observability: Azure Monitor, Log Analytics, Application Insights, KQL query authoring
Security: Microsoft Defender for Cloud, Azure Key Vault, RBAC, PIM
DevOps: Azure DevOps Pipelines, Repos, artifact feeds
ITSM: ServiceNow or equivalent incident/change/problem management tooling
Demonstrated experience in L2 incident triage and resolution, including structured root cause analysis and escalation management.
Hands-on experience operating within Azure DevOps: monitoring pipelines, triaging failures, supporting deployments, and following change control processes.
Familiarity with Terraform or other IaC tooling sufficient to identify drift, understand state, and communicate issues to L3 architects.
Working knowledge of cloud security fundamentals: least-privilege access, Zero Trust principles, network segmentation, and vulnerability management.
Experience working within a managed services or client-embedded operations model, operating on behalf of a client under a defined RACI structure.
Strong written and verbal communication skills, with the ability to document incidents clearly, update stakeholders accurately, and participate effectively in joint triage sessions.
Comfortable working in an Agile/Scrum delivery environment with daily standups, sprint cadences, and integrated team workflows.
Preferred Qualifications-
Experience with Wiz or similar cloud security posture management (CSPM) tools, including interpreting findings and supporting remediation.
Familiarity with ScienceLogic or enterprise observability platforms used in managed services contexts.
Exposure to SOC 2 Type 2 operational controls, evidence collection, and compliance-aware change management.
Experience with controlled privileged access models, including Azure PIM elevation workflows and just-in-time (JIT) access patterns.
Knowledge of NYDFS Part 500 or other financial services regulatory requirements as they relate to cloud operations and MFA enforcement.
Familiarity with release management practices: deployment scheduling, go/no-go criteria, rollback planning, and post-deployment validation.
Azure certifications: Azure Administrator Associate (AZ-104) or DevOps Engineer Expert (AZ-400) preferred.
Prior experience in a financial services, payments, or regulated industry environment.
Role Context & Operating Model-
This engineer operates within a structured L1/L2/L3 support model. The client's internal team owns L1 service desk intake and routes incidents onward. AgreeYa's L2 engineer handles triage, resolution, and escalation, with L3 architects and development resources available on both sides for complex issues.
Access to the client's Azure tenant is provided through a controlled VDI desktop environment with client-managed permissions. Privileged elevation follows an approved PIM model, not permanent access grants. All work is executed via tickets, with the client providing direction and AgreeYa executing.
Works under the direction of the AgreeYa Azure Operations Lead / L2 Lead and coordinates closely with the Security & Observability Engineer.
Integrates directly into the client's delivery team for application go-live readiness, attending standups, sprint reviews, and joint triage sessions.
Pre go-live: US business hours coverage with coordinated offshore support. Post go-live: rotational 24/7 coverage for the customer-facing payment application.
Reports into AgreeYa's Digital Solutions practice; day-to-day direction from the Engagement Manager on active client accounts.