Cloud Monitoring Engineer - Azure

Pittsburgh, PA, US • Posted 14 hours ago • Updated 14 hours ago
Contract W2
No Travel Required
On-site
$90 - $107/hr

Job Details

Skills

  • Analytics
  • AppDynamics
  • Cloud Computing
  • DNS
  • IT Service Management
  • SIEM
  • Splunk
  • Microsoft Azure
  • Computer Networking
  • Dynatrace
  • Dashboard

Summary

Job Title: Cloud Monitoring Engineer - Azure
Duration: 12+ Months (Possible extension)
Location: Pittsburgh, PA 15258 | Lake Mary, FL 32746 | New York, NY 10286
Onsite Role (4 days a week)
 
Responsibilities:
  • We are seeking a skilled Cloud Monitoring and Observability Engineer (Azure) to design, implement, and optimize end-to-end monitoring and observability solutions for a mission-critical application deployed in the Azure environment.
  • The ideal candidate has hands-on experience with enterprise monitoring tools (such as AppDynamics, ThousandEyes, NetScout, and SolarWinds, or equivalent alternatives) and a strong background in building scalable, secure, and compliant observability stacks for cloud deployments.
  • The engineer will collaborate closely with application engineering, cloud platform, network, and security teams to ensure comprehensive coverage across the application, infrastructure, and network layers.
  • Design and implement end-to-end monitoring, alerting, and observability for an Azure-hosted application across application, infrastructure, network, and user experience layers.
  • Configure, integrate, and maintain enterprise monitoring platforms to deliver actionable telemetry, performance baselines, and SLA/SLO tracking.
  • Build dashboards, health checks, synthetic tests, and alerting workflows; tune alert fidelity to reduce noise and improve the signal-to-noise ratio.
  • Establish and document telemetry standards (metrics, logs, traces), data collection strategies, and service-level indicators (SLIs) aligned to service-level objectives (SLOs).
  • Integrate Azure-native services (Azure Monitor, Log Analytics, Application Insights) with enterprise tools to provide unified visibility and correlation.
  • Implement network performance monitoring, path visibility, and internet/extranet testing using NPM tools (e.g., ThousandEyes, NetScout); leverage infrastructure monitoring platforms (e.g., SolarWinds) for device and service health.
  • Instrument applications with APM tools (e.g., AppDynamics, Dynatrace, New Relic) for business transaction monitoring, dependency mapping, and root-cause analysis; tune anomaly detection and policy thresholds.
  • Collaborate with DevOps/SRE teams to embed monitoring into CI/CD and infrastructure-as-code patterns; ensure new services adhere to observability standards.
  • Define runbooks and escalation paths; support incident response and post-incident reviews with data-driven insights and remediation recommendations.
  • Ensure monitoring solutions meet applicable security and compliance requirements; support audit requests with clear documentation and evidence.
  • Conduct capacity and performance trend analysis; recommend optimization, right-sizing, and resilience improvements.
  • Provide knowledge transfer, documentation, and training on monitoring tools, best practices, and operational workflows.
Education/Experience:
  • 5+ years implementing enterprise monitoring/observability for cloud or hybrid environments, including mission-critical applications.
  • Demonstrable expertise with at least one tool in each of the following categories (or an equivalent), including production deployments, advanced configuration, and operational use:
    • Application Performance Monitoring (APM): AppDynamics, Dynatrace, or New Relic.
      • Experience instrumenting services for business transaction tracing, code-level diagnostics, service maps, and anomaly detection.
      • Ability to design APM dashboards and create alert policies with appropriate thresholds and baselines.
    • Network Performance Monitoring (NPM) / Digital Experience Monitoring (DEM): ThousandEyes, NetScout, or Kentik.
      • Experience with synthetic tests, path visualization, packet-level analysis, and internet/WAN performance monitoring.
      • Ability to configure endpoint agents, BGP/DNS tests, and multi-hop path monitoring for user experience correlation.
    • Infrastructure Monitoring and Event Management: SolarWinds, Microsoft SCOM, Datadog, or Prometheus/Grafana.
      • Experience monitoring servers, containers, network devices, and cloud services, and creating availability and capacity dashboards.
      • Proficiency with alert routing, de-duplication, and event correlation.
  • Strong Azure monitoring experience: Azure Monitor, Log Analytics (KQL), Application Insights, and integration with third-party tools.
  • Solid understanding of distributed tracing, metrics, and log aggregation; familiarity with OpenTelemetry concepts and data pipelines.
  • Scripting/automation skills (PowerShell, Python, or Bash) to automate monitoring configuration, agent deployment, test creation, and reporting.
  • Networking fundamentals (DNS, BGP, HTTP, TLS, TCP/IP), CDN concepts, and WAN performance monitoring; ability to correlate app and network telemetry.
  • Experience supporting incident response and performance troubleshooting across applications, infrastructure, and network layers.
  • Excellent documentation and communication skills; collaborative mindset with engineering, operations, and security stakeholders.
Preferred:
  • Background in regulated environments (financial services, government, healthcare) with compliance-aware monitoring design.
  • Experience with log aggregation and SIEM/SOAR platforms (e.g., Splunk, Elastic) and integration with APM/NPM tools.
  • Integration experience with ITSM platforms (e.g., ServiceNow) for incident, change, and problem management workflows.
  • Familiarity with infrastructure-as-code (ARM/Bicep/Terraform) and embedding observability into IaC patterns; experience with CI/CD integration.
  • Exposure to SRE practices (SLIs/SLOs, error budgets, reliability reviews) and capacity/performance planning.
  • Ability to code in one or more of the following languages for instrumentation, custom telemetry, SDK integration, and tooling automation:
    • Java: Implementing OpenTelemetry SDKs/agents, custom instrumentation, and APM tagging; building synthetic test harnesses.
    • .NET (C#): Instrumenting .NET services, configuring APM auto-instrumentation, and writing custom exporters and health probes.
    • Python: Building automation scripts, collectors/exporters, synthetic tests, and integrating with monitoring APIs and SDKs.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
  • Dice Id: 90987567
  • Position Id: 8909965