Monitoring and Observability Architect

Raritan, NJ, US • Posted 2 days ago • Updated 2 days ago
Contract W2
Contract Independent
Contract Corp To Corp
No Travel Required
On-site
Depends on Experience

Job Details

Skills

  • Continuous Delivery
  • AppDynamics
  • Continuous Integration
  • Customer Engineering
  • DevOps
  • Dynatrace
  • Grafana
  • Kubernetes
  • Splunk
  • RBAC
  • New Relic
  • Software Performance Management
  • Root Cause Analysis
  • ARM
  • Monitoring
  • Observability

Summary

We are seeking an experienced Monitoring and Observability Architect to design, implement, and optimize enterprise-wide observability solutions across cloud, on-premises, and hybrid environments. This role is responsible for defining monitoring strategies, improving system reliability, and enabling proactive incident detection through metrics, logs, and traces.

The ideal candidate combines deep technical expertise with architectural vision to build scalable, secure, and resilient observability platforms that support modern DevOps and SRE practices.

Key Responsibilities

 Architecture & Strategy

  • Define enterprise observability architecture aligned with business and IT objectives.
  • Design monitoring frameworks for applications, infrastructure, networks, and cloud-native platforms.
  • Establish standards, governance, and best practices for monitoring and alerting.

 Implementation & Engineering

  • Architect and deploy tools such as Prometheus, Grafana, Datadog, Splunk, ELK, New Relic, Dynatrace, and AppDynamics.
  • Implement distributed tracing (OpenTelemetry, Jaeger, Zipkin).
  • Design centralized logging and log aggregation solutions.
  • Enable application performance monitoring (APM), real user monitoring (RUM), synthetic monitoring, and infrastructure monitoring.

 Cloud & DevOps Integration

  • Integrate observability into CI/CD pipelines.
  • Support Kubernetes and container observability.
  • Automate monitoring setup with Infrastructure as Code (Terraform, ARM templates, CloudFormation).
  • Collaborate with SRE and DevOps teams to enhance reliability and performance.

 Reliability & Incident Management

  • Define SLIs, SLOs, SLAs, and error budgets.
  • Develop alerting strategies that reduce noise and surface actionable alerts.
  • Enable root cause analysis and performance optimization.
  • Support major incident investigations.

 Security & Compliance

  • Ensure monitoring solutions meet security and compliance requirements.
  • Implement role-based access control (RBAC) and secure data handling.

 Stakeholder Collaboration

  • Partner with customer, engineering, operations, security, and business teams.
  • Provide technical leadership and mentorship.
  • Present architecture designs to leadership and governance boards.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.
  • Dice Id: 10300434
  • Position Id: 8903511

Similar Jobs

  • Raritan, New Jersey • 3d ago • Contract, Third Party • Depends on Experience
  • Raritan, New Jersey • 2d ago • Full-time • $120,000 - $140,000
  • Hybrid in Raritan, New Jersey • 2d ago • Full-time • $80,000 - $180,000
  • Pennington, New Jersey • Today • Contract • USD 100,000.00 - 110,000.00 per year
