Observability & Monitoring Engineer

Overview

Full Time

Skills

Perl

Best Practices

Remediation

Problem-Solving

DevOps

Continuous Integration/Delivery

Change Management

JavaScript

Operations

Amazon Web Services

GCP

Metrics

Jenkins

Puppet

Kubernetes

Terraform

Forecasting

Capacity Planning

Switch Capacity

SolarWinds

APM

Application Performance

IT Infrastructure

Logging Tools

Incident Management

MTTR

Job Details

Job Title: Observability & Monitoring Engineer with SolarWinds and Dynatrace Experience
Location: Rancho Cucamonga, CA (onsite, 5 days per week)
Duration: Long Term Project

Duties and Responsibilities:
The Monitoring and Observability Engineer will be responsible for designing, implementing, configuring, and maintaining our observability solutions, and for troubleshooting IT systems and applications to ensure optimal performance and reliability. You will work closely with cross-functional teams to identify potential issues and recommend improvements to system performance, stability, and availability. The engineer will also be responsible for automating alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.

Mandatory Skills:

3+ years of experience working in the observability, operations, or DevOps domains.
Proficiency with observability, monitoring, and logging tools such as Dynatrace and SolarWinds.
Hands-on experience installing, setting up, and configuring monitoring tools such as Dynatrace and SolarWinds.

The responsibilities of the Integrated Operations Engineer II include the following:

Configure and maintain monitoring and observability tools and systems, including SolarWinds and Dynatrace.
Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability.
Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
Conduct capacity planning and forecasting to ensure scalability and optimal performance of IT systems and applications.
Collaborate with cross-functional teams to support incident management, change management, and problem management processes.

Skills Required:

Deep understanding of IT infrastructure monitoring and observability best practices.
Strong analytical skills, with the ability to analyze large amounts of data and identify patterns and trends.
Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex issues.
Programming skills in languages such as Perl, Shell, or JavaScript.
Experience with automation tools such as Ansible, Puppet, or Terraform.
Experience with container orchestration tools like Kubernetes.
Experience with cloud platforms such as AWS, Google Cloud Platform, or Azure.
Experience with CI/CD tools like Jenkins.
