Sr. DevOps Engineer (Datadog) 15+ years - New York, NY (Hybrid)

  • A-1 Consulting Inc, Atlanta, GA
  • New City, NY
  • Posted 1 day ago | Updated 17 hours ago

Overview

On Site
Full Time

Skills

DevOps
GC
Microsoft Windows
Data Analysis
Reliability Engineering
Performance Monitoring
Trend Analysis
ROOT
Documentation
Training
Collaboration
Application Performance Management
AppDynamics
Dynatrace
Software Performance Management
Linux
Programming Languages
.NET
Node.js
Python
Java
Instrumentation
IOS Development
Android
JavaScript
HTML5
Amazon EC2
Cloud Computing
KPI
Status Reports
Load Balancing
Clustering
Network Layer
Oracle
PostgreSQL
Database
SQL
Dashboard
Incident Management
Configuration Management
Terraform
Ansible
Network
Management
Project Management
Communication
Critical Thinking
Analytical Skill
Production Support
Change Control
Escalation Management
Computer Science
Performance Testing
Web Testing
Apache JMeter

Job Details

Job Description

Hi Everyone,


Hope you are doing well.

Please find the job description below and let me know if you are interested.


Job Title : Sr. DevOps Engineer (APM)

Location : New York, NY (Hybrid)

Only : USC/GC


Job Overview: The Application Performance Management (APM) Engineer will be responsible for integration work in the application performance management / web infrastructure space on Windows and Java platforms, which requires coding experience. Given the overall requirement to better integrate teams, close collaboration with other engineering teams is also a key component of the role. The candidate must demonstrate broad exposure to various technologies along with deep technical knowledge, specifically in the web, infrastructure, and network space. This position will provide APM engineering and data analytics expertise to configure and operate the infrastructure that enables observability of critical application systems.



Key Responsibilities:


  • Drive improved systems reliability, availability and performance in a multi-cloud environment by operating observability platform tools that expose views and data to application teams to detect anomalies and make informed decisions.
  • Install and configure agents, APIs, performance monitoring tool alerts, dashboards, and data trend analysis in Datadog.
  • Identify performance bottlenecks and anomalous system behavior, and resolve the root cause of service issues.
  • Create and maintain documentation, videos and other content utilized within the observability team, and also as training and guidance for engineering team consumers
  • Collaborate with application teams to understand their requirements for views into metrics, traces and logs; build standards for how different architectures are monitored with common tooling; drive improvements to tools and recommend changes; and enable engineering teams to get the most use out of the available observability platform tooling.


Knowledge, Skills & Experience Required:



  • 5+ years hands-on experience with any Application Performance Management tool: Datadog/New Relic/AppDynamics/Dynatrace
  • 3+ years hands-on experience implementing Datadog APM/Infra/Log monitoring solutions
  • Experience with Datadog real-user monitoring and/or synthetic monitoring products and solutions.
  • Strong understanding of Security, Monitoring and Performance aspects of cloud-native platform and application architectures.
  • Manage, configure and maintain the Datadog APM tool on the Linux platform.
  • Understanding of performance aspects of various programming languages: Java, .NET, NodeJS, Python
  • Instrument Java applications with Datadog, set up health rules, and fine-tune monitoring in Datadog.
  • Maintain a deep understanding of the customer's business as well as their technical environment
  • End-user technologies: iOS, Android, JavaScript, HTML5
  • 3+ years hands-on experience with AWS EC2, ECS containers, CloudWatch
  • Knowledge of useful metrics and how to work with them to track against goals/KPIs and dashboards
  • Analyze tool data and usage. Communicate weekly with management, verbally and through detailed written status reports, about potential problems and concerns.
  • Understanding of and experience with load balancing and clustering, and the ability to troubleshoot problems at the network layer
  • Experience troubleshooting performance issues related to Oracle/MS SQL/Postgres databases
  • Proficient in SQL
  • Experience creating custom dashboards and alerts, developing AIOps rules, and integrating with alert monitoring and incident management systems
  • Experience with configuration management tools such as Terraform and Ansible.
  • A clear understanding of network and system management solutions
  • Excellent organizational and project management skills
  • Excellent communication, critical thinking & analytical skills
  • Experience with production support, change control, and problem and escalation management.


Preferred:


  • Bachelor's or Master's degree in Computer Science
  • Performance Testing web applications using JMeter (nice to have)
