Distributed Performance Engineer (DPE)
As a Tier 4 LoB-Facing Internal Consulting Engineer specializing in performance, you will conduct in-depth forensic network and application studies of production issues that numerous cross-technical Tier 1, 2 and 3 teams have already investigated yet that continue to hurt client revenue, profit and/or reputation.
Independently diagnose the root cause of the production performance issue, relying principally on network packet analysis of business transactions as they cross distributed systems spanning global OnPrem and Public Cloud Data Center Tiers, and identify the failed component (software and/or infrastructure) responsible for the failure.
Author, publish & present detailed formal Findings, Analysis, and Recommendation Reports to the Product Owner and Senior Leadership responsible for the failed component.
For Infrastructure-based failures, lead the OnPrem and/or Public Cloud (AWS, Azure, Google) Infrastructure Team (compute, network and/or storage) for remediation of the failed component (e.g., firewall, circuit, disk).
For Software-based failures, lead the Application Software Development Team, whether internal or a vendor, for remediation of the failed software module (e.g., application code, SQL, Messaging).
1. Interview Customers & Review Prior Incident Reports
Conduct detailed interviews with the customers to gather information about the poor end user experience and/or slow business transactions.
Review incident reports previously written by the infrastructure and/or application teams to understand the initial findings and reported issues.
2. Gather Forensic Evidence Collected by Other Teams
IP Addresses for each endpoint
Architecture diagrams of the systems
Application and infrastructure logs
Data Center network diagrams
Performance reports detailing the incident.
3. Create Network Topology
Identify Data Center hosting each processing tier.
Identify the in-between networks (primary & redundant)
Create a new topology map of the flows for the application.
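As an illustration only, a flow topology of this kind can be held as a simple adjacency map. The sketch below is a minimal example under stated assumptions: the site names, network names and the `build_topology` / `flow_paths` helpers are all hypothetical, and a real engagement would populate the links from the collected architecture and Data Center network diagrams.

```python
from collections import defaultdict


def build_topology(links):
    """Build an adjacency map from (src_site, dst_site, network, role) tuples.

    role is a free-form label such as "primary" or "redundant".
    """
    topo = defaultdict(list)
    for src, dst, network, role in links:
        topo[src].append({"to": dst, "network": network, "role": role})
    return dict(topo)


def flow_paths(topo, start, end, visited=None):
    """Enumerate every loop-free path from start to end.

    Each path is a list of (from_site, network, to_site) hops, so a primary
    and a redundant link between the same two sites yield separate paths.
    """
    visited = (visited or set()) | {start}
    if start == end:
        return [[]]
    paths = []
    for link in topo.get(start, []):
        if link["to"] in visited:
            continue  # skip sites already on this path to avoid loops
        hop = (start, link["network"], link["to"])
        for rest in flow_paths(topo, link["to"], end, visited):
            paths.append([hop] + rest)
    return paths


# Hypothetical three-tier application: web -> app -> database.
links = [
    ("web-dc1", "app-dc2", "backbone-a", "primary"),
    ("web-dc1", "app-dc2", "backbone-b", "redundant"),
    ("app-dc2", "db-cloud", "cloud-interconnect", "primary"),
]
topology = build_topology(links)
paths = flow_paths(topology, "web-dc1", "db-cloud")
for path in paths:
    print(" -> ".join(f"{src} [{net}] {dst}" for src, net, dst in path))
```

Listing the paths this way makes it easier to see which in-between networks a business transaction could have crossed, and therefore where packet collection points are worth placing.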
4. Network Packet Collection Points
Research the network taps and ERSPAN sessions that can capture the traffic of interest.
Configure packet brokers to collect traffic.
Collect TCP/IP network packets from strategic network locations where the relevant network traffic traverses the client Backbone Network and/or connects to Public Cloud Platforms (AWS, Azure, Google Cloud Platform).
5. Findings and Analysis Report
Publish a detailed Findings & Analysis Report for review by the Product Owner of the Processing Tier responsible for the slowdown. The report should include:
An overview of the profiling results.
Identified bottlenecks and their locations.
Any anomalies or irregularities observed during packet analysis.
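One way the bottleneck location in such a report might be derived is sketched below. It assumes the same transaction was observed at several capture points ordered from the client toward the backend, and that each capture point's request and response timestamps come from that point's own clock, so only differences taken at a single point are used. The tap names, timestamps and the `tier_breakdown` helper are hypothetical.

```python
def tier_breakdown(observations):
    """Attribute transaction time to the segment behind each capture point.

    observations: list of (tap_name, request_ms, response_ms), ordered from
    the client side toward the backend. The time between request and response
    at one tap covers everything downstream of that tap, so subtracting the
    next tap's figure isolates the segment between the two taps. The last
    entry covers everything at or beyond the final tap.
    """
    deltas = [(name, resp - req) for name, req, resp in observations]
    breakdown = {}
    for i, (name, delta) in enumerate(deltas):
        downstream = deltas[i + 1][1] if i + 1 < len(deltas) else 0
        breakdown[name] = delta - downstream
    return breakdown


# Hypothetical timestamps in integer milliseconds for one slow transaction.
observations = [
    ("client-edge", 0, 2500),
    ("app-tier", 20, 2470),
    ("db-tier", 30, 2430),
]
breakdown = tier_breakdown(observations)
slowest = max(breakdown, key=breakdown.get)
print(breakdown, "slowest segment:", slowest)
```

In this made-up case almost all of the elapsed time sits at or behind the database tap, which is the kind of finding the report would direct to the Product Owner of that Processing Tier.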
6. Collaboration for Root Cause Analysis
Collaborate closely with the Product Team to perform a deeper analysis of the specific circumstances leading to the performance issue.
Identify the root cause of the problem and recommend remediation steps.
7. Remediation and Resolution
Work with the responsible Product Owner to implement the recommended remediation steps.
Ensure that the issue is resolved and the Service Level Agreement (SLA) thresholds are met once again.
Continue monitoring and adjusting as necessary to maintain performance standards.
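As a rough sketch of how post-remediation monitoring could verify an SLA threshold, the example below checks a nearest-rank percentile of transaction response times against a limit. The 95th percentile, the sample values and the `percentile` / `meets_sla` helpers are assumptions for illustration; the actual metric and threshold come from the SLA itself.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_sla(samples_ms, threshold_ms, pct=95):
    """Return True when the chosen percentile is within the threshold."""
    return percentile(samples_ms, pct) <= threshold_ms


# Hypothetical response times in milliseconds collected after the fix.
samples_ms = list(range(1, 101))
print(percentile(samples_ms, 95), meets_sla(samples_ms, threshold_ms=200))
```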