IBM Support

How to use Performance Data Investigator (PDI) to investigate disk metrics

How To


Summary

This document is a guide to getting started with investigating IBM i disk metrics such as I/O rates and response times.

Steps

Performance Data Investigator (PDI) is part of IBM Navigator for i. It is a tool for investigating many of the performance metrics gathered by Collection Services. This document provides the initial steps for using PDI to investigate disk-related metrics. Collection Services data shows what the system, jobs, and threads were doing during a given time frame, but it does not provide details as to why (such as call stacks or query text).

To explore disk metrics and the jobs or tasks that contributed to that work, first open the browser interface.

Substitute the system name or IP address of your partition for "system" in the following URL: http://system:2002/Navigator/login, then log in using your credentials.
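If you work with several partitions, the substitution above can be scripted. The following is a minimal sketch in Python; the function name navigator_login_url and the host name myibmi.example.com are hypothetical and used only for illustration. The port and path are taken from the URL shown above.

```python
from urllib.parse import urlunsplit


def navigator_login_url(host: str, port: int = 2002) -> str:
    """Build the Navigator login URL for a system name or IP address.

    The port (2002) and path (/Navigator/login) come from the URL in this
    document; the function name itself is a hypothetical helper.
    """
    host = host.strip()
    if not host:
        raise ValueError("host must be a non-empty system name or IP address")
    return urlunsplit(("http", f"{host}:{port}", "/Navigator/login", "", ""))


# Hypothetical host name, for illustration only.
print(navigator_login_url("myibmi.example.com"))
```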

Once logged in, click the system you want to look at (more than one can be listed here), then click the Actions icon.

Image showing the Actions Icon

Choose Manage Node. On the next screen, open the performance data: click the line graph icon, then select Investigate Data.

Image shows the line graph icon. After clicking on it, choose Investigate Data

This opens the Investigate Data screen, where you can choose the library and dataset you want to review.

NOTE: In rare instances, your library might not appear in the list. If this happens, try rebuilding the collection table. Go back to the line graph icon and choose Manage Collections, then Collection Service collections. Click Actions, then Rebuild Collection Table.

Image shows how to rebuild the collection table by clicking on the Actions Hamburger Icon

This document reviews disk-related metrics. To see everything that is available, enter DISK in the filter field. Start with Disk I/O Rates Overview - Detailed.



I/O rates (bars) and read/write response times (lines) are displayed on this graph. Basic details appear when you hover over points on the graph. The time frame and number of bars can be controlled with the sliders at the bottom of the graph, and you can select any or all of the metrics listed.



On this date, the system experienced a performance issue from 9 a.m. to 10 a.m. The sliders have been adjusted to display a smaller range, and it is clear that I/O rates and I/O response times (especially average write response times) increased during this period. The root cause could be any of several factors, such as poor storage response times or an increase in I/O. Have your storage team review the storage I/O response times to investigate further.
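The visual check described above (spotting the intervals where write response time rises well above normal) can also be expressed as a small calculation. The sketch below is illustrative only: the interval values are made-up sample data, not output from PDI, and the function name flag_slow_intervals and the "twice the median" threshold are assumptions chosen for the example.

```python
from statistics import median

# Hypothetical sample data: (interval start, I/Os per second,
# average write response time in milliseconds). Not real PDI output.
intervals = [
    ("08:00", 1200, 0.6), ("08:15", 1250, 0.7),
    ("08:30", 1180, 0.6), ("08:45", 1300, 0.7),
    ("09:00", 4100, 3.9), ("09:15", 4500, 4.4),
    ("09:30", 4300, 4.1), ("09:45", 4000, 3.8),
    ("10:00", 1250, 0.7), ("10:15", 1150, 0.6),
    ("10:30", 1220, 0.7), ("10:45", 1190, 0.6),
]


def flag_slow_intervals(rows, factor=2.0):
    """Return start times whose write response exceeds factor x the median.

    Using the median as the baseline is an assumption for this sketch;
    it keeps a short spike from inflating the baseline itself.
    """
    baseline = median(resp for _, _, resp in rows)
    return [start for start, _, resp in rows if resp > factor * baseline]


print(flag_slow_intervals(intervals))
```

With this sample data, the flagged intervals are the four that fall between 9 a.m. and 10 a.m., matching the window narrowed down with the sliders.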


In this example, I/O volume also increased during this period. Investigating the jobs that contributed to that I/O may provide additional insight.

To do this, go back to the Investigate Data viewer and enter Physical disk IO as the filter. Choose Physical Disk IO Overview - Basic and adjust the sliders to decrease the number of intervals.

Next, select (highlight) the intervals of interest; in our example, the performance problem occurred from 9 a.m. to 10 a.m. Then go to Actions -> Physical Disk I/O Overview and choose a grouping to review. Choosing 'By Generic Job' often helps narrow the focus to a group of jobs, and you can refine further from there if needed.


This lists the generic jobs, ranked by I/O rate. In this example, jobs whose names start with JOB557 were the top contributors. Reviewing the top 10 shows that jobs named 'JOB5' followed by other characters drove most of the I/O workload on the system during this event.
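To make the idea of a generic-job ranking concrete, here is a minimal sketch that groups jobs by a shared name prefix and ranks the groups by total I/O. It is not PDI's implementation: the function name rank_generic_jobs, the sample job names and counts, and the six-character prefix length are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_generic_jobs(job_io, prefix_len=6):
    """Group (job name, I/O count) pairs by name prefix and rank by total I/O.

    prefix_len=6 is an assumption for this sketch, not a documented
    PDI setting.
    """
    totals = defaultdict(int)
    for job_name, io_count in job_io:
        totals[job_name[:prefix_len] + "*"] += io_count
    # Highest total I/O first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data, not taken from a real collection.
sample = [
    ("JOB557A", 9000),
    ("JOB557B", 8500),
    ("JOB512X", 4000),
    ("QZDASOINIT", 1200),
    ("BACKUP01", 800),
]

for generic_name, total in rank_generic_jobs(sample):
    print(f"{generic_name:<8} {total}")
```

Grouping by prefix rolls up many similarly named jobs into one line, which is why it narrows the focus faster than ranking individual jobs.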



To review the I/O signature for these jobs, select (highlight) them and go to Actions -> Physical Disk I/O for Generic Jobs or Tasks - Basic.


The I/O signature for each generic job group selected above is then displayed, making it easy to see when each group started and which one contributed the most to the I/O workload on the system. The application owners, job owners, or developers should review this to determine whether the workload is expected.


For further information on which programs are driving the I/O activity in these jobs, collect a Job Watcher trace while the issue is occurring: https://www.ibm.com/support/pages/node/685927

More information on PDI can be found at the following link: https://www.ibm.com/support/pages/performance-data-investigator-navigator-i

Document Location

Worldwide


Document Information

Modified date:
08 October 2024

UID

ibm17172182