IBM Support

AIX MustGather: System Performance Analysis

Product Documentation


Abstract

This MustGather document helps AIX administrators collect the data needed when opening a support case with the AIX System Performance team.

Content

AIX System Performance Support requires that the person opening the case have some insight into the issue being reported.

Gathering information before calling IBM support can shorten the time it takes to resolve the performance issue. The AIX System Performance Analysis team requires at least two pieces of information when diagnosing an AIX system performance issue:
  •   Problem description
  •   Perfpmr data collection
 1.  The first requirement is to provide a detailed problem description of the performance issue being reported. Please answer the following questions:
  •  What is the exact nature of the performance problem? (For example: slow system response times, longer-than-normal batch job completion times, slow backups, or performance metrics reported by the OS or applications.)
  •  When did the performance issue first appear?
  •  Is just one partition impacted by this slowdown? If not, how many other partitions are impacted? Are the impacted partitions located on the same frame?
  •  Were there any changes made to the hardware, application, operating system, or network before the problem first appeared? If so, provide details of the changes that were made.
  •  Is the performance problem chronic (happening constantly) or is it intermittent? If intermittent, how often are you seeing it happen?
  •  How long does the slowdown last (minutes, hours)?
  •  Can the slowdown be reproduced on demand?
  •  How does recovery from this issue occur? Does system or application performance return to normal without user intervention? How long does it take to recover?
  •  Are you using any monitoring tools, such as vmstat, iostat, or lparstat, to help identify the resource bottleneck behind the slowdown? If so, provide the output of those commands from the period when the slowdown occurred as a testcase, along with the timestamp of when the slowdown occurred.
  •  Are there any other vendors involved with resolving this issue, such as EMC or Oracle?
  •  If the issue is related to a batch job, how long does it take for the batch job to complete? Provide both the normal and the slow batch job run times. (Specify the time differences in seconds, minutes, or hours.)
  •  How often does the batch job in question run (for example, once per day or once per week)?
*There are additional questions in the PROBLEM.INFO file included with the perfpmr tool download. Feel free to answer those questions as well and place the updated file in the data directory used to store the output files.
Add the answers to the above questions as an update to the Salesforce case.
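The request above for monitoring output together with timestamps can be met in several ways. The following is a minimal sketch, not part of perfpmr, that prefixes each line of a command's output with the current date and time so that it can be matched against the time of the slowdown. The helper name stamp, the output file names, and the 5-second interval are illustrative assumptions.

```shell
#!/bin/sh
# stamp: read lines on stdin and prefix each one with the current date and time.
# (Hypothetical helper for illustration; it is not part of perfpmr.)
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example use while the slowdown is occurring (interval and count are
# arbitrary choices, here 60 samples taken 5 seconds apart):
#   vmstat 5 60   | stamp > vmstat.out &
#   iostat 5 60   | stamp > iostat.out &
#   lparstat 5 60 | stamp > lparstat.out &
#   wait
```

Some tools buffer their output when writing to a pipe, so the timestamps should be treated as approximate.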
 
2.  The second step is collecting performance data using the perfpmr.sh script, which is available for download.
      *You must collect perfpmr data at the time the system is experiencing slowness.*
      *Perfpmr data collected before or after the slowdown will not contain any useful data.*

Here are links to download the perfpmr.sh script and README files. It may be necessary to copy and paste the links into your browser.
The README file provides detailed information on running perfpmr and uploading the output to IBM:
It is recommended that perfpmr data also be collected on the VIOS if virtual devices are configured:
  •  Collect perfpmr data on the VIOS at the same time you are collecting perfpmr data on the client partition, again while the slowness is occurring.
  •  The steps used to collect perfpmr data on the VIOS are the same as on the client partition; however, it will be necessary to run the oem_setup_env command to obtain root access.
Download the perfpmr data collection script for the VIOS from one of the following links:
VIOS 2.2 running AIX 6.1  ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr/perf61/perf61.tar.Z  

VIOS 3.1 running AIX 7.2   ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr/perf72/perf72.tar.Z 
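As a convenience, the choice between the two links above can be scripted. The sketch below is an illustration only: the function name vios_perfpmr_url is hypothetical, and it knows only the two links listed in this document. It assumes the AIX level is passed in a form that starts with 6.1 or 7.2, which is how the oslevel command reports it.

```shell
#!/bin/sh
# vios_perfpmr_url: print the perfpmr download link listed above for the
# AIX level underlying the VIOS. Hypothetical helper for illustration.
vios_perfpmr_url() {
    base="ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr"
    case "$1" in
        6.1*) echo "$base/perf61/perf61.tar.Z" ;;
        7.2*) echo "$base/perf72/perf72.tar.Z" ;;
        *)    echo "no link listed in this document for AIX level '$1'" >&2
              return 1 ;;
    esac
}

# Example, from the root shell opened by oem_setup_env:
#   vios_perfpmr_url "$(oslevel)"
```

Any level other than the two listed returns a non-zero status rather than guessing at a link.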

*For hosts using shared processors, it is recommended to enable CEC-wide performance information collection via the HMC before collecting perfpmr data.
      -Log on to the HMC
      -Right-click on the specific LPAR
      -Select Properties
      -Check the box for 'Allow performance information collection'
The above change to "Allow performance information collection" is dynamic and does NOT require a reboot of the system.
This change allows the lparstat command to display the number of processors in the shared pool.
*Special considerations for hosts running PowerHA (formerly HACMP) 7.2 and above:
Prior to running the perfpmr tool on an active PowerHA cluster node, some tunables should be modified to reduce the likelihood of a false takeover due to the additional load.
1. Record the current values.  The default values will vary depending on the software level and other factors.
clmgr query cluster | egrep -w "NETWORK_FAILURE_DETECTION_TIME|HEARTBEAT_FREQUENCY"

2. Extend the values. The maximum is 600 seconds for HEARTBEAT_FREQUENCY, and NETWORK_FAILURE_DETECTION_TIME must be at least 10 seconds less than HEARTBEAT_FREQUENCY. A cluster sync is required and can be done while cluster services are running. These values will be propagated to all nodes.
clmgr modify cluster HEARTBEAT_FREQUENCY=600
clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=90
clmgr sync cluster
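
The two limits stated in step 2 can be checked before the values are passed to clmgr. The following is a minimal sketch under that reading of the limits; the function name check_tunables is hypothetical, and clmgr may enforce further limits that are not covered here.

```shell
#!/bin/sh
# check_tunables HB NFDT: return 0 if the proposed values satisfy the two
# limits stated in step 2, otherwise print the reason and return 1.
# (Hypothetical helper for illustration; it does not call clmgr.)
check_tunables() {
    hb=$1
    nfdt=$2
    if [ "$hb" -gt 600 ]; then
        echo "HEARTBEAT_FREQUENCY=$hb exceeds the 600-second maximum" >&2
        return 1
    fi
    if [ "$nfdt" -gt $((hb - 10)) ]; then
        echo "NETWORK_FAILURE_DETECTION_TIME=$nfdt must be <= $((hb - 10))" >&2
        return 1
    fi
    return 0
}

# Example with the values used in step 2:
#   check_tunables 600 90 && echo "values are within the stated limits"
```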

3.  Run perfpmr. When the perfpmr data collection completes, revert the tunables to their previous values and synchronize.
clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=[previous_value]
clmgr modify cluster HEARTBEAT_FREQUENCY=[previous_value]
clmgr sync cluster
#########   Now you're ready to collect perfpmr data. ##############
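
Steps 1 and 3 above can be tied together by saving the output of the query in step 1 to a file and generating the revert commands from it later. The sketch below is one possible way to do this, not an IBM-provided procedure. It assumes that each attribute appears at the start of its own line in the form NAME="value"; the function name restore_cmds and the file name /tmp/ha_tunables.before are hypothetical. It only prints the commands so that they can be reviewed before being run.

```shell
#!/bin/sh
# restore_cmds: read the saved query output on stdin and print the clmgr
# commands that would restore the two tunables, in the order used in step 3.
# Assumes lines of the form NAME="value"; adjust if your output differs.
restore_cmds() {
    awk -F= '
        $1 == "NETWORK_FAILURE_DETECTION_TIME" { gsub(/"/, "", $2); nfdt = $2 }
        $1 == "HEARTBEAT_FREQUENCY"            { gsub(/"/, "", $2); hb = $2 }
        END {
            # Refuse to print anything if either value was not found.
            if (nfdt == "" || hb == "") exit 1
            print "clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=" nfdt
            print "clmgr modify cluster HEARTBEAT_FREQUENCY=" hb
            print "clmgr sync cluster"
        }'
}

# Example:
#   clmgr query cluster | egrep -w "NETWORK_FAILURE_DETECTION_TIME|HEARTBEAT_FREQUENCY" > /tmp/ha_tunables.before
#   ... extend the values, run perfpmr ...
#   restore_cmds < /tmp/ha_tunables.before
```

If either value is missing from the saved file, the function prints nothing and returns a non-zero status, so that a partial revert is not generated by accident.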
 
TESTCASE UPLOADS
Upload details are provided in the README files listed above. For convenience, these steps are summarized below.

Upload the yourcase#.pax.gz file created during the perfpmr collection using one of the following options (a, b, or c):

     a) Attach to your case
     https://www.ibm.com/mysupport/s/my-cases

     b) Upload to the Enhanced Customer Data Repository (ECuRep)
     https://www.secure.ecurep.ibm.com/app/upload_sf

     c) Upload to the Blue Diamond FTP server (Blue Diamond Customers Only)
     https://msciportal.im-ies.ibm.com

* Note: For information about doing a Blue Diamond upload see:

     http://www.ibm.com/support/docview.wss?uid=nas8N1020947

Document Location

Worldwide


Document Information

Modified date:
20 January 2021

UID

ibm10875894