
High I/O latencies on applications running on AIX

How To


Summary

Applications (for example Oracle or DB2 databases) reporting that disk I/Os take several seconds

Objective

Identify possible causes of applications waiting seconds for disk I/Os to complete

Steps

I/O latencies can have several different causes; here we focus only on long latencies, those taking seconds.

In cases where the application (for example an Oracle or DB2 database) reports I/O latencies of 1 second or higher, the first thing to check is the disks' response time. Here is a sample "iostat -Dl" output:

[image: sample "iostat -Dl" output]
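
To collect these statistics yourself, a minimal sketch of the command (the 5-second interval and 3-sample count are example values):

    iostat -Dl 5 3

For each hdisk, compare the average service time ("avg serv") with the maximum service time ("max serv") in both the read and the write sections of the output.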

The average disk response time, "avg serv", is good, but the "max serv" is in the range of seconds, which is a very long time: taking hdisk9 as an example, the average read time is 1.5 ms and the average write time is 0.2 ms, but the maximum read time is 29.1 seconds, meaning there was at least one I/O (a read, in this case) that took 29.1 seconds to complete.
If the I/O times reported by the application correspond to the disks' response time, the disks are the source of the latencies and must be investigated.
The time reported here is spent entirely outside the AIX system; the long I/Os are caused by something happening either on the SAN or on the storage. It is not possible to determine from the AIX system where that time was spent, so further investigation has to involve the SAN and storage teams.

NOTE: depending on what causes the problem, the AIX error log might not log any entry.
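
A quick way to review the AIX error log, as a minimal sketch:

    errpt | more
    errpt -a | more

The first form lists a summary of the logged errors; the second shows the detailed entries. As per the NOTE above, the absence of entries does not rule out a SAN or storage problem.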

If the disks' response time is good, check whether the long I/Os are related to Logical Volumes where LVM mirroring is configured with the MWCC (Mirror Write Consistency Check) policy set to Active.
Assuming the database reports long I/Os on data files in the /oracle/data/data1 filesystem, the "mount" command shows the LV name for that filesystem, here LVdata1:

[image: sample "mount" output]
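
The equivalent check from the command line, as a sketch using the example filesystem:

    mount | grep /oracle/data/data1

The device column of the matching line shows the LV, here /dev/LVdata1.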

You can then use the "lslv" command to get the details of the LV:

[image: sample "lslv" output]
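
As a sketch, the command for the example LV:

    lslv LVdata1

Look for the COPIES, LPs, PPs, and MIRROR WRITE CONSISTENCY fields in the output.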

Here the Logical Volume LVdata1 has 2 copies (COPIES: 2), 8000 Logical Partitions with 16000 Physical Partitions, 8000 on hdisk10 and 8000 on hdisk20, and Mirror Write Consistency set to Active.

NOTE that MWC has an effect only when the LV is mirrored, that is, when it has 2 or 3 copies: for non-mirrored LVs (COPIES: 1) MWC is not used, no matter what it is set to.

On systems with a very high I/O load on mirrored LVs, the MWCC activity can cause latencies in the LVM layer, so application I/Os can take seconds to complete even though the disks' response time is good. While this is caused by writes, the latencies can affect reads too.
There are no commands available that can easily identify the problem; it can be detected by looking at a kernel trace collected while the long I/Os are taking place. In most cases this is not easy to accomplish because of the occasional nature of the problem.

This problem can be avoided by changing the MWCC from Active to Passive:
Passive MWCC is faster during normal operations since it does not track/log all the writes (this tracking is what causes the latencies when using Active). The tradeoff is that in the event of a system crash the entire mirrored LV must be synchronized; the amount of time needed depends on the size of the LVs, and read operations can be slower until all the partitions have been resynchronized.
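
A minimal sketch of the change, using the example LV and filesystem from above (the LV must be closed, which means its filesystem must be unmounted; see the NOTE below):

    umount /oracle/data/data1
    chlv -w p LVdata1
    mount /oracle/data/data1
    lslv LVdata1

"chlv -w p" sets Mirror Write Consistency to Passive ("a" sets it back to Active, "n" turns it off); the final "lslv" verifies the new setting.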
 

More details on LVM Mirror and MWC are available in the Related Information links provided later. Here is a short excerpt from the "AIX 5L Differences Guide Version 5.1 Edition" Redbook, where Passive MWC was introduced:

Starting with AIX 5L, a new Mirror Write Consistency Check (MWCC) algorithm (Passive) was introduced for mirrored Logical Volumes; it behaves differently from the Active one, which is the default.
Previous versions of AIX used a single MWCC algorithm, which is now called the Active MWCC algorithm, to distinguish it from the new algorithm. With Active MWCC, records of the last 62 distinct logical transfer groups (LTG) written to disk are kept in memory and also written to a separate checkpoint area on disk. Because only new writes are tracked, if new MWCC tracking tables have to be written out to the disk checkpoint area, the disk performance can degrade if there are a lot of random write requests issued. The purpose of the MWCC is to guarantee the consistency of the mirrored logical volumes in case of a crash. After a system crash, the Logical Volume Manager will use the LTG tables in the MWCC copies on disk to make sure that all mirror copies are consistent.

The new Passive MWCC algorithm does not use an LTG tracking table, but sets a dirty bit for the mirrored logical volume as soon as the volume is opened for writes. This bit gets cleared only if the volume is successfully synced and is closed. In the case of a system crash, the entire mirrored logical volume will undergo a background resynchronization spawned during vary on of the volume group, because the dirty bit has not been cleared. Once the background resynchronization completes, the dirty bit is cleared, but can be reset at any time if the mirrored logical volume is opened. It should be noted that the mirrored logical volume can be used immediately after system reboot, even though it is undergoing background resynchronization. The tradeoff for the new Passive MWCC algorithm compared to the default Active MWCC algorithm is better performance during normal system operations. However, there is additional I/O that can slow system performance during the automatic background resynchronization that occurs during recovery after a crash.

NOTE: Passive MWCC is only available on Big and Scalable volume group types.
      Mirror Write Consistency (MWC) and Bad Block Relocation (BBR) are not supported in a concurrent setup with multiple
      active nodes accessing a disk at the same time.
      The change requires downtime because the LV must be closed, which means the filesystem it contains must be unmounted.


Other possibilities that can be considered:
1.  Redistribute the data to have only one Logical Volume per Volume Group.
      The MWCC area is per VG, so a VG holding a single LV reduces the likelihood of contention: this can help in some cases,
      depending on the configuration and the load, but it will not avoid the problem (for example if writes go to only one or a
      few filesystems).
2.  Remove the LVM mirror and implement mirroring at the storage level instead (a sketch follows this list).
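
For option 2, a minimal sketch of removing the second LVM copy, using the example LV from above (mirroring would then be provided by the storage system):

    rmlvcopy LVdata1 1

"rmlvcopy LVdata1 1" reduces the logical volume to a single copy; how the storage-level mirror is then configured depends on the storage system in use.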

 

Additional Information

SUPPORT

If you require more assistance, use the following step-by-step instructions to contact IBM and open a case; software cases require an active and valid support contract.

1.  Document (or collect screen captures of) all symptoms, errors, and messages related to your issue.

2.  Capture any logs or data relevant to the situation.

3.  Contact IBM to open a case:

   -For electronic support, see the IBM Support Community:
     https://www.ibm.com/mysupport
   -If you require telephone support, see the web page:
      https://www.ibm.com/planetwide/

4.  Provide a clear, concise description of the issue.

5.  If the system is accessible, collect a system snap, and upload all of the details and data for your case.

 - For guidance, see: Working with IBM AIX Support: Collecting snap data
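
   As a sketch, a typical snap collection (check the linked guidance for the exact options requested for your case):

       snap -r
       snap -ac

   "snap -r" removes data from previous snap runs, and "snap -ac" gathers all system configuration information into a single archive (by default under /tmp/ibmsupt).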

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SWG10","label":"AIX"},"Component":"","Platform":[{"code":"PF002","label":"AIX"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB08","label":"Cognitive Systems"}}]

Document Information

Modified date:
03 September 2020

UID

ibm16325993