IBM Support

How to Detect and Resolve APAR HU01706

Preventive Service Planning


Abstract

Any clients with systems running v7.7.1.7 or v7.8.1.3 must take the following actions as soon as possible:

1. Use the detection tool to confirm whether the system is currently affected by APAR HU01706.
2. Upgrade the system to v7.7.1.8 or v7.8.1.4 (even if the system is not currently affected).
3. If an immediate upgrade is not possible, take the steps detailed below to avoid further issues until an upgrade can be completed.

More details on APAR HU01706 are contained in the following flash: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010879

Content



1. Use the detection tool to confirm whether the system is currently affected by APAR HU01706

The detection tool will automatically do the following for each I/O group:

  1. Find a storage pool with at least 2 GB of free space. If no storage pool has enough free space, create a new storage pool with at least 2 GB of free space, if possible. If this is not possible, contact IBM support.
  2. Create a thin-provisioned volume (taking no more than a few megabytes).
  3. Check this volume for evidence of APAR HU01706.
  4. Delete the thin-provisioned volume.
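
Before running the tool, it can be useful to check by hand whether any storage pool meets the 2 GB requirement. The sketch below is not the tool's own logic, which is not described here. It assumes a colon-delimited pool listing with name and free_capacity columns, in bytes, such as the output of lsmdiskgrp -bytes -delim : is expected to provide. The sample listing is invented for illustration, and 2 GB is interpreted as 2 GiB.

```python
import csv
import io

# Assumption: "2GB" is treated as 2 GiB.
MIN_FREE_BYTES = 2 * 1024**3


def pools_with_free_space(listing: str, min_free: int = MIN_FREE_BYTES) -> list[str]:
    """Return the names of pools whose free_capacity is at least min_free bytes."""
    reader = csv.DictReader(io.StringIO(listing), delimiter=":")
    return [row["name"] for row in reader if int(row["free_capacity"]) >= min_free]


# Invented sample listing (hypothetical pool names, reduced set of columns).
SAMPLE = """id:name:status:free_capacity
0:Pool0:online:1073741824
1:Pool1:online:5368709120
"""

print(pools_with_free_space(SAMPLE))  # ['Pool1']
```

If the returned list is empty, the tool will need to create a pool itself, or IBM support should be contacted as described above.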

Use the tool as follows. You will need SSH access configured before continuing.

a. Download the tool package

The tool package can be downloaded from Fix Central. Choose the link for the appropriate product from the following page:

https://www.ibm.com/support/home/search-results?q=apar%20hu01706

Note: If the search returns more than one link for your product, any one of those links can be used.
Be sure to use a link for the correct product, so that Fix Central grants access to the download.

The current package name is: IBM_INSTALL_test_for_HU01706_20171214.094229

b. Copy the package to the config node:

Use scp on Linux/Unix systems, or pscp on Windows:

scp <package_name> superuser@<cluster_ip>:/upgrade/

c. Install the package on the config node using this CLI command:

svctask applysoftware -file <package_name>

This unpacks the tool on the config node, but does not cause any node reboots.

d. Run the tool using this CLI command:

test_for_HU01706
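
Steps b to d can be driven from a host script. The following is a minimal sketch, assuming OpenSSH scp and ssh clients on the host and key-based SSH access for superuser. The function name tool_commands and the address 192.0.2.10 are placeholders, not part of the tool. The sketch only builds the commands; nothing is executed.

```python
import shlex


def tool_commands(package_path: str, cluster_ip: str, user: str = "superuser") -> list[list[str]]:
    """Build the argument lists for steps b, c and d. Nothing is executed here."""
    # applysoftware takes the file name as it appears under /upgrade/ on the config node.
    package_name = package_path.rsplit("/", 1)[-1]
    target = f"{user}@{cluster_ip}"
    return [
        ["scp", package_path, f"{target}:/upgrade/"],                    # step b
        ["ssh", target, f"svctask applysoftware -file {package_name}"],  # step c
        ["ssh", target, "test_for_HU01706"],                             # step d
    ]


# Placeholder address from the documentation range; replace with the cluster IP.
for argv in tool_commands("./IBM_INSTALL_test_for_HU01706_20171214.094229", "192.0.2.10"):
    print(shlex.join(argv))
```

To execute the commands, each argument list can be passed to subprocess.run. Avoid check=True for the last command, because the tool is expected to return a non-zero code when the system is affected.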


e. Check the results

The tool reports progress while the test is running, then prints a results summary.

This output indicates that the system is not currently affected:

======================== Test results summary  ==========================

  No nodes in this system are currently affected by APAR HU01706.

  Please plan an upgrade to prevent this issue occurring in the future.
  See the following webpage for details:

     http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010888
=========================================================================


Alternatively, this output indicates that node ID 1 is currently affected:

================================= Alert =================================
 Node 1 is affected by APAR HU01706. Please take action to correct the
 issue as soon as possible, as described on the following webpage:

       http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010888
=========================================================================

======================== Test results summary  ==========================

  1 node in the system is currently affected by APAR HU01706.

  Please take action to correct the issue immediately, as described
  on the following webpage:

     http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010888
=========================================================================

Take note of the affected node IDs for use in step 3.
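
When the tool is run from a script, the affected node IDs can be collected from the captured output. This is a sketch only: it assumes each alert line has the form shown in the example output above ("Node <id> is affected by APAR HU01706"), and the function name affected_node_ids is illustrative.

```python
import re

# Matches the alert line shown in the example output above.
ALERT_RE = re.compile(r"Node (\d+) is affected by APAR HU01706")


def affected_node_ids(tool_output: str) -> list[int]:
    """Return the node IDs named in alert lines, in order, without duplicates."""
    ids: list[int] = []
    for match in ALERT_RE.finditer(tool_output):
        node_id = int(match.group(1))
        if node_id not in ids:
            ids.append(node_id)
    return ids


SAMPLE = """\
================================= Alert =================================
 Node 1 is affected by APAR HU01706. Please take action to correct the
 issue as soon as possible, as described on the following webpage:
=========================================================================
"""

print(affected_node_ids(SAMPLE))  # [1]
```

The summary lines ("1 node in the system is currently affected", "No nodes in this system are currently affected") do not match the pattern, so only the per-node alerts are counted.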

2. Upgrade the system to 7.7.1.8 or 7.8.1.4 - even if the system is not currently affected.

The 7.7.1.8 and 7.8.1.4 PTFs are available from Fix Central.

If the tool reports that the system IS affected, urgently upgrade to fix the problem.

If the tool reports that the system is NOT currently affected, it is still possible for the system to become affected by APAR HU01706, if the three triggering conditions occur at the same time. Therefore, plan an upgrade in the near future to prevent the problem from occurring.

If it is not possible to upgrade the system in the near future, take the actions in step 3 below.

Until the upgrade has been completed, the detection tool can be used to monitor for the issue occurring. Run the tool periodically (for example, every 15 minutes) over SSH from a script on a host. If the problem is detected, carry out the upgrade described in this step as soon as possible, to clear the problem on the affected node.

Note that the tool is installed only on the config node. If the config node fails over (for example, due to a node warmstart), the tool will not be present on the new config node. In this case, re-install the tool on the new config node.

3. If upgrade to 7.7.1.8 or 7.8.1.4 is not possible in the near future, take steps to avoid further issues

These steps are only needed if the system cannot be upgraded.

If the system IS NOT currently affected - monitor for future occurrences of the problem

Upgrade must be completed as soon as possible. Until then the detection tool can be used to monitor for an occurrence of APAR HU01706, to minimise the impact to the system. Run the script periodically (for example, once per hour) by SSH from a script on a host. If the problem is detected, then step 2 above should be carried out as soon as possible, to clear the problem on the affected node.

The return code from the script will be:
0 - if the test ran successfully, and the system is not currently affected by APAR HU01706
1 - if the test ran successfully, and the system IS currently affected by APAR HU01706
2 - if the test failed to run successfully
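
A host-side monitoring script can map these return codes to a status. The sketch below assumes an OpenSSH client, which passes the exit status of the remote command back to the caller, and key-based access for superuser. The names describe and run_check are illustrative only.

```python
import subprocess

# Return codes as listed above.
STATUS_BY_RETURN_CODE = {
    0: "not affected",
    1: "AFFECTED",
    2: "test failed to run",
}


def describe(return_code: int) -> str:
    """Translate a return code from test_for_HU01706 into a short status."""
    return STATUS_BY_RETURN_CODE.get(return_code, f"unexpected return code {return_code}")


def run_check(cluster_ip: str, user: str = "superuser") -> str:
    """Run the tool once over SSH and describe the result."""
    result = subprocess.run(
        ["ssh", f"{user}@{cluster_ip}", "test_for_HU01706"],
        capture_output=True,
        text=True,
    )
    return describe(result.returncode)


print(describe(1))  # AFFECTED
```

Any code other than 0, 1 or 2 is reported as unexpected. For example, the OpenSSH client itself exits with 255 when the connection fails. The script can be scheduled with cron or a similar scheduler at the chosen interval.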

Note that the tool is installed only on the config node. If the config node fails over (for example, due to a node warmstart), the tool will not be available on the new config node. In this case, re-install the tool on the new config node.

If the system IS currently affected - resolve the issue by removing and re-adding nodes to the cluster:

Upgrade must be completed as soon as possible, to prevent further occurrences. To temporarily remove the issue on each affected node, use the following procedure.
  1. Make sure that host multipathing is healthy, and lsdependentvdisks does not report dependent vdisks on the node(s) to be removed.
  2. Use the following CLI command to remove the first affected node:
    svctask rmnode <node_id>
    After about a minute, the node will have been removed from the cluster.
  3. For SVC and V9000 clusters, the node will become a candidate automatically.
    Once lsnodecandidate lists the node, add the node back into the cluster using the following CLI command:
    svctask addnode -panelname <node_panelname> -iogrp <iogrp_id>

    For Storwize systems, the removed node will enter service state with node error 690. Log in to the node's service IP address, and issue the following command to exit service state and rejoin the cluster:
    satask stopservice
  4. Once lsnode shows all nodes online, move on to the next node (if more than one node was affected).
  5. Ensure a 30-minute delay between removal of nodes in an I/O group, to allow host multipathing failover.
  6. After completing the procedure, install the test utility and re-run it to confirm the issue has been resolved.

    -- end of procedure --
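
When several nodes are affected, a small helper can keep track of the 30-minute delay and list the commands for each node. This is a sketch, not a replacement for the procedure above: it assumes the delay is measured from the previous rmnode in the same I/O group, and the names may_remove, commands_for_node and PANEL01 are hypothetical. The checks in steps 1 and 4 still have to be made before each removal.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=30)  # delay from step 5


def may_remove(iogrp_id: int, now: datetime, last_removed: dict[int, datetime]) -> bool:
    """True if no node in this I/O group was removed within the last 30 minutes.

    last_removed maps an I/O group ID to the time of its most recent rmnode.
    """
    previous = last_removed.get(iogrp_id)
    return previous is None or now - previous >= MIN_GAP


def commands_for_node(node_id: int, iogrp_id: int, panel_name: str, storwize: bool) -> list[str]:
    """Commands from steps 2 and 3, in order, for one affected node."""
    if storwize:
        # Issued on the node's service IP address once it shows node error 690.
        rejoin = "satask stopservice"
    else:
        # Issued once lsnodecandidate lists the node.
        rejoin = f"svctask addnode -panelname {panel_name} -iogrp {iogrp_id}"
    return [f"svctask rmnode {node_id}", rejoin]


last_removed = {0: datetime(2018, 1, 1, 12, 0)}
print(may_remove(0, datetime(2018, 1, 1, 12, 20), last_removed))  # False
print(may_remove(0, datetime(2018, 1, 1, 12, 30), last_removed))  # True
print(commands_for_node(1, 0, "PANEL01", storwize=False))
```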


Document Information

Modified date:
28 March 2023

UID

ssg1S1010888