Version 1.0.8.4 release notes

Cloud Pak for Data System 1.0.8.4 is the next version after 1.0.8.3 and is intended for Netezza Performance Server customers only. Version 1.0.8.4 includes upgrade time improvements and bug fixes.

Note: Cloud Pak for Data System 1.0.8.4 includes all security updates up to SP24. Future security patches (SP25 and later) can be applied.

Upgrading

The upgrade procedure is performed by IBM Support.

Do not upgrade to 1.0.8.4 if you are using Cloud Pak for Data. The upgrade path for systems with Cloud Pak for Data is 1.0.7.3 > 2.0.x. For more information, see https://www.ibm.com/docs/en/cloud-paks/cloudpak-data-system/2.0?topic=system-advanced-upgrade-from-versions-10x.

Note: There is no need to upgrade to Cloud Pak for Data System 1.0.8.4 from version 1.0.8.3.

Software components

The recommended NPS version is 11.2.1.9 because it includes base node expansion support, which requires a 1.0.8.x release. The minimum supported NPS version is 11.2.1.5.
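
To confirm which NPS version is running on your system, you can query the software revision from the NPS command line. This is a hedged example; nzrev is the standard Netezza revision command, and this assumes the NPS command-line tools are available in your environment:
    nzrev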

Enhancements

  • Added the respective log path for each component in the upgrade tracelog to improve clarity on long-running stages of the upgrade and to reduce the turnaround time if there are any failures.
  • Added a new pre-check to the upgrade process that verifies whether NPS is up and running. The upgrade fails if NPS is down; NPS must be online before you restart the upgrade (see the example check after this list).
  • The SCJ verification script is now integrated into the upgrade process and runs during the prechecks.
  • Both the upgrade console and the log now show the time that is required to complete upgrade for each component.
    Example:
    2023-10-26 04:00:59 Approx Estimated time required to upgrade psklm 00:05:00
    2023-10-26 04:01:15 Approx Estimated time required to upgrade nodeos 01:00:00
    2023-10-26 04:44:01 Approx Estimated time required to upgrade aposcomms 00:15:00
    2023-10-26 05:05:57 Approx Estimated time required to upgrade docker_upgrade 00:05:00
    2023-10-26 05:08:00 Approx Estimated time required to upgrade supporttools 00:05:00
    2023-10-26 05:08:13 Approx Estimated time required to upgrade platformbackups 00:01:00
    2023-10-26 05:08:24 Approx Estimated time required to upgrade magneto 00:20:00
    2023-10-26 05:08:51 Approx Estimated time required to upgrade containerapi 00:01:00
    2023-10-26 05:09:03 Approx Estimated time required to upgrade gpfsconfig 00:20:00
    2023-10-26 05:10:35 Approx Estimated time required to upgrade callhome 00:15:00
    2023-10-26 05:14:31 Approx Estimated time required to upgrade cyclops 00:15:00
    2023-10-26 05:16:23 Approx Estimated time required to upgrade appliancesoftwareversion 00:01:00
    2023-10-26 05:16:44 Approx Estimated time required to upgrade hpi 00:15:00
    2023-10-26 05:50:18 Approx Estimated time required to upgrade hpifirmware 03:50:00
    
    
  • Combined the SPU and non-SPU phases of the firmware upgrade process to improve upgrade speed.
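
As a quick reference for the NPS pre-check in the list above, you can confirm that NPS is online before you restart a failed upgrade by using the standard nzstate command (a hedged example, assuming the NPS command-line tools are available on your system; an online system reports a state similar to System state is 'Online'):
    nzstate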

Resolved issues

  • Resolved the issue that caused the alert Failed to collect status from resource manager after upgrade. The issue was caused by Python 3 packages that were installed in the wrong path. Installed Python 3 packages must be present under /usr/lib. If any packages are found that were installed under /usr/local/lib, the system now uninstalls them and reinstalls them in the correct location.
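
To check where a particular Python 3 package is installed, you can inspect its location with pip3 show, a standard pip command (a hedged example; requests is used here only as a placeholder package name, and the Location value is expected to point under /usr/lib, for example /usr/lib/python3.6/site-packages):
    pip3 show requests | grep Location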

Known issues

Exception while running --preliminary-check
There might be an exception while running --preliminary-check during the upgrade from 1.0.7.x versions to 1.0.8.4. Example output:
Please review the release notes for this version at https://ibm.biz/icpds_rn_1084 prior to running the upgrade.
Upgrade command:
apupgrade --upgrade --use-version 1.0.8.4_release --upgrade-directory /localrepo --bundle system
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 466, in get_distribution
    dist = get_provider(dist)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 342, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 886, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 772, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pipdeptree' distribution was not found and is required by the application
Workaround
The workaround is to proceed with running the upgrade command:
apupgrade --upgrade --upgrade-directory /localrepo --use-version 1.0.8.4_release --bundle system
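If you want to confirm that the pipdeptree distribution named in the exception is present, you can query it with pip3 (a hedged, optional check; pip3 show is a standard pip command):
    pip3 show pipdeptree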
Upgrade fails when GPFS disks are detected as down
Upgrade to version 1.0.8.4 might fail when GPFS disks are detected as down. Check the log for the following symptom from the function check_gpfs_filesystems_for_sufficient_disks_up:
2023-10-31 22:29:30 TRACE: Running command [ssh e3n1 '/usr/lpp/mmfs/bin/mmlsdisk platform'].

                           LOGGING FROM: yosemite_bundleupgradechecker.py:check_gpfs_filesystems_for_sufficient_disks_up:428
2023-10-31 22:29:31 TRACE: RC: 0.
                           STDOUT: [disk         driver   sector     failure holds    holds                            storage
                           name         type       size       group metadata data  status        availability pool
                           ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
                           platform_e3n1_ssd_0 nsd         512       1,0,0 Yes      Yes   ready         up           system
                           platform_e3n1_ssd_1 nsd         512       1,0,0 Yes      Yes   ready         up           system
                           platform_e3n1_ssd_2 nsd         512       1,0,0 Yes      Yes   ready         up           system
                           platform_e3n1_ssd_3 nsd         512       1,0,0 Yes      Yes   ready         up           system
                           platform_e1n1_ssd_0 nsd         512       2,0,1 Yes      Yes   ready         down         system
                           platform_e1n1_ssd_1 nsd         512       2,0,1 Yes      Yes   ready         down         system
                           platform_e1n1_ssd_2 nsd         512       2,0,1 Yes      Yes   ready         down         system
                           platform_e1n1_ssd_3 nsd         512       2,0,1 Yes      Yes   ready         down         system
                           platform_e2n1_ssd_0 nsd         512       3,0,2 Yes      Yes   ready         down         system
                           platform_e2n1_ssd_1 nsd         512       3,0,2 Yes      Yes   ready         down         system
                           platform_e2n1_ssd_2 nsd         512       3,0,2 Yes      Yes   ready         down         system
                           platform_e2n1_ssd_3 nsd         512       3,0,2 Yes      Yes   ready         down         system
                           ]
                           STDERR: []

                           LOGGING FROM: yosemite_bundleupgradechecker.py:check_gpfs_filesystems_for_sufficient_disks_up:428
Workaround
Run the following commands from e1n1 and restart the upgrade. Make sure that all the hosts are online.
  1. Identify all GPFS filesystems in the system. The filesystems are identified as /dev/<filesystem> in the output:
    mmlsfs all -T | grep 'File system'
  2. For each identified filesystem, verify that no GPFS disks are down and that each filesystem has four disks per host. If any host does not have four disks, see issue 1 in the technote.
    mmlsdisk platform
    mmlsdisk ips
    mmlsdisk nrs
  3. If any of the GPFS disks are down, run the following commands to allow GPFS to rescan the disks. Run the command for the nrs filesystem only if it appears in the list from step 1. A combined check is sketched after these steps.
    mmchdisk platform start -a
    mmchdisk ips start -a
    mmchdisk nrs start -a
  4. Run the following command to mount all GPFS filesystems on hosts:
    mmmount all -a
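The per-filesystem checks in steps 2 and 3 can also be run in one pass. The following is a minimal sketch, assuming the filesystem names from the examples above (platform, ips, and nrs) and that it is run from e1n1; it prints any disk whose availability is reported as down:
    for fs in platform ips nrs; do echo "== $fs =="; mmlsdisk $fs | grep -w down; done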
Upgrade might fail when the GPFS component upgrades
Upgrade to version 1.0.8.4 from 1.0.7.8 on systems with connector nodes might fail when the GPFS component upgrades.
Example of tracelog:
2023-11-29 18:25:04 TRACE: run_shell_cmd_in_parallel(): running cmd systemctl restart mmsdrserv on nodes ['e5n1', 'e1n3', 'e4n1', 'e1n2', 'e1n1']
                           LOGGING FROM: yosemite_bundleupgradechecker.py:ensure_shared_storage_is_up:446
2023-11-29 18:35:09 TRACE:
                           ['e5n1', 'e4n1']
                           RC: 1
                           STDOUT: []
                           STDERR: [A dependency job for mmsdrserv.service failed. See 'journalctl -xe' for details.
                           ]

                           ['e1n3', 'e1n2']
                           RC: 0
                           STDOUT: []
                           STDERR: []

                           ['e1n1']
                           RC: 1
                           STDOUT: []
                           STDERR: [Job for mmsdrserv.service failed because the control process exited with error code. See "systemctl status mmsdrserv.service" and "journalctl -xe" for details.
                           ]

                           LOGGING FROM: yosemite_bundleupgradechecker.py:ensure_shared_storage_is_up:446
2023-11-29 18:35:09 TRACE: run_shell_cmd_in_parallel_or_raise(): running cmd service gpfs start on nodes ['e5n1', 'e1n3', 'e4n1', 'e1n2', 'e1n1']
                           LOGGING FROM: yosemite_bundleupgradechecker.py:ensure_shared_storage_is_up:446
2023-11-29 18:45:14 TRACE:
                           ['e5n1', 'e4n1']
                           RC: 1
                           STDOUT: []
                           STDERR: [Redirecting to /bin/systemctl start gpfs.service
                           A dependency job for gpfs.service failed. See 'journalctl -xe' for details.
                           ]

                           ['e1n3', 'e1n2', 'e1n1']
                           RC: 0
                           STDOUT: []
                           STDERR: [Redirecting to /bin/systemctl start gpfs.service
                           ]

                           LOGGING FROM: yosemite_bundleupgradechecker.py:ensure_shared_storage_is_up:446
2023-11-29 18:45:14 ERROR: Error running command [service gpfs start] on ['e5n1', 'e4n1']
                           LOGGING FROM: yosemite_bundleupgradechecker.py:ensure_shared_storage_is_up:446
2023-11-29 18:45:14 INFO: You can view mmfs.log.latest log file at /var/adm/ras/ for details
                           LOGGING FROM: yosemite_bundleupgradechecker.py:ensure_shared_storage_is_up:446
2023-11-29 18:45:14 FATAL ERROR: Prerequisite system checks failed
                           LOGGING FROM: bundleupgradechecker.py:perform_bundle_level_checks:223
2023-11-29 18:45:14 FATAL ERROR: More Info: See trace messages at /var/log/appliance/apupgrade/20231129/apupgrade20231129175951.log.tracelog for additional troubleshooting information.
Workaround
  1. Run mmgetstate -aLv to verify that the GPFS state is active on all nodes. If any node is not active, see the note after these steps.
    mmgetstate -aLv
    Expected output:
    [root@gt08-node1 ~]# mmgetstate -aLv
    
     Node number  Node name  Quorum  Nodes up  Total nodes  GPFS state    Remarks
    ---------------------------------------------------------------------------------
               1  e1n1          2         3          5      active        quorum node
               2  e1n2          2         3          5      active        quorum node
               3  e1n3          2         3          5      active        quorum node
               4  e4n1          2         3          5      active
               5  e5n1          2         3          5      active
  2. Restart the upgrade.
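If any node does not report an active GPFS state, the STDERR messages in the tracelog above already point at the next diagnostic step; you can run the commands that they suggest on the affected node before you restart the upgrade:
    systemctl status mmsdrserv.service
    journalctl -xe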
Upgrade might fail due to missing wheel (Python 3) package
On systems with security patch 7.9.23.02.SP20 applied, upgrading to version 1.0.8 and later might fail due to a missing wheel (Python 3) package.
Example of tracelog:
2024-02-10 09:58:54 INFO: nodeos:Starting Node OS component post-install steps...
                           LOGGING FROM: NodeosUpgrader.py:postinstall:166
2024-02-10 09:58:54 INFO: NodeosUpgrader.postinstall:Running nodeOS postinstall script for post upgrade configuration
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:30
2024-02-10 09:58:54 TRACE: Running command [/opt/ibm/appliance/platform/xcat/scripts/xcat/nodeos_post_actions.py].

                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:31
2024-02-10 10:01:31 TRACE: RC: 1.
                           STDOUT: []
                           STDERR: [ERROR: Command ['pip3', 'install', '/tmp/python3_packages/wheel-*.whl', '-f', '/tmp/python3_packages/', '--no-index', '--prefix', '/usr'] failed with error:
                           WARNING: Requirement '/tmp/python3_packages/wheel-*.whl' looks like a filename, but the file does not exist
                           ERROR: wheel-*.whl is not a valid wheel filename.
                            More Info: See /var/log/appliance/platform/xcat/nodeos_post_action.log for details.
                           ]

                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:31
2024-02-10 10:01:31 ERROR: NodeosUpgrader.postinstall:Issue encountered while running nodeOS postinstall script for configuration.
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33
2024-02-10 10:01:31 ERROR:
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33
2024-02-10 10:01:31 ERROR: ERROR: Command ['pip3', 'install', '/tmp/python3_packages/wheel-*.whl', '-f', '/tmp/python3_packages/', '--no-index', '--prefix', '/usr'] failed with error:
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33
2024-02-10 10:01:31 ERROR: WARNING: Requirement '/tmp/python3_packages/wheel-*.whl' looks like a filename, but the file does not exist
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33
2024-02-10 10:01:31 ERROR: ERROR: wheel-*.whl is not a valid wheel filename.
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33
2024-02-10 10:01:31 ERROR:  More Info: See /var/log/appliance/platform/xcat/nodeos_post_action.log for details.
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33
2024-02-10 10:01:31 ERROR:
                           LOGGING FROM: node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33
2024-02-10 10:01:31 TRACE: In method logger.py:log_error:142 from parent method node_os_yosemite_postinstaller.py:run_nodeos_postinstall_script:33 with args
                               msg = NodeosUpgrader.postinstall:Issue encountered while running nodeOS postinstall script for configuration.

                           ERROR: Command ['pip3', 'install', '/tmp/python3_packages/wheel-*.whl', '-f', '/tmp/python3_packages/', '--no-index', '--prefix', '/usr'] failed with error:
                           WARNING: Requirement '/tmp/python3_packages/wheel-*.whl' looks like a filename, but the file does not exist
                           ERROR: wheel-*.whl is not a valid wheel filename.
                            More Info: See /var/log/appliance/platform/xcat/nodeos_post_action.log for details.
Workaround
  1. Copy the Python 3 dependencies from the release bundle to /install/app_img/ on node e1n1:
    cp -r /localrepo/1.0.8.x_release/EXTRACT/system/bundle/app_img/python3_dependencies /install/app_img/
  2. Restart the upgrade.
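Before you restart the upgrade, you can optionally confirm that the wheel package is present in the copied dependencies (a hedged check, based on the wheel-*.whl file name pattern from the error messages above):
    ls /install/app_img/python3_dependencies/wheel-*.whl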