Version 1.0.7.8 release notes

Cloud Pak for Data System version 1.0.7.8 is intended for NPS customers only. It removes components that are going out of support and are not needed for NPS to work, such as Red Hat OpenShift 3.11, Cloud Pak for Data 3.5, and Portworx. It also deactivates the existing virtual machines and upgrades the firmware.

Upgrading

The upgrade procedure is performed by IBM Support.

Your system must be on version 1.0.7.6 or 1.0.7.7 to upgrade.

Do not upgrade to 1.0.7.8 if you are using Cloud Pak for Data. The upgrade path for systems with Cloud Pak for Data is 1.0.7.6 > 2.0.x. For more information, see https://www.ibm.com/docs/en/cloud-paks/cloudpak-data-system/2.0?topic=system-advanced-upgrade-from-versions-10x.

As a result of this upgrade, Cloud Pak for Data and Portworx are removed and the virtual machines are deactivated. Follow the upgrade instructions in Upgrading to version 1.0.7.8.

Important: This release includes a firmware upgrade that is required for the system to work as designed. Do not use the --skip-firmware option when running the apupgrade command.
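For reference, the following is a minimal sketch of the upgrade invocation, reusing the apupgrade syntax shown later in these release notes; <your-1.0.7.8-upgrade-dir> is a placeholder for the upgrade directory on your system, and IBM Support determines the exact parameters for your environment:
    # Run the system bundle upgrade; do not add --skip-firmware, so that the firmware is also upgraded
    apupgrade --upgrade --upgrade-directory /localrepo --bundle system --use-version <your-1.0.7.8-upgrade-dir>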

What's new

With this upgrade, all software components related to Cloud Pak for Data are disabled. This includes Red Hat OpenShift, Portworx, and Cloud Pak for Data itself. Some remnants of these components (for example, VM and container images) might still be visible, but they are disabled in this release. Read the following sections to learn how this affects the system:
Red Hat OpenShift
Because full support for Red Hat OpenShift Container Platform 3.11 ended on June 30, 2022, it is removed from the system.
VM deactivation
Virtual machines that were running on the system are no longer available. They are deactivated, and you must not activate them again, as this will leave the system in an unstable state.
Cloud Pak for Data disabled
Cloud Pak for Data is disabled, and you can no longer use the application. The ap apps command lists Cloud Pak for Data (ICP4D) as DISABLED. Note that the NPS web console is now represented by two services: CYCLOPS and INFLUXDB.
[root@e1n1 ~]# ap apps
+----------+------------------+
| Name     | Management State |
+----------+------------------+
| CallHome |          ENABLED |
| CYCLOPS  |          ENABLED |
| ICP4D    |         DISABLED |
| INFLUXDB |          ENABLED |
| VDB      |          ENABLED |
+----------+------------------+
System web console removal
The system web console is no longer available in this release. You can use the following ap commands instead to manage and monitor hardware, software, and system alerts (see the example after this list):
Hardware management
ap hw - to monitor the system hardware
ap locate - to locate specific hardware components
sys_hw_config bom - to modify the physical location details of hardware components
Resource usage
You can monitor the system status with the following ap commands:
ap
Reports system state
ap info
Shows general information about the system, such as serial number, MTMs, and customer details
ap hw
Lists hardware inventory with status of each component
ap node
Lists, enables and disables nodes
ap sw
Lists software inventory with status of each component
ap issues
Lists all open alerts (issues) and unacknowledged events
ap fs
Lists all filesystems, mounts, NSDs, and their states
Software overview
ap sw - to monitor software components
ap apps - to monitor the installed applications
Manage users
apusermgmt - for adding, modifying, and deleting system users
ap_external_ldap - for external LDAP integration
Call Home / Notify IBM
callhome_config.py - for configuring Call Home
Notifications / Alert management
ap issues - for monitoring issues
ap events - for monitoring events
ap config - for configuring alert rules and notifications, the SMTP mail server, and SNMP notifications
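For example, a quick post-upgrade health check from the command line, using only the commands listed above, might look like the following (a sketch; output formats vary by system):
    ap          # report overall system state
    ap hw       # list hardware inventory and component status
    ap sw       # list software inventory and component status
    ap issues   # list open alerts and unacknowledged events
    ap events   # list recent events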
NPS web console
In 1.0.7.8, the NPS web console is no longer hosted in the OpenShift virtual machine. Instead, it runs in a stand-alone container on a bare-metal control node (or on a connector node, if one is installed). The container images are installed on all control nodes for high availability. The NPS web console is reinstalled during the upgrade, but you can also install it manually on a separate Linux machine, as described in https://www.ibm.com/docs/en/netezza?topic=iwc-installing-netezza-performance-server-web-console-cloud-pak-data-system-1078-outside-system
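After the upgrade, you can confirm that the console services are available by checking that both services are listed as ENABLED, as in the ap apps output shown earlier in these notes:
    ap apps     # CYCLOPS and INFLUXDB should show ENABLED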
Storage changes
Portworx storage is no longer available.
Call Home changes
Version 1.0.7.8 no longer supports the system web console. Instead, alert management can be configured or reconfigured with ap config commands, and the remaining Call Home configuration is done with the new callhome_config.py utility. For more information, see Monitoring with the Call Home feature.
User management
Users are no longer managed in the web console. You can use the apusermgmt command instead.
Hardware location management
Assigning a physical location to hardware components is no longer done in the web console. Instead, you can use the sys_hw_config bom command to modify hardware component details in the BOM file.
apdiag command for SFP issues
The following command can be used to collect all diagnostic information for network SFP issues:
apdiag collect --set sfp_issue --spus all
Network configuration improvements
  • The BGPcheck tool verifies the BGP setup in the network configuration YAML file.
  • The aposHouseConfig.py script checks for the most recent YAML file in the directory, validates it, and then asks for confirmation before running the configuration with that file. For more information, see Testing the YAML file and running playbooks.
  • The new aposNetworkCheck.py tool helps troubleshoot network issues.
Documentation
The Cloud Pak for Data System documentation is now divided into three collections, which you can select from the drop-down list in the upper left corner:
  • 1.0 - for version 1.0.7.7 and lower
  • 1.0.7.x - for 1.0.7.8
  • 2.0 - for version 2.0.0 and higher

Firmware upgrade

The following firmware is upgraded:
  • XCC - 21B plus EAR (to include I2C bus / NVMe clocking fixes)
  • UEFI - 21B plus EAR (to include several security and Row Hammer memory management fixes from Intel)
  • SMM - 21b (no EAR) which corresponds to 1.21
  • Lenovo branded Kioxia NVMe drive firmware
  • Generic Kioxia NVMe drive firmware
  • Lenovo branded Samsung NVMe drive firmware
  • Generic Samsung NVMe drive firmware
  • Marvell Storage RAID Controller Firmware
  • Intel NVMe drive firmware
  • HPI code to support NVMe drive firmware update

Software components

Fixed issues

Known issues

Upgrade might fail if e1n1 is not the master node in Platform Manager
Before you start the upgrade, ensure that e1n1 is the master node, as described in the upgrade procedure. Otherwise, after the nodes reboot, several nodes might become inaccessible or might not respond in the expected time frame, and the upgrade fails with an error similar to the following:
2021-03-09 21:40:05 INFO: Some nodes were not available.
                          [u'e15n1', u'e16n1', u'e1n4', u'e2n1', u'e2n2', u'e2n3', u'e2n4']
2021-03-09 21:40:05 ERROR: Error running command [systemctl restart network] on [u'e2n4-fab', u'e2n3-fab', u'e15n1-fab', u'e2n1-fab', u'e16n1-fab', u'e1n4-fab', u'e2n2-fab']
2021-03-09 21:40:05 ERROR: Unable to restart network services on [u'e15n1-fab', u'e16n1-fab', u'e1n4-fab', u'e2n1-fab', u'e2n2-fab', u'e2n3-fab', u'e2n4-fab']
2021-03-09 21:40:05 ERROR: ERROR: Error running command [systemctl restart network] on [u'e2n4-fab', u'e2n3-fab', u'e15n1-fab', u'e2n1-fab', u'e16n1-fab', u'e1n4-fab', u'e2n2-fab']
2021-03-09 21:40:05 ERROR: 
2021-03-09 21:40:05 ERROR: Unable to powercycle nodes via ipmitool.
2021-03-09 21:40:05 ERROR: 'bmc_addr'
2021-03-09 21:40:05 ERROR: The following nodes are still unavailable after a reboot attempt: [u'e15n1', u'e16n1', u'e1n4', u'e2n1', u'e2n2', u'e2n3', u'e2n4']
2021-03-09 21:40:05 FATAL ERROR: Problem rebooting nodes
To recover from the error (a scripted sketch of steps 1 and 2 follows this list):
  1. Identify the master node by checking on which control node the dhcpd service is running:
    systemctl is-active dhcpd
  2. If the dhcpd service is not running, on e1n1, run:
    systemctl start dhcpd
  3. Bring up the floating IPs:
    1. On e1n2 and e1n3, run:
      /opt/ibm/appliance/platform/management/actions/master_disable.py -scope master
    2. On e1n1, run:
      /opt/ibm/appliance/platform/management/actions/master_enable.py -scope master
  4. Rerun the upgrade where it failed.
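The following is a minimal scripted sketch of steps 1 and 2, assuming passwordless SSH between the control nodes and the control node names (e1n1, e1n2, e1n3) used in this procedure; adjust the names for your system:
    # Identify the master node: the master control node runs the dhcpd service
    for node in e1n1 e1n2 e1n3; do
        echo -n "$node: "; ssh $node systemctl is-active dhcpd
    done
    # If dhcpd is not running on any control node, start it on e1n1
    ssh e1n1 systemctl start dhcpd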
Alerts SW Component is not ready opened after upgrade
If any of the following alerts related to OpenShift, Cloud Pak for Data, the web console, or Portworx are open after upgrading to 1.0.7.8, they must be closed manually with ap issues --close <alert_id> (see the example after the table):
| 439 | SW_NEEDS_ATTENTION | SW | Openshift node is not ready       | YES |
| 440 | SW_NEEDS_ATTENTION | SW | Openshift service is not ready    | YES |
| 446 | SW_NEEDS_ATTENTION | SW | ICP4D service is not ready        | YES |
| 451 | SW_NEEDS_ATTENTION | SW | Webconsole service is not ready   | YES |
| 460 | SW_NEEDS_ATTENTION | SW | Portworx component is not healthy | YES |
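For example, to close these alerts after the upgrade (the ID passed to --close is the alert ID reported by ap issues on your system, which might differ from the reason codes listed above):
    ap issues                      # list open alerts and note their IDs
    ap issues --close <alert_id>   # repeat for each alert that matches the list above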
Connector node restrictions
When a connector node is set up, the following restriction applies: the NPS host installed on that system must be accessed by using the name vdb1.fbond within the system. vdb1.fbond is the internal name of the NPS host. For example, when adding an NPS instance in that system's web console, the NPS web console must use vdb1.fbond as the host name instead of the application VIP (virtual IP address) or the application host name.
Unable to start NPS after firmware upgrade due to GPFS token file missing
If the NPS container fails to start after the firmware upgrade and the following error message is displayed, complete the steps that follow the error output:
1. HpiPostinstaller.postinstall
        Upgrade Detail: Firmware upgrade on nodes and switches
        Caller Info:The call was made from 'HpiPostinstaller.do_postinstall' on line 207 with file located at '/localrepo/1.0.7.8_release/EXTRACT/system/upgrade/bundle_upgraders/../hpi/hpi_postinstaller.py'
        Message: AbstractUpgrader.postinstaller:Node firmware upgrade failed
Unable to start IPS, System state is Stopped
  1. Verify if the GPFS token exists:
    ls /opt/ibm/appliance/storage/ips/ipshost1/nz/.gpfstoken
    If it exists, the issue requires further investigation. If the file does not exist:
  2. Create the token file with the following command and then restart the upgrade:
    touch /opt/ibm/appliance/storage/ips/ipshost1/nz/.gpfstoken
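The check and the fix can be combined into a single guarded command, a minimal sketch that uses only the path shown above:
    # Create the GPFS token file only if it is missing, then restart the upgrade
    [ -f /opt/ibm/appliance/storage/ips/ipshost1/nz/.gpfstoken ] || touch /opt/ibm/appliance/storage/ips/ipshost1/nz/.gpfstoken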
apupgrade failed to get NPS nodes info during pre-checks of HPI upgrade
The HPI upgrade might fail during prechecks with the following error:
1. check_nps_nodes
	Upgrade Detail: Pre-install Checks
	Caller Info:The call was made from 'HpiPrechecker.check_for_nps_nodes' on line 53 with file located at '/localrepo/1.0.7.8_mar9/EXTRACT/system/upgrade/bundle_upgraders/../hpi/hpi_prechecker.py'
	Message: hpi:HpiUpgrader.prechecker:Failed to get the NPS nodes on the system.

Workaround:

  1. Verify the list of the SPUs on the system by running:
    ap node
    [root@e1n1 hpi]# ap node
    +------------------+-----------+-------------+-----------+-----------+
    | Node             |     State | Personality | Monitored | Is Master |
    +------------------+-----------+-------------+-----------+-----------+
    | enclosure1.node1 |   ENABLED | CONTROL     |       YES |       YES |
    | enclosure1.node2 |   ENABLED | CONTROL     |       YES |        NO |
    | enclosure1.node3 |   ENABLED | CONTROL     |       YES |        NO |
    | enclosure1.node4 |   ENABLED | WORKER      |       YES |        NO |
    | enclosure2.node1 |   ENABLED | WORKER      |       YES |        NO |
    | enclosure2.node2 |   ENABLED | WORKER      |       YES |        NO |
    | enclosure2.node3 | UNMANAGED | VDB         |        NO |        NO |
    | enclosure2.node4 | UNMANAGED | VDB         |        NO |        NO |
    +------------------+-----------+-------------+-----------+-----------+
    
  2. If there are SPUs whose personality is only VDB (enclosure2.node3 and enclosure2.node4 in the above example), set the personality to VDB[IPS1NODE] by using the following commands:
    ap node set_personality enclosure2.node3 VDB[IPS1NODE] --magneto_only
    ap node set_personality enclosure2.node4 VDB[IPS1NODE] --magneto_only
  3. Check the personality again by running:
    ap node
    [root@e1n1 hpi]# ap node
    +------------------+-----------+---------------+-----------+-----------+
    | Node             |     State | Personality   | Monitored | Is Master |
    +------------------+-----------+---------------+-----------+-----------+
    | enclosure1.node1 |   ENABLED | CONTROL       |       YES |       YES |
    | enclosure1.node2 |   ENABLED | CONTROL       |       YES |        NO |
    | enclosure1.node3 |   ENABLED | CONTROL       |       YES |        NO |
    | enclosure1.node4 |   ENABLED | WORKER        |       YES |        NO |
    | enclosure2.node1 |   ENABLED | WORKER        |       YES |        NO |
    | enclosure2.node2 |   ENABLED | WORKER        |       YES |        NO |
    | enclosure2.node3 | UNMANAGED | VDB[IPS1NODE] |        NO |        NO |
    | enclosure2.node4 | UNMANAGED | VDB[IPS1NODE] |        NO |        NO |
    +------------------+-----------+---------------+-----------+-----------+
    
  4. Restart the upgrade.
Discovery might fail because of the hpi-software RPMs
When a system is upgraded from 1.0.7.7 to 1.0.7.8 and discovery of a new 1.0.7.7 BB chassis is then run, discovery fails with the following error:
Failed to send /install/bundle.copied_from_provisioning_system/upgrade/hpi/hpi-software-1.0.7.7-SNAPSHOT-icpds-release-1.0.7.7-noarch.rpm
to 9.0.0.197:/root/

Workaround:

To avoid this issue, unlink the 1.0.7.7 RPM before starting the discovery process:
unlink /install/bundle.copied_from_provisioning_system/upgrade/hpi/hpi-software-1.0.7.7-SNAPSHOT-icpds-release-1.0.7.7-noarch.rpm

Verify that only the 1.0.7.8 RPM remains:
[root@e1n1 ~]# find /install -name hpi-software*
/install/bundle.copied_from_provisioning_system/bundle/provisioning_dependencies/hpi/hpi-software-1.0.7.8-SNAPSHOT-icpds-release-1.0.7.8-noarch.rpm
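A guarded variant of the workaround, which removes the 1.0.7.7 RPM only if it is still present, might look like the following sketch (the path is the one shown above):
    RPM=/install/bundle.copied_from_provisioning_system/upgrade/hpi/hpi-software-1.0.7.7-SNAPSHOT-icpds-release-1.0.7.7-noarch.rpm
    # Remove the 1.0.7.7 RPM only if it is still present (covers a regular file or a leftover link)
    if [ -e "$RPM" ] || [ -L "$RPM" ]; then unlink "$RPM"; fi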
Upgrade to 1.0.7.8 fails when security patch 7.9.22.03.SP11 or later was applied on the system
If you try to upgrade to 1.0.7.8 and security patch 7.9.22.03.SP11 or later was already applied on the system, the upgrade might fail with the following error:
1. NodeosUpgrader.install
	Upgrade Detail: Component install for nodeos
	Caller Info:The call was made from 'NodeOSYosemiteInstaller.install' on line 100 with file located at '/localrepo/<your-1.0.7.8-upgrade-dir>/EXTRACT/system/upgrade/bundle_upgraders/../nodeos/node_os_yosemite_installer.py'
	Message: nodeos:NodeosUpgrader.install:Fatal Problem: Failed to install any new Node OS rpms...

Workaround:

  1. Run the following two commands from e1n1:
    Note: Replace <your-1.0.7.8-upgrade-dir> directory name in the commands with the actual upgrade directory name that you used on your system.
    1. sed -i -e "0,/self._get_yum_repos_options() + ' ' + self.install_y/s/self._get_yum_repos_options() + ' ' + self.install_y/self._get_yum_repos_options() + ' ' + '--exclude=\"kernel*\"' + ' ' + self.install_y/" /localrepo/<your-1.0.7.8-upgrade-dir>/EXTRACT/system/upgrade/nodeos/node_os_installer.py
    2. sed -i -e 's,self.verify_bundle,#self.verify_bundle,g' /opt/ibm/appliance/apupgrade/bin/apupgrade
  2. Rerun the same apupgrade command that failed:
    apupgrade --upgrade --upgrade-directory /localrepo --bundle system --use-version <your-1.0.7.8-upgrade-dir>
aposYmlCheck.py returns false positive results for BGP setup
When you verify the network configuration YAML file with the aposYmlCheck.py script, the check does not detect a missing /XX subnet mask in an IP address specified in the YAML file. Follow the documentation in Switch settings to avoid such errors in the network configuration file.