Upgrading to version 1.0.8.2
The upgrade to version 1.0.8.2 is performed by IBM Support.
Approximate upgrade time:
- Lenovo systems: 7 hours 30 minutes
- Dell systems: 4 hours 30 minutes
Before you begin
Upgrade prerequisites:
- If you are running Cloud Pak for Data System version 1.0.8, you must apply 1.0.8.0 Interim Fix 2 before you start upgrading to 1.0.8.2. For more information, see the 1.0.8.0 Interim Fix 2 release notes.
Network setup prerequisites:
- Before you start the upgrade, from the /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible directory, you must run:
ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v
If any changes are listed in the --check -v output, ensure that they are expected. If they are unexpected, you must edit the YAML file so that it contains only the expected changes. Rerun this command as necessary until you see no errors.
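For example, a minimal shell sequence for this check (System_Name.yml is a placeholder for your system's inventory file) is:
cd /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible
# Dry run: report the changes that the playbook would apply, without applying them
ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v
# Review the reported changes. If any are unexpected, edit the YAML file and repeat
# the dry run until only expected changes (or no changes) are reported.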
Netezza prerequisites:
- It is recommended that you upgrade to Netezza Performance Server 11.2.1.9.
About this task
- icpds_vm
- icpds_rhos_repo
- icpds_services
- icpds_services_addon_cyclops
Procedure
Results
After the upgrade, alerts similar to the following might remain open:
| 439 | SW_NEEDS_ATTENTION | SW | Openshift node is not ready | YES |
| 440 | SW_NEEDS_ATTENTION | SW | Openshift service is not ready | YES |
| 446 | SW_NEEDS_ATTENTION | SW | ICP4D service is not ready | YES |
| 451 | SW_NEEDS_ATTENTION | SW | Webconsole service is not ready | YES |
| 460 | SW_NEEDS_ATTENTION | SW | Portworx component is not healthy | YES |
Close them manually with the following command:
ap issues --close <alert_id>
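For example, if several alerts remain open, a short loop can close them; the IDs shown below are from the example above and must be replaced with the IDs reported on your system:
# Close each remaining post-upgrade alert by its ID
for alert_id in 439 440 446 451 460; do
    ap issues --close ${alert_id}
done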
As part of the upgrade process, VMs are disabled on all nodes and they are shut down. They are expected to stay in the shut off state in 1.0.8.2. For example:
[root@gt01-node1 ~]# for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py`; do echo ${node}; ssh $node virsh list --all; done
e1n1
Id Name State
----------------------------------------------------
- e1n1-1-control shut off
e1n2
Id Name State
----------------------------------------------------
- e1n2-1-control shut off
e1n3
Id Name State
----------------------------------------------------
- e1n3-1-control shut off
e1n4
Id Name State
----------------------------------------------------
- e1n4-1-worker shut off
e2n1
e2n2
e2n3
e2n4
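A quick way to confirm that no VM is still running on any node is a variation of the same loop; this is a minimal sketch that prints a warning only when a running VM is found:
# Flag any node that still has a VM in the running state
for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py`; do
    ssh $node "virsh list --all" | grep -q running && echo "WARNING: running VM found on ${node}"
done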
The web console runs in the cyclops container and the associated influxdb container. Container images are installed on all control nodes for high availability. When a control node goes out of service, Platform Manager starts the cyclops and influxdb containers on another control node (or a connector node). For example:
[root@gt18-node1 ~]# for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py --control`; do echo ${node}; ssh $node docker ps -a | grep -E 'cyclops|influxdb'; done
e1n1
c7d402b47de8 cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Exited (255) 30 hours ago cyclops
9f960b843510 influxdb:latest "/entrypoint.sh in..." 4 days ago Exited (255) 30 hours ago 0.0.0.0:8086->8086/tcp influxdb
e1n2
642f4b0b5087 cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Up 30 hours 80/tcp, 3000/tcp, 5480/tcp, 0.0.0.0:3333->3333/tcp, 0.0.0.0:8843->8443/tcp cyclops
177a97aaa701 influxdb:latest "/entrypoint.sh in..." 4 days ago Up 30 hours 0.0.0.0:8086->8086/tcp influxdb
e1n3
d590a49d369f cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Exited (137) 4 days ago cyclops
19e5f305548e influxdb:latest "/entrypoint.sh in..." 4 days ago Exited (0) 4 days ago influxdb
[root@gt18-node1 ~]#
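To see which control node currently runs the web console containers, a similar loop that lists only running containers can be used; this is a sketch that relies on the docker ps --filter and --format options:
# Show the control node where the cyclops container is currently Up
for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py --control`; do
    status=$(ssh $node "docker ps --filter name=cyclops --format '{{.Status}}'")
    [ -n "$status" ] && echo "${node}: cyclops ${status}"
done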
The ap version -s output reflects the web console container version:
[root@gt18-node1 ~]# ap version -s
Appliance software version is 1.0.8.2
All component versions are synchronized.
+-----------------------------+-----------------------------------------------------------------+
| Component Name | Version |
+-----------------------------+-----------------------------------------------------------------+
| Appliance platform software | 1.0.8.2-20230703071302b4911 |
| aposcomms | ibm-apos-network-config : 7.3.0-1 |
| | ibm-apos-named-config : 3.5.0-1 |
| | ibm-apos-common : 11.3.0-1 |
| | ibm-apos-network-tools : 27.7.3-1 |
| | ibm-apos-chrony-config : 5.0.1-1 |
| | ibm-apos-udev-rules-config : 3.1.1-1 |
| | ibm-apos-dhcpd-config : 5.2.0-1 |
| apupgrade | 1.0.8.2-20230630103432b4881 |
| callhome | 1.1.28.0-20230428144243b2 |
| containerapi | 1.0.23.0-20230428140359b3098 |
| cyclops | 4.0.2-20230428b3082 |
| docker-upgrade | oci-umount : 2.5-3 |
| | oci-register-machine : 0-6 |
| | oci-systemd-hook : 0.2.0-1 |
| | atomic-registries : 1.22.1-29 |
| | docker : 1.13.1-161 |
| | docker-rhel-push-plugin : 1.13.1-161 |
| | docker-client : 1.13.1-161 |
| | docker-debuginfo : 1.13.1-161 |
| | docker-common : 1.13.1-161 |
| | container-selinux : 2.119.2-1.911c772 |
| | container-storage-setup : 0.11.0-2 |
| | containers-common : 0.1.40-11 |
| | python-pytoml : 0.1.14-1 |
| gpfs | 5.1.2-7 |
| gpfsconfig | 1.0.8.2-20230624005807b4718 |
| hpi | hpiutils : 2.0.4.4-20230428170457b1 |
| | hpi-cumulus-fabsw-firmware : 2.0.0.1-20230519105436 |
| | hpi-dell-node-firmware : 1.8.0.1-20230519105436 |
| | hpi-cumulus-mgtsw-firmware : 2.0.0.1-20230519105436 |
| | hpi-software : 1.0.8.2-20230630172951b22 |
| | hpi-lenovo-node-firmware : 1.8.0.1-20230519105436 |
| | hpi-cumulus-fabspine-firmware : 2.0.0.1-20230519105436 |
| | hpi-cumulus-switch-firmware : 2.0.0.1-20230519105436 |
| | hpi-x86_64-image : 2.0.4.5-20230428210653b1 |
| | hpicfg : 2.0.4.4-20230428170438b1 |
| | dct : 1.0.7.8-20230429004139b1 |
| magneto | 1.0.28.3-20230628151826b4806 |
| mellanox | 1.0.8.0 |
| mvcli | 2.3.10.1095 |
| nodeos | 1.0.8.2-20230623235252b4718 |
| platformbackups | 1.0.20.0-20230428140353b3099 |
| psklm | 1.0.22.0-20230703100005b15 |
| solarflare | 4.15.10.1002 |
| supporttools | 1.0.23.11-20230428170633b3089 |
+-----------------------------+-----------------------------------------------------------------+
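For a scripted post-upgrade check, the top-level version line can be filtered from the same command:
# Verify that the appliance reports 1.0.8.2 after the upgrade
ap version -s | grep "Appliance software version"
# Expected output: Appliance software version is 1.0.8.2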
Firmware post-upgrade steps
After you finish the 1.0.8.2 upgrade process, you must ensure that your firmware is also upgraded.
Procedure
Node personalities check post-upgrade steps
The 1.0.8.2 upgrade on Dell systems does not remove the WORKER personality. You must check the nodes to ensure that there are no WORKER personalities after the upgrade completes. This issue happens when the worker VM is gone from e4n1, but the node still has the WORKER personality set.
About this task
If you run virsh list --all on the affected node and see the following empty output (the worker VM is gone):
[root@e4n1 ~]# virsh list --all
Id Name State
----------------------------------------------------
but ap node -d still reports the WORKER personality:
[root@nz5-node1 ~]# ap node -d
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
| Node | State | Personality | Monitored | Is Master | Is HUB | Is VDB Master | Is NRS Master |
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
| enclosure1.node1 | ENABLED | CONTROL,UNSET | YES | YES | YES | NO | NO |
| enclosure2.node1 | ENABLED | CONTROL,UNSET | YES | NO | NO | NO | NO |
| enclosure3.node1 | ENABLED | CONTROL,UNSET | YES | NO | NO | NO | NO |
| enclosure4.node1 | ENABLED | WORKER,UNSET | YES | NO | NO | NO | NO |
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
Or you see the following error in the tracelog:
2022-12-09 14:58:30 INFO: Checking for UNSET node(s)
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: Running command [ap node | grep UNSET | cut -f2 -d '|'].
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: RC: 0.
STDOUT: [ enclosure1.node1
enclosure2.node1
enclosure3.node1
enclosure4.node1
]
STDERR: []
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: Running command [ap node set_personality UNSET --magneto_only -f].
LOGGING FROM: yosemite_bundle_upgrade.py:unset_worker_node_personalities:530
2022-12-09 14:58:31 TRACE: RC: 1.
STDOUT: [
Generated: 2022-12-09 14:58:31
]
STDERR: ['UNSET' is not a valid node location
]
LOGGING FROM: yosemite_bundle_upgrade.py:unset_worker_node_personalities:530
2022-12-09 14:58:31 ERROR: Error running command [ap node set_personality UNSET --magneto_only -f].
You must apply the following workaround:
- Set the node personalities after the 1.0.8.2 upgrade is complete. Depending on the existing node personality, run:
- For CONTROL,WORKER run:
ap node set_personality <node> CONTROL,UNSET --magneto_only -f
- For WORKER,WORKER run:
ap node set_personality <node> UNSET,UNSET --magneto_only -f
- For WORKER,UNSET run:
ap node set_personality <node> UNSET,UNSET --magneto_only -f
For example:
[root@gt25-node1 upgrade]# ap node
+------------------+---------+----------------+-----------+-----------+
| Node             | State   | Personality    | Monitored | Is Master |
+------------------+---------+----------------+-----------+-----------+
| enclosure1.node1 | ENABLED | CONTROL,WORKER | YES       | YES       |
| enclosure2.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure3.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure4.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure5.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure6.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
+------------------+---------+----------------+-----------+-----------+
[root@gt25-node1 upgrade]# ap node set_personality enclosure1.node1 CONTROL,UNSET --magneto_only -f
Node role change request sent successfully
Generated: 2022-12-13 11:08:07
[root@gt25-node1 upgrade]# ap node
+------------------+---------+----------------+-----------+-----------+
| Node             | State   | Personality    | Monitored | Is Master |
+------------------+---------+----------------+-----------+-----------+
| enclosure1.node1 | ENABLED | CONTROL,UNSET  | YES       | YES       |
| enclosure2.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure3.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure4.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure5.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure6.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
+------------------+---------+----------------+-----------+-----------+
- Confirm the above results.
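To confirm, you can list any node whose personality still contains WORKER; after the workaround, this minimal check should report that none remain:
# List nodes that still report a WORKER personality
ap node | grep WORKER || echo "No WORKER personalities remain"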