Upgrading to version 1.0.8.2
The upgrade to version 1.0.8.2 is performed by IBM Support.
Approximate upgrade time:
- Lenovo systems: 7 hours 30 minutes
- Dell systems: 4 hours 30 minutes
Before you begin
Upgrade prerequisites:
- If you are running Cloud Pak for Data System version 1.0.8, you must apply 1.0.8.0 Interim Fix 2 before you start upgrading to 1.0.8.2. For more information, see the 1.0.8.0 Interim Fix 2 release notes.
Network setup prerequisites:
- Before you start the upgrade, from the /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible directory, you must run:
ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v
If any changes are listed in the --check -v output, ensure that they are expected. If they are unexpected, you must edit the YAML file so that it contains only the expected changes. Rerun this command as necessary until you see no errors.
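For example, a minimal shell sequence for this check (System_Name.yml is a placeholder for your system's inventory file) is:
cd /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible
# Dry run: report the changes that the playbook would apply, without applying them
ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v
# Review the reported changes. If any are unexpected, edit the YAML file and repeat
# the dry run until only expected changes (or no changes) are reported.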
Netezza prerequisites:
- It is recommended that you upgrade to Netezza Performance Server 11.2.1.9.
About this task
- icpds_vm
- icpds_rhos_repo
- icpds_services
- icpds_services_addon_cyclops
Procedure
Results
After the upgrade, alerts similar to the following might remain open:
| 439 | SW_NEEDS_ATTENTION | SW | Openshift node is not ready | YES |
| 440 | SW_NEEDS_ATTENTION | SW | Openshift service is not ready | YES |
| 446 | SW_NEEDS_ATTENTION | SW | ICP4D service is not ready | YES |
| 451 | SW_NEEDS_ATTENTION | SW | Webconsole service is not ready | YES |
| 460 | SW_NEEDS_ATTENTION | SW | Portworx component is not healthy | YES |
Close them manually with the following command:
ap issues --close <alert_id>
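For example, if several alerts remain open, a short loop can close them; the IDs shown below are from the example above and must be replaced with the IDs reported on your system:
# Close each remaining post-upgrade alert by its ID
for alert_id in 439 440 446 451 460; do
    ap issues --close ${alert_id}
done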
As part of the upgrade process, VMs are disabled on all nodes and they are shut down. They are expected to stay in the shut off state in 1.0.8.2. For example:
[root@gt01-node1 ~]# for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py`; do echo ${node}; ssh $node virsh list --all; done
e1n1
Id Name State
----------------------------------------------------
- e1n1-1-control shut off
e1n2
Id Name State
----------------------------------------------------
- e1n2-1-control shut off
e1n3
Id Name State
----------------------------------------------------
- e1n3-1-control shut off
e1n4
Id Name State
----------------------------------------------------
- e1n4-1-worker shut off
e2n1
e2n2
e2n3
e2n4
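A quick way to confirm that no VM is still running on any node is a variation of the same loop; this is a minimal sketch that prints a warning only when a running VM is found:
# Flag any node that still has a VM in the running state
for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py`; do
    ssh $node "virsh list --all" | grep -q running && echo "WARNING: running VM found on ${node}"
done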
The web console runs in the cyclops container and the associated influxdb container. Container images are installed on all control nodes for high availability. When a control node goes out of service, Platform Manager starts the cyclops and influxdb containers on another control node (or a connector node). For example:
[root@gt18-node1 ~]# for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py --control`; do echo ${node}; ssh $node docker ps -a | grep -E 'cyclops|influxdb'; done
e1n1
c7d402b47de8 cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Exited (255) 30 hours ago cyclops
9f960b843510 influxdb:latest "/entrypoint.sh in..." 4 days ago Exited (255) 30 hours ago 0.0.0.0:8086->8086/tcp influxdb
e1n2
642f4b0b5087 cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Up 30 hours 80/tcp, 3000/tcp, 5480/tcp, 0.0.0.0:3333->3333/tcp, 0.0.0.0:8843->8443/tcp cyclops
177a97aaa701 influxdb:latest "/entrypoint.sh in..." 4 days ago Up 30 hours 0.0.0.0:8086->8086/tcp influxdb
e1n3
d590a49d369f cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Exited (137) 4 days ago cyclops
19e5f305548e influxdb:latest "/entrypoint.sh in..." 4 days ago Exited (0) 4 days ago influxdb
[root@gt18-node1 ~]#
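To see which control node currently runs the web console containers, a similar loop that lists only running containers can be used; this is a sketch that relies on the docker ps --filter and --format options:
# Show the control node where the cyclops container is currently Up
for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py --control`; do
    status=$(ssh $node "docker ps --filter name=cyclops --format '{{.Status}}'")
    [ -n "$status" ] && echo "${node}: cyclops ${status}"
done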
The ap version -s output reflects the web console container version:
[root@gt18-node1 ~]# ap version -s
Appliance software version is 1.0.8.2
All component versions are synchronized.
+-----------------------------+-----------------------------------------------------------------+
| Component Name | Version |
+-----------------------------+-----------------------------------------------------------------+
| Appliance platform software | 1.0.8.2-20230703071302b4911 |
| aposcomms | ibm-apos-network-config : 7.3.0-1 |
| | ibm-apos-named-config : 3.5.0-1 |
| | ibm-apos-common : 11.3.0-1 |
| | ibm-apos-network-tools : 27.7.3-1 |
| | ibm-apos-chrony-config : 5.0.1-1 |
| | ibm-apos-udev-rules-config : 3.1.1-1 |
| | ibm-apos-dhcpd-config : 5.2.0-1 |
| apupgrade | 1.0.8.2-20230630103432b4881 |
| callhome | 1.1.28.0-20230428144243b2 |
| containerapi | 1.0.23.0-20230428140359b3098 |
| cyclops | 4.0.2-20230428b3082 |
| docker-upgrade | oci-umount : 2.5-3 |
| | oci-register-machine : 0-6 |
| | oci-systemd-hook : 0.2.0-1 |
| | atomic-registries : 1.22.1-29 |
| | docker : 1.13.1-161 |
| | docker-rhel-push-plugin : 1.13.1-161 |
| | docker-client : 1.13.1-161 |
| | docker-debuginfo : 1.13.1-161 |
| | docker-common : 1.13.1-161 |
| | container-selinux : 2.119.2-1.911c772 |
| | container-storage-setup : 0.11.0-2 |
| | containers-common : 0.1.40-11 |
| | python-pytoml : 0.1.14-1 |
| gpfs | 5.1.2-7 |
| gpfsconfig | 1.0.8.2-20230624005807b4718 |
| hpi | hpiutils : 2.0.4.4-20230428170457b1 |
| | hpi-cumulus-fabsw-firmware : 2.0.0.1-20230519105436 |
| | hpi-dell-node-firmware : 1.8.0.1-20230519105436 |
| | hpi-cumulus-mgtsw-firmware : 2.0.0.1-20230519105436 |
| | hpi-software : 1.0.8.2-20230630172951b22 |
| | hpi-lenovo-node-firmware : 1.8.0.1-20230519105436 |
| | hpi-cumulus-fabspine-firmware : 2.0.0.1-20230519105436 |
| | hpi-cumulus-switch-firmware : 2.0.0.1-20230519105436 |
| | hpi-x86_64-image : 2.0.4.5-20230428210653b1 |
| | hpicfg : 2.0.4.4-20230428170438b1 |
| | dct : 1.0.7.8-20230429004139b1 |
| magneto | 1.0.28.3-20230628151826b4806 |
| mellanox | 1.0.8.0 |
| mvcli | 2.3.10.1095 |
| nodeos | 1.0.8.2-20230623235252b4718 |
| platformbackups | 1.0.20.0-20230428140353b3099 |
| psklm | 1.0.22.0-20230703100005b15 |
| solarflare | 4.15.10.1002 |
| supporttools | 1.0.23.11-20230428170633b3089 |
+-----------------------------+-----------------------------------------------------------------+
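For a scripted post-upgrade check, the top-level version line can be filtered from the same command:
# Verify that the appliance reports 1.0.8.2 after the upgrade
ap version -s | grep "Appliance software version"
# Expected output: Appliance software version is 1.0.8.2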
Firmware post-upgrade steps
After you finish the 1.0.8.2 upgrade process, you must ensure that your firmware is also upgraded.
Procedure
Node personalities check post-upgrade steps
The 1.0.8.2 upgrade on Dell systems does not remove the WORKER personality. You must check the nodes to ensure that there are no WORKER personalities after the upgrade completes. This issue happens when the worker VM is gone from e4n1, but the node still has the WORKER personality set.
About this task
If you run virsh list --all on the affected node and see the following empty output (the worker VM is gone):
[root@e4n1 ~]# virsh list --all
Id Name State
----------------------------------------------------
but ap node -d still reports the WORKER personality:
[root@nz5-node1 ~]# ap node -d
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
| Node | State | Personality | Monitored | Is Master | Is HUB | Is VDB Master | Is NRS Master |
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
| enclosure1.node1 | ENABLED | CONTROL,UNSET | YES | YES | YES | NO | NO |
| enclosure2.node1 | ENABLED | CONTROL,UNSET | YES | NO | NO | NO | NO |
| enclosure3.node1 | ENABLED | CONTROL,UNSET | YES | NO | NO | NO | NO |
| enclosure4.node1 | ENABLED | WORKER,UNSET | YES | NO | NO | NO | NO |
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
Or you see the following error in the tracelog:
2022-12-09 14:58:30 INFO: Checking for UNSET node(s)
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: Running command [ap node | grep UNSET | cut -f2 -d '|'].
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: RC: 0.
STDOUT: [ enclosure1.node1
enclosure2.node1
enclosure3.node1
enclosure4.node1
]
STDERR: []
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: Running command [ap node set_personality UNSET --magneto_only -f].
LOGGING FROM: yosemite_bundle_upgrade.py:unset_worker_node_personalities:530
2022-12-09 14:58:31 TRACE: RC: 1.
STDOUT: [
Generated: 2022-12-09 14:58:31
]
STDERR: ['UNSET' is not a valid node location
]
LOGGING FROM: yosemite_bundle_upgrade.py:unset_worker_node_personalities:530
2022-12-09 14:58:31 ERROR: Error running command [ap node set_personality UNSET --magneto_only -f].
You must apply the following workaround:
- Set the node personalities after the 1.0.8.2 upgrade is complete. Depending on the existing node personality, run:
- For CONTROL,WORKER run:
ap node set_personality <node> CONTROL,UNSET --magneto_only -f
- For WORKER,WORKER run:
ap node set_personality <node> UNSET,UNSET --magneto_only -f
- For WORKER,UNSET run:
ap node set_personality <node> UNSET,UNSET --magneto_only -f
For example:
[root@gt25-node1 upgrade]# ap node
+------------------+---------+----------------+-----------+-----------+
| Node             | State   | Personality    | Monitored | Is Master |
+------------------+---------+----------------+-----------+-----------+
| enclosure1.node1 | ENABLED | CONTROL,WORKER | YES       | YES       |
| enclosure2.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure3.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure4.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure5.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure6.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
+------------------+---------+----------------+-----------+-----------+
[root@gt25-node1 upgrade]# ap node set_personality enclosure1.node1 CONTROL,UNSET --magneto_only -f
Node role change request sent successfully
Generated: 2022-12-13 11:08:07
[root@gt25-node1 upgrade]# ap node
+------------------+---------+----------------+-----------+-----------+
| Node             | State   | Personality    | Monitored | Is Master |
+------------------+---------+----------------+-----------+-----------+
| enclosure1.node1 | ENABLED | CONTROL,UNSET  | YES       | YES       |
| enclosure2.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure3.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure4.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure5.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure6.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
+------------------+---------+----------------+-----------+-----------+
- Confirm the above results.
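To confirm, you can list any node whose personality still contains WORKER; after the workaround, this minimal check should report that none remain:
# List nodes that still report a WORKER personality
ap node | grep WORKER || echo "No WORKER personalities remain"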