Upgrading to version 1.0.8
The upgrade to version 1.0.8 is performed by IBM Support.
Approximate upgrade times are provided for three system configurations:
- Base
- Base + 2
- Base + 8
| System type | Hardware vendor | Approximate upgrade time |
|---|---|---|
| Base system | Lenovo | 4 hours |
| Base + 2 system | Dell | 3 hours |
| Base + 2 system | Lenovo | 4 hours 30 minutes |
| Base + 8 system | Lenovo | 4 hours 30 minutes |
The average upgrade time is approximately 4 hours 30 minutes.
Note: The upgrade might take more time for larger deployments.
Before you begin
Upgrade prerequisites:
- If you are running Cloud Pak for Data System version 1.0.7.8, you must apply 1.0.7.8 Interim Fix 2 before you start upgrading to 1.0.8. For more information, see the 1.0.7.8 Interim Fix 2 release notes.
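To confirm which level your system is currently running, you can filter the output of the ap version -s command (shown in full later in this topic), for example:
# Confirm the currently installed software level before starting the upgrade
ap version -s | grep 'Appliance software version'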
Network setup prerequisites:
- Before you start the upgrade, from the /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible directory, you must run:
ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v
If any changes are listed, ensure that they are expected. If they are unexpected, you must edit the YAML file so that it contains only the expected changes. You might rerun this command as necessary until you see no errors.
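Put together, the check looks like the following sketch; System_Name.yml is a placeholder for your own system inventory file:
# Run the house_config playbook in check mode from the ansible directory and review any reported changes
cd /opt/ibm/appliance/platform/apos-comms/customer_network_config/ansible
ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook -i ./System_Name.yml playbooks/house_config.yml --check -v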
Netezza prerequisites:
- Netezza Performance Server 11.2.1.5 is the minimum supported version.
About this task
Only the system bundle is upgraded in 1.0.8. There is no need to download the following packages:
- icpds_vm
- icpds_rhos_repo
- icpds_services
- icpds_services_addon_cyclops
Procedure
Results
After the upgrade is complete, the following alerts might remain open:
| 439 | SW_NEEDS_ATTENTION | SW | Openshift node is not ready | YES |
| 440 | SW_NEEDS_ATTENTION | SW | Openshift service is not ready | YES |
| 446 | SW_NEEDS_ATTENTION | SW | ICP4D service is not ready | YES |
| 451 | SW_NEEDS_ATTENTION | SW | Webconsole service is not ready | YES |
| 460 | SW_NEEDS_ATTENTION | SW | Portworx component is not healthy |
Close them manually with the following command:
ap issues --close <alert_id>
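For example, a minimal sketch that closes several alerts in one pass; replace the IDs with the alert IDs that ap issues reports on your system:
# Close each remaining post-upgrade alert by ID (substitute the IDs shown in your own `ap issues` output)
for alert_id in 439 440 446 451 460; do
    ap issues --close ${alert_id}
done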
As part of the upgrade process, VMs are disabled on all nodes and they are shut down. They are expected to stay in the shut off state in 1.0.8. For example:
[root@gt01-node1 ~]# for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py`; do echo ${node}; ssh $node virsh list --all; done
e1n1
Id Name State
----------------------------------------------------
- e1n1-1-control shut off
e1n2
Id Name State
----------------------------------------------------
- e1n2-1-control shut off
e1n3
Id Name State
----------------------------------------------------
- e1n3-1-control shut off
e1n4
Id Name State
----------------------------------------------------
- e1n4-1-worker shut off
e2n1
e2n2
e2n3
e2n4
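If you prefer a quick pass/fail check over reading the full listing, a minimal sketch based on the same loop flags any VM that is still reported as running:
# Report any node that still has a VM in the "running" state after the upgrade
for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py`; do
    ssh $node "virsh list --all" | grep -qw running && echo "WARNING: ${node} has a VM that is not shut off"
done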
In 1.0.8, the Netezza web console runs on one of the three control nodes (or on a connector node, if installed). Two docker containers are required for operation of the Netezza console: cyclops and the associated influxdb container.
Container images are installed on all control nodes for high availability. When a control node goes out of service, Platform Manager starts the cyclops and influxdb containers on another control node (or a connector node). For example:
[root@gt18-node1 ~]# for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py --control`; do echo ${node}; ssh $node docker ps -a | grep -E 'cyclops|influxdb'; done
e1n1
c7d402b47de8 cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Exited (255) 30 hours ago cyclops
9f960b843510 influxdb:latest "/entrypoint.sh in..." 4 days ago Exited (255) 30 hours ago 0.0.0.0:8086->8086/tcp influxdb
e1n2
642f4b0b5087 cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Up 30 hours 80/tcp, 3000/tcp, 5480/tcp, 0.0.0.0:3333->3333/tcp, 0.0.0.0:8843->8443/tcp cyclops
177a97aaa701 influxdb:latest "/entrypoint.sh in..." 4 days ago Up 30 hours 0.0.0.0:8086->8086/tcp influxdb
e1n3
d590a49d369f cyclops:4.0.2-20221114b30631-x86_64 "/scripts/start.sh" 4 days ago Exited (137) 4 days ago cyclops
19e5f305548e influxdb:latest "/entrypoint.sh in..." 4 days ago Exited (0) 4 days ago influxdb
[root@gt18-node1 ~]#
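To see only which control node currently hosts the console, a minimal variant of the loop above is:
# Print the control node that is currently running the cyclops web console container
for node in `/opt/ibm/appliance/platform/xcat/scripts/xcat/display_nodes.py --control`; do
    ssh $node "docker ps --format '{{.Names}}'" | grep -qw cyclops && echo "cyclops is running on ${node}"
done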
The cyclops container version in the ap version -s output reflects the web console version:
[root@gt18-node1 ~]# ap version -s
Appliance software version is 1.0.8.0
All component versions are synchronized.
+-----------------------------+-----------------------------------------------------------------+
| Component Name | Version |
+-----------------------------+-----------------------------------------------------------------+
| Appliance platform software | 1.0.8.0-20221130100627b31176 |
| aposcomms | ibm-apos-named-config : 1.0.5.1-1 |
| | ibm-apos-network-tools : 2.0.4.0-1 |
| | ibm-apos-common : 1.0.8.2-1 |
| | ibm-apos-udev-rules-config : 1.0.4.0-1 |
| | ibm-apos-chrony-config : 1.0.1.0-3 |
| | ibm-apos-dhcpd-config : 1.0.4.1-1 |
| | ibm-apos-network-config : 1.1.9.0-1 |
| apupgrade | 1.0.8.0-20221128071127b31129 |
| callhome | 0.1.0.0 |
| containerapi | 1.0.23.0-20221103134948b30181 |
| cyclops | 4.0.2-20221114b30631 |
| docker-upgrade | oci-systemd-hook : 0.2.0-1 |
| | oci-umount : 2.5-3 |
| | oci-register-machine : 0-6 |
| | atomic-registries : 1.22.1-29 |
| | docker-common : 1.13.1-161 |
| | docker-client : 1.13.1-161 |
| | docker-debuginfo : 1.13.1-161 |
| | docker : 1.13.1-161 |
| | docker-rhel-push-plugin : 1.13.1-161 |
| | container-storage-setup : 0.11.0-2 |
| | container-selinux : 2.119.2-1.911c772 |
| | containers-common : 0.1.40-11 |
| | python-pytoml : 0.1.14-1 |
| gpfs | 5.1.2-1 |
| gpfsconfig | 1.0.8.0-20221130082401b31174 |
| hpi | hpi-cumulus-mgtsw-firmware : 2.0.0.0-20221103173754 |
| | hpicfg : 2.0.4.4-20221028182852b1 |
| | hpi-cumulus-fabspine-firmware : 2.0.0.0-20221103173754 |
| | hpi-software : 1.0.8.0-20221123203813b13 |
| | hpi-lenovo-node-firmware : 1.8.0.0-20221103173754 |
| | hpiutils : 2.0.4.4-20221028182843b1 |
| | hpi-x86_64-image : 2.0.4.5-20221102223218b7 |
| | hpi-cumulus-switch-firmware : 2.0.0.0-20221103173754 |
| | hpi-dell-node-firmware : 1.8.0.0-20221103173754 |
| | hpi-cumulus-fabsw-firmware : 2.0.0.0-20221103173754 |
| | dct : 1.0.7.8-20221103022304b7 |
| magneto | 1.0.28.1-20221121132020b30919 |
| mellanox | 1.0.8.0 |
| mvcli | 2.3.10.1095 |
| nodeos | 1.0.8.0-20221130073105b31174 |
| platformbackups | 1.0.20.0-20221028152919b29972 |
| psklm | 1.0.21.0-20221122015956b5 |
| solarflare | 4.15.10.1002 |
| supporttools | 1.0.23.10-20221115174015b30691 |
+-----------------------------+-----------------------------------------------------------------+
Firmware post-upgrade steps
After you finish your 1.0.8 upgrade process, you must ensure that your firmware is also upgraded.
Procedure
Node personality check post-upgrade steps
The 1.0.8 upgrade on Dell systems does not remove the WORKER personality. You must check the nodes to ensure that there are no WORKER personalities after the upgrade completes. This issue happens when the worker VM is gone from e4n1, but the node still has the WORKER personality set.
About this task
If you run virsh list --all and see the following output:
[root@e4n1 ~]# virsh list --all
Id Name State
----------------------------------------------------
and the ap node -d output still shows a WORKER personality on that node:
[root@nz5-node1 ~]# ap node -d
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
| Node | State | Personality | Monitored | Is Master | Is HUB | Is VDB Master | Is NRS Master |
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
| enclosure1.node1 | ENABLED | CONTROL,UNSET | YES | YES | YES | NO | NO |
| enclosure2.node1 | ENABLED | CONTROL,UNSET | YES | NO | NO | NO | NO |
| enclosure3.node1 | ENABLED | CONTROL,UNSET | YES | NO | NO | NO | NO |
| enclosure4.node1 | ENABLED | WORKER,UNSET | YES | NO | NO | NO | NO |
+------------------+---------+---------------+-----------+-----------+--------+---------------+---------------+
Or you see the following error in the tracelog:
2022-12-09 14:58:30 INFO: Checking for UNSET node(s)
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: Running command [ap node | grep UNSET | cut -f2 -d '|'].
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: RC: 0.
STDOUT: [ enclosure1.node1
enclosure2.node1
enclosure3.node1
enclosure4.node1
]
STDERR: []
LOGGING FROM: yosemite_bundle_upgrade.py:get_unset_worker_node_names:544
2022-12-09 14:58:30 TRACE: Running command [ap node set_personality UNSET --magneto_only -f].
LOGGING FROM: yosemite_bundle_upgrade.py:unset_worker_node_personalities:530
2022-12-09 14:58:31 TRACE: RC: 1.
STDOUT: [
Generated: 2022-12-09 14:58:31
]
STDERR: ['UNSET' is not a valid node location
]
LOGGING FROM: yosemite_bundle_upgrade.py:unset_worker_node_personalities:530
2022-12-09 14:58:31 ERROR: Error running command [ap node set_personality UNSET --magneto_only -f].
You must apply the following workaround:
- Set the node personalities after the 1.0.8 upgrade is complete. Depending on the existing node personality, run:
  - For CONTROL,WORKER, run: ap node set_personality <node> CONTROL,UNSET --magneto_only -f
  - For WORKER,WORKER, run: ap node set_personality <node> UNSET,UNSET --magneto_only -f
  - For WORKER,UNSET, run: ap node set_personality <node> UNSET,UNSET --magneto_only -f
For example:
[root@gt25-node1 upgrade]# ap node
+------------------+---------+----------------+-----------+-----------+
| Node             | State   | Personality    | Monitored | Is Master |
+------------------+---------+----------------+-----------+-----------+
| enclosure1.node1 | ENABLED | CONTROL,WORKER | YES       | YES       |
| enclosure2.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure3.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure4.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure5.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure6.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
+------------------+---------+----------------+-----------+-----------+
[root@gt25-node1 upgrade]# ap node set_personality enclosure1.node1 CONTROL,UNSET --magneto_only -f
Node role change request sent successfully
Generated: 2022-12-13 11:08:07
[root@gt25-node1 upgrade]# ap node
+------------------+---------+----------------+-----------+-----------+
| Node             | State   | Personality    | Monitored | Is Master |
+------------------+---------+----------------+-----------+-----------+
| enclosure1.node1 | ENABLED | CONTROL,UNSET  | YES       | YES       |
| enclosure2.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure3.node1 | ENABLED | CONTROL,WORKER | YES       | NO        |
| enclosure4.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure5.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
| enclosure6.node1 | ENABLED | UNSET,UNSET    | YES       | NO        |
+------------------+---------+----------------+-----------+-----------+
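If several nodes still show the WORKER,UNSET personality, a hedged sketch such as the following applies the same reset in a loop; the grep and awk parsing of the ap node table is an assumption, so verify the resulting node list before running it:
# Reset every node that still reports WORKER,UNSET (verify the node list first)
for node in $(ap node | grep 'WORKER,UNSET' | awk -F'|' '{print $2}' | tr -d ' '); do
    ap node set_personality ${node} UNSET,UNSET --magneto_only -f
done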