Prerequisites and prechecks
Plan for the upgrade and work through the following prerequisites and prechecks before you upgrade.
Prerequisites
- Ensure that you are on IBM Storage Fusion HCI System version 2.4.
- If you installed IBM Storage Fusion HCI System version 2.4 by using offline or online installation mode, then ensure that you do not change the mode during the upgrade to 2.5.2. To change the installation mode, reinstall IBM Storage Fusion HCI System 2.5.2.
- Whenever the "IBM Spectrum Protect Plus license expired" error occurs, do the following steps to fix the license issue:
  - Log in to IBM Spectrum Protect Plus by using your spp-connection secret values. For the login procedure, see Logging in to IBM Spectrum Protect Plus.
    Note: The default credentials are admin/password.
  - If you get a license expired error, then retrieve the license file /spp/server/SPP.lic from the isf_bkprstr operator pod by using the oc command. See the following sample oc command, and replace <podname> with your available pod name:
    oc cp isf-bkprstr-operator-controller-manager-<podname>:/spp/server/SPP.lic SPP.lic
    For example, <podname> is the suffix in isf-bkprstr-operator-controller-manager-599dc5b756-vcjd6.
    Note: You must have an spp-connection secret after your first login to IBM Spectrum Protect Plus by using the default set of credentials. For more information about the spp-connection secret creation, see the What to do next section of Backup & Restore (Legacy).
  - Copy the license and upload it from the user interface. For more details, see Uploading the product key.
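The license retrieval above can be sketched as a small shell snippet. This is a minimal sketch: it assumes the operator pod runs in the ibm-spectrum-fusion-ns namespace (as in the pod listing later on this page) and reuses the example pod name from this section; on a live cluster, look up your own pod name with the commented command.

```shell
ns=ibm-spectrum-fusion-ns
# Live cluster: find the actual pod name (the hash suffix differs per cluster):
#   pod=$(oc -n "$ns" get po -o name | grep isf-bkprstr-operator-controller-manager | sed 's|^pod/||')
pod=isf-bkprstr-operator-controller-manager-599dc5b756-vcjd6   # example name from this page
# Build the copy command; run it (or eval "$cmd") against a live cluster to fetch the license file.
cmd="oc -n $ns cp ${pod}:/spp/server/SPP.lic SPP.lic"
echo "$cmd"
```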
- Follow these prerequisites when you upgrade IBM Spectrum Scale.
- Ensure that all the core pods are in running status.
- Run the following command to check the status of the core
pods.
oc get daemons ibm-spectrum-scale -n ibm-spectrum-scale -ojson | jq -r '.status.podsStatus'
- Ensure that there are no pods in any of the following states:
  - starting
  - terminating
  - unknown
  - waitingForDelete
  In the following example, the output shows 1 pod in waitingForDelete, so the upgrade must not be done at this time:
  $ oc get daemons ibm-spectrum-scale -n ibm-spectrum-scale -ojson | jq -r '.status.podsStatus'
  {
    "running": "4",
    "starting": "0",
    "terminating": "0",
    "unknown": "0",
    "waitingForDelete": "1"
  }
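A minimal shell gate for this precheck, run here against the sample podsStatus JSON shown above (the live-cluster command is in the comment); any nonzero count in starting, terminating, unknown, or waitingForDelete blocks the upgrade:

```shell
status='{ "running": "4", "starting": "0", "terminating": "0", "unknown": "0", "waitingForDelete": "1" }'
# Live cluster:
#   status=$(oc get daemons ibm-spectrum-scale -n ibm-spectrum-scale -ojson | jq -r '.status.podsStatus')
if echo "$status" | grep -Eq '"(starting|terminating|unknown|waitingForDelete)": *"[1-9]'; then
  verdict="NOT safe to upgrade"
else
  verdict="safe to upgrade"
fi
echo "$verdict"
```

With the sample above (1 pod in waitingForDelete), the gate reports that the upgrade must wait.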
- Ensure that no component is in a failed or degraded state.
[root@tucmgen2 home]# oc rsh compute-1-ru6 mmhealth cluster show
Defaulted container "gpfs" out of: gpfs, logs, mmbuildgpl (init), config (init)
Component       Total   Failed  Degraded    Healthy    Other
--------------------------------------------------------------------------------------
NODE            8       0       0           2          6
GPFS            8       0       0           2          6
NETWORK         8       0       0           8          0
FILESYSTEM      1       0       0           1          0
DISK            36      0       0           36         0
AFM             0       0       0           0          0
FILESYSMGR      1       0       0           1          0
GUI             2       0       0           2          0
NATIVE_RAID     6       0       0           6          0
PERFMON         8       0       0           8          0
THRESHOLD       8       0       0           8          0
oc rsh compute-1-ru6 mmhealth node show -N all
Defaulted container "gpfs" out of: gpfs, logs, mmbuildgpl (init), config (init)

Node name:      compute-1-ru23.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    HEALTHY
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
----------------------------------------------------------------------------------------------------
GPFS         HEALTHY    1 day ago       -
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
AFM          TIPS       1 day ago       afm_sensors_inactive(GPFSAFM, GPFSAFMFS, GPFSAFMFSET)
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru24.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    HEALTHY
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
----------------------------------------------------------------------------------------------------
GPFS         HEALTHY    1 day ago       -
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
AFM          TIPS       1 day ago       afm_sensors_inactive(GPFSAFM, GPFSAFMFS, GPFSAFMFSET)
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru5.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru6.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru7.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      control-1-ru2.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
GUI          HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      control-1-ru3.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       callhome_not_enabled, numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
GUI          HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      control-1-ru4.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
FILESYSMGR   HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -
- Run the following command to check the Scale pods:
  oc describe daemon
- Run the following command to check the status of the Storage Scale cluster:
  mmhealth
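As a sketch, the Failed and Degraded columns of the mmhealth cluster show table can be screened mechanically. The rows below are copied from the sample output earlier in this section; the live-cluster line in the comment (including its row filter) is an assumption about the command's output layout:

```shell
# Columns: Component Total Failed Degraded Healthy Other (from the sample table above)
summary='NODE 8 0 0 2 6
GPFS 8 0 0 2 6
NETWORK 8 0 0 8 0'
# Live cluster (assumed layout: skip the header lines, keep table rows):
#   summary=$(oc rsh compute-1-ru6 mmhealth cluster show | awk 'NR>3 && NF>=5')
# Count components with a nonzero Failed ($3) or Degraded ($4) column.
bad=$(echo "$summary" | awk '$3 != "0" || $4 != "0"' | wc -l)
echo "components failed or degraded: $bad"
```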
- Set up enterprise registry: If you installed the earlier version of IBM Storage Fusion HCI System by using your enterprise registry, then follow these steps to mirror images in your enterprise registry.
- Mirror IBM Storage Fusion HCI System 2.5.2 images, IBM Spectrum Scale images, and IBM Spectrum Protect Plus images. For steps to mirror, see Mirroring your images to the enterprise registry.
- Update the global pull secret with the mirror registry credentials to which the current version images are mirrored. If you are mirroring to the same enterprise registry that you used for the previous version, then ignore this step.
- Modify the image content source policy isf-operator-index to add the new mirror that points to the new registry for each source defined in the image content source policy. If you are mirroring to the same enterprise registry that you used for the previous version, then ignore this step.
  Note: After the IBM Storage Fusion HCI System is upgraded, you can see all the new IBM Storage Fusion services introduced in 2.5.2. If you want to install the new services, add the image content source policy and the related image. For more information, see Installing IBM Storage Fusion On-premises.
  See the following sample image content source policy:
  apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    name: isf-operator-index
  spec:
    repositoryDigestMirrors:
    # for scale
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>
      source: cp.icr.io/cp/spectrum/scale
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>
      source: icr.io/cpopen
    # for IBM Spectrum Fusion operator
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>
      source: cp.icr.io/cp/isf
    # for spp agent
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/sppc
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc
      source: cp.icr.io/cp/sppc
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/sppc
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc
      source: registry.redhat.io/amq7
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/sppc
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc
      source: registry.redhat.io/oadp
    # for ose-kube-rbac-proxy
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/openshift4/ose-kube-rbac-proxy
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/openshift4/ose-kube-rbac-proxy
      source: registry.redhat.io/openshift4/ose-kube-rbac-proxy
    - mirrors:
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc/amq-streams-operator-bundle
      source: registry.redhat.io/amq7/amq-streams-operator-bundle
    - mirrors:
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc/oadp-operator-bundle
      source: registry.redhat.io/oadp/oadp-operator-bundle
Prechecks
- User interface checks:
- Ensure that all compute nodes are in ready state on the OpenShift user interface as well as on the Nodes page of IBM Storage Fusion HCI System user interface.
- In the IBM Storage Fusion HCI System user interface, check that no nodes, disks, or switches are in a critical state. Go to the Switches, VLAN, and Links tabs to check their statuses.
- Go to Events page in the IBM Storage Fusion HCI System user interface and check whether there are any critical events.
- Ensure that you collect the logs before you upgrade. For more information, see Collecting log packages for IBM Storage Fusion HCI System.
- Collect the system health check logs before you upgrade the IBM Storage Fusion HCI System.
- Collect the Backup & Restore (Legacy) logs before you upgrade the Backup & Restore (Legacy).
- Collect the storage logs before you upgrade the Global Data Platform.
- Run the following command to check whether all nodes are in Ready state with no unschedulable taint:
  oc get nodes
  Sample output:
  NAME                                STATUS   ROLES    AGE   VERSION
  compute-1-ru23.example.domain.com   Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru24.example.domain.com   Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru5.example.domain.com    Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru6.example.domain.com    Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru7.example.domain.com    Ready    worker   34d   v1.23.17+16bcd69
  control-1-ru2.example.domain.com    Ready    master   34d   v1.23.17+16bcd69
  control-1-ru3.example.domain.com    Ready    master   34d   v1.23.17+16bcd69
  control-1-ru4.example.domain.com    Ready    master   34d   v1.23.17+16bcd69
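The Ready check can be scripted: this sketch filters the STATUS column of oc get nodes, using two rows from the sample output above as stand-in data (the jsonpath query for inspecting taints in the comment is a standard oc idiom, not from this page):

```shell
nodes='compute-1-ru5.example.domain.com Ready worker 34d v1.23.17+16bcd69
control-1-ru2.example.domain.com Ready master 34d v1.23.17+16bcd69'
# Live: nodes=$(oc get nodes --no-headers)
# Any status other than exactly "Ready" (for example NotReady or Ready,SchedulingDisabled) is flagged.
not_ready=$(echo "$nodes" | awk '$2 != "Ready"' | wc -l)
echo "nodes not Ready: $not_ready"
# Taints can be listed with:
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.taints}{"\n"}{end}'
```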
- Run the following command to confirm that the nodes in the machine config pool are not in a degraded state:
  oc get mcp
  Example output:
  NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
  master   rendered-master-9bfe23f117352384b87e460ac8371323   True      False      False      3              3                   3                     0                      6d6h
  worker   rendered-worker-d0fdbacd90381d7ebc7d44adaf7c8907   True      False      False      3              3                   3                     0                      6d6h
  In the output, check for the following values:
  - The values of READYMACHINECOUNT and UPDATEDMACHINECOUNT must be the same as MACHINECOUNT.
  - The value of DEGRADED must be False.
  - The value of UPDATED must be True and UPDATING must be False.
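The machine config pool conditions above can be verified in one awk pass over the oc get mcp output; the sample rows here are taken from the example output in this section:

```shell
# Column order: NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
mcp='master rendered-master-9bfe23f117352384b87e460ac8371323 True False False 3 3 3 0 6d6h
worker rendered-worker-d0fdbacd90381d7ebc7d44adaf7c8907 True False False 3 3 3 0 6d6h'
# Live: mcp=$(oc get mcp --no-headers)
# Flag a pool if UPDATED!=True, UPDATING!=False, DEGRADED!=False,
# READYMACHINECOUNT or UPDATEDMACHINECOUNT differ from MACHINECOUNT, or DEGRADEDMACHINECOUNT!=0.
bad=$(echo "$mcp" | awk '$3!="True" || $4!="False" || $5!="False" || $6!=$7 || $6!=$8 || $9!="0" {print $1}')
echo "pools needing attention: ${bad:-none}"
```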
- Verify whether the pods in the Fusion namespace and the namespaces of its services are healthy
and in a running state. To verify, run the following command for each namespace:
oc get po -n <name of the namespace>
List of namespaces:
  - ibm-spectrum-fusion-ns
  - ibm-spectrum-scale
  - ibm-spectrum-scale-operator
  - ibm-spectrum-scale-dns
  - ibm-spectrum-scale-csi
  - ibm-spectrum-protect-plus-ns
  - baas
  - ibm-data-cataloging
  - ibm-backup-restore
  Note: The namespaces list varies based on the services that you have installed.
- Check for pods in each namespace by using the following commands:
oc -n ibm-spectrum-fusion-ns get po
oc -n ibm-spectrum-scale get po
oc -n ibm-spectrum-scale-operator get po
oc -n ibm-spectrum-scale-dns get po
oc -n ibm-spectrum-scale-csi get po
oc -n ibm-spectrum-protect-plus-ns get po
oc -n baas get po
Sample output:
NAME                                                              READY   STATUS    RESTARTS      AGE
callhomeclient-68887645b8-78xc4                                   1/1     Running   0             10h
callhomeclient-68887645b8-gzhfr                                   1/1     Running   0             6d3h
eventmanager-5f6d458cf9-hsp7r                                     1/1     Running   0             10h
eventmanager-5f6d458cf9-z6g8r                                     1/1     Running   0             6d3h
grafana-deployment-6dcff5fd67-7dj42                               1/1     Running   0             6d3h
grafana-operator-controller-manager-649f7bbcbc-s699p              2/2     Running   0             6d3h
isf-application-operator-controller-manager-69589f8f8c-8fvrw      2/2     Running   0             6d3h
isf-bkprstr-operator-controller-manager-74c7757bf6-nbl5p          2/2     Running   3 (10h ago)   6d3h
isf-compute-operator-controller-manager-68cd6f658b-7kdg2          2/2     Running   2 (10h ago)   6d3h
isf-data-protection-operator-controller-manager-6b8dddf66-7g68g   2/2     Running   0             6d2h
isf-ics-operator-controller-manager-7cb7d6dc74-w5rhz              2/2     Running   0             10h
isf-metrodr-operator-controller-manager-78bd5dbf57-n72rl          2/2     Running   0             10h
isf-network-operator-controller-manager-947bc9f5c-f2zqk           2/2     Running   0             10h
isf-prereq-operator-controller-manager-7c4ff6c86-vdbkl            2/2     Running   2 (10h ago)   6d3h
isf-proxy-7b6dc8bf98-2xpk5                                        1/1     Running   0             6d3h
isf-proxy-7b6dc8bf98-4kkcc                                        1/1     Running   0             10h
isf-serviceability-operator-controller-manager-5578dc7d84-g2knl   2/2     Running   1 (11h ago)   6d3h
isf-storage-operator-controller-manager-777d465fbb-m4wvs          2/2     Running   4 (10h ago)   6d3h
isf-storage-service-dep-85c68d7b54-jppz9                          1/1     Running   0             6d3h
isf-ui-dep-845dd8554-bxshd                                        1/1     Running   0             6d3h
isf-ui-dep-845dd8554-st6s5                                        1/1     Running   0             10h
isf-ui-operator-controller-manager-77fb8fc9c4-w66qx               2/2     Running   2 (11h ago)   6d3h
isf-update-operator-controller-manager-fb8c656c9-qv5g4            2/2     Running   0             10h
logcollector-5b8659b8c7-6psvt                                     1/1     Running   0             6d3h
logcollector-5b8659b8c7-9rjqq                                     1/1     Running   0             10h
spp-dp-controller-manager-59f6dcdbdc-tbtmf                        2/2     Running   0             6d2h
trapserver-0                                                      1/1     Running   0             6d3h
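A hedged sketch of screening one namespace for unhealthy pods, using two rows from the sample output above as stand-in data; on a live cluster, loop the commented command over the namespace list given earlier:

```shell
pods='callhomeclient-68887645b8-78xc4 1/1 Running 0 10h
isf-proxy-7b6dc8bf98-2xpk5 1/1 Running 0 6d3h'
# Live, per namespace:
#   pods=$(oc -n ibm-spectrum-fusion-ns get po --no-headers)
# Flag any pod whose STATUS column is neither Running nor Completed.
unhealthy=$(echo "$pods" | awk '$3 != "Running" && $3 != "Completed"' | wc -l)
echo "pods not Running/Completed: $unhealthy"
```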
- Ensure that there are no catalogsource pod errors in openshift-marketplace. If errors are found, then fix them before you start the upgrade process. For more information, see Troubleshooting issues in IBM Storage Fusion HCI System.
- Ensure that all persistent volumes and persistent volume claims are in a Bound state.
  Run the following command for persistent volumes:
  oc get pv -n <namespace>
  Run the following command for persistent volume claims:
  oc get pvc -n <namespace>
  For the namespaces, see the namespace list.
  Sample command and output:
  oc get pvc -n ibm-spectrum-fusion-ns
  NAMESPACE                NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
  ibm-spectrum-fusion-ns   isf-bkprstr-claim   Bound    pvc-9c25f079-2a5a-4fe1-b1fe-782a180e8a71   5Gi        RWX            ibm-spectrum-fusion-mgmt-sc   31d
  ibm-spectrum-fusion-ns   logcollector        Bound    pvc-527a53f3-c5a9-4ab3-936c-196f34e94724   25Gi       RWX            ibm-spectrum-fusion-mgmt-sc   34d
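The Bound check can be automated by filtering the STATUS column; the sample rows here are from the output above:

```shell
pvcs='ibm-spectrum-fusion-ns isf-bkprstr-claim Bound pvc-9c25f079-2a5a-4fe1-b1fe-782a180e8a71 5Gi RWX ibm-spectrum-fusion-mgmt-sc 31d
ibm-spectrum-fusion-ns logcollector Bound pvc-527a53f3-c5a9-4ab3-936c-196f34e94724 25Gi RWX ibm-spectrum-fusion-mgmt-sc 34d'
# Live: pvcs=$(oc get pvc -A --no-headers)   # STATUS is the third column in the -A listing
unbound=$(echo "$pvcs" | awk '$3 != "Bound"' | wc -l)
echo "claims not Bound: $unbound"
```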
- Run the following command to check whether all cluster operators are available, with AVAILABLE as True, PROGRESSING as False, and DEGRADED as False:
oc get co
- Check whether all operators are in Succeeded state:
oc get csv -A | grep -v elastic
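Both operator checks above can be screened mechanically. The cluster-operator rows below are illustrative stand-ins (the operator names and version are placeholders, not from this page); the CSV screen in the comment mirrors the grep given above:

```shell
# Illustrative rows in `oc get co` layout: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
co='authentication 4.10.67 True False False 34d
kube-apiserver 4.10.67 True False False 34d'
# Live: co=$(oc get co --no-headers)
bad_co=$(echo "$co" | awk '$3!="True" || $4!="False" || $5!="False"' | wc -l)
echo "cluster operators needing attention: $bad_co"
# CSV phases can be screened similarly on a live cluster:
#   oc get csv -A | grep -v elastic | grep -v Succeeded
```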
- Ensure that the catalog sources are in Ready state.
  Check the health of the catalog sources in the OpenShift cluster. Run the following command to check all the catalog sources together:
  oc get catsrc -A -o yaml | grep lastObservedState
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
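A sketch of counting non-READY catalog sources: the first two lines mirror the grep output above, and the TRANSIENT_FAILURE line is an illustrative failure value added for the demo. The jsonpath form in the comment is a standard oc idiom that avoids the f:lastObservedState managed-fields noise:

```shell
states='lastObservedState: READY
lastObservedState: READY
lastObservedState: TRANSIENT_FAILURE'
# The TRANSIENT_FAILURE line above is illustrative, not from this cluster.
# Live: states=$(oc get catsrc -A -o jsonpath='{range .items[*]}{.status.connectionState.lastObservedState}{"\n"}{end}')
not_ready=$(echo "$states" | grep -cv 'READY$')
echo "catalog sources not READY: $not_ready"
```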
- For offline mode, check whether the ibm-opencloud and ibm-operator-catalog catalog sources are in an error state. If they are in an error state and not in use, then delete them.
  - To validate whether the catalog sources are in use, run the following command to list all the subscriptions in the IBM Storage Fusion HCI System cluster:
    oc get sub -A
    If the catalog source name under the source field does not show ibm-opencloud or ibm-operator-catalog in the output, then it is not in use.
  - Run the oc get catsrc -A command to ensure that the following catalog sources are present and in Ready state:
    - community-operators
    - isf-catalog
    - redhat-operators
  - Run the following command to check whether the OLM pod catalog-operator is running in the openshift-operator-lifecycle-manager project:
    oc -n openshift-operator-lifecycle-manager get po
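A sketch of the in-use check: extract the source field of every subscription and count references to the old catalogs. The sample sources are illustrative, and the jsonpath query in the comment is an assumption about a standard oc invocation:

```shell
sources='isf-catalog
redhat-operators
community-operators'
# Live: sources=$(oc get sub -A -o jsonpath='{range .items[*]}{.spec.source}{"\n"}{end}')
# Count subscriptions that still point at ibm-opencloud or ibm-operator-catalog.
in_use=$(echo "$sources" | grep -cE '^(ibm-opencloud|ibm-operator-catalog)$')
echo "subscriptions still using the old catalogs: $in_use"
```

If the count is 0, the old catalog sources are unused and can be deleted safely.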
- Ensure that IBM Spectrum Scale is healthy. Run the following command to check whether the Scale CR (storagemanager) reports the cluster as healthy:
  oc -n ibm-spectrum-fusion-ns get scales storagemanager -oyaml | grep storageClusterStatus
  The output of the command must be healthy. If it shows DEGRADED, then resolve the issue before you proceed with the upgrade.
  Note: If IBM Spectrum Scale is not healthy, do not initiate the upgrade. Check events to get further details about the problem. Contact IBM support if the issue cannot be resolved.
  Run the following command to switch to the Scale project:
  oc project ibm-spectrum-scale
  Then, open a shell on a core pod and run the following commands:
  oc rsh control-0
  mmlsmount all
  mmlscluster
  In this example output, the file system ibmspectrum-fs is mounted on 6 nodes. The number depends on the nodes in your cluster.
  GPFS cluster information
  ========================
    GPFS cluster name:         ibm-spectrum-scale.example.domain.com
    GPFS cluster id:           6734170828145876673
    GPFS UID domain:           ibm-spectrum-scale.example.domain.com
    Remote shell command:      /usr/bin/ssh
    Remote file copy command:  /usr/bin/scp
    Repository type:           CCR

   Node  Daemon node name                                             IP address   Admin node name                                             Designation
  ------------------------------------------------------------------------------------------------------------------------------------------------------
      1  control-1-ru4.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.1    control-1-ru4.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      2  compute-1-ru5.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.2    compute-1-ru5.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      3  compute-1-ru6.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.3    compute-1-ru6.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      4  control-1-ru3.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.4    control-1-ru3.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      5  control-1-ru2.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.5    control-1-ru2.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      6  compute-1-ru7.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.6    compute-1-ru7.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
  Run the following command to ensure that IBM Spectrum Scale is healthy. Ensure that all node states are active. If any node state is down, then bring it to an active state:
  mmgetstate -a
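The storageClusterStatus gate can be scripted. The jsonpath path below is inferred from the grep check above and should be treated as an assumption; the sample value is hard-coded here for illustration:

```shell
# Live (assumed field path, matching the grep on storageClusterStatus above):
#   status=$(oc -n ibm-spectrum-fusion-ns get scales storagemanager -o jsonpath='{.status.storageClusterStatus}')
status=healthy   # sample value; the precheck requires exactly this
if [ "$status" = "healthy" ]; then
  verdict="proceed with upgrade"
else
  verdict="resolve Scale issues first"
fi
echo "$verdict"
```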
- Ensure that node upsize and disk scale out are not initiated until the upgrade is complete.
- Make sure that no operation is performed on the OpenShift Container Platform cluster that causes a machine configuration rollout, for example:
- Node maintenance
- Node reboot
- Old firmware upgrade
- Image content source policy update
- Pull secret updates
- Logs that you collected by using the IBM Storage Fusion Collect logs user interface page are deleted after the upgrade process completes. Download the needed logs before you begin the upgrade. In addition, check the system health from the log collections.