Prerequisites and prechecks
Plan for the upgrade and work through the following prerequisites and prechecks before you upgrade.
Prerequisites
- Ensure that you are on IBM Storage Fusion HCI System version 2.4.
- If you installed IBM Storage Fusion HCI System version 2.4 by using offline or online installation mode, then ensure that you do not change the mode during the upgrade to 2.5.2. To change the installation mode, reinstall IBM Storage Fusion HCI System 2.5.2.
- Whenever the "IBM Spectrum Protect Plus license expired" error occurs, do the following steps to fix the license issue:
  - Log in to IBM Spectrum Protect Plus by using your spp-connection secret values. For the login procedure, see Logging in to IBM Spectrum Protect Plus.
    Note: The default credentials are admin/password.
  - If you get a license expired error, then retrieve the license file /spp/server/SPP.lic from the isf_bkprstr operator pod by using the oc command. See the following sample oc command, and replace <podname> with your available pod name:
    oc cp isf-bkprstr-operator-controller-manager-<podname>:/spp/server/SPP.lic SPP.lic
    For example, <podname> is the suffix in isf-bkprstr-operator-controller-manager-599dc5b756-vcjd6.
    Note: You must have an spp-connection secret after your first login to IBM Spectrum Protect Plus by using the default set of credentials. For more information about the spp-connection secret creation, see the What to do next section of Backup & Restore (Legacy).
  - Copy the license and upload it from the user interface. For more details, see Uploading the product key.
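The license retrieval above can be sketched as a small shell snippet. This is a minimal sketch: it assumes the operator pod runs in the ibm-spectrum-fusion-ns namespace (as in the pod listing later on this page) and reuses the example pod name from this section; on a live cluster, look up your own pod name with the commented command.

```shell
ns=ibm-spectrum-fusion-ns
# Live cluster: find the actual pod name (the hash suffix differs per cluster):
#   pod=$(oc -n "$ns" get po -o name | grep isf-bkprstr-operator-controller-manager | sed 's|^pod/||')
pod=isf-bkprstr-operator-controller-manager-599dc5b756-vcjd6   # example name from this page
# Build the copy command; run it (or eval "$cmd") against a live cluster to fetch the license file.
cmd="oc -n $ns cp ${pod}:/spp/server/SPP.lic SPP.lic"
echo "$cmd"
```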
- Follow these prerequisites when you upgrade IBM Spectrum Scale.
- Ensure that all the core pods are in running status.
- Run the following command to check the status of the core
pods.
oc get daemons ibm-spectrum-scale -n ibm-spectrum-scale -ojson | jq -r '.status.podsStatus'
- Ensure that there are no pods in any of the following states:
  - starting
  - terminating
  - unknown
  - waitingForDelete
  In the following example, the output shows 1 pod in waitingForDelete, so the upgrade must not be done at this time:
  $ oc get daemons ibm-spectrum-scale -n ibm-spectrum-scale -ojson | jq -r '.status.podsStatus'
  {
    "running": "4",
    "starting": "0",
    "terminating": "0",
    "unknown": "0",
    "waitingForDelete": "1"
  }
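A minimal shell gate for this precheck, run here against the sample podsStatus JSON shown above (the live-cluster command is in the comment); any nonzero count in starting, terminating, unknown, or waitingForDelete blocks the upgrade:

```shell
status='{ "running": "4", "starting": "0", "terminating": "0", "unknown": "0", "waitingForDelete": "1" }'
# Live cluster:
#   status=$(oc get daemons ibm-spectrum-scale -n ibm-spectrum-scale -ojson | jq -r '.status.podsStatus')
if echo "$status" | grep -Eq '"(starting|terminating|unknown|waitingForDelete)": *"[1-9]'; then
  verdict="NOT safe to upgrade"
else
  verdict="safe to upgrade"
fi
echo "$verdict"
```

With the sample above (1 pod in waitingForDelete), the gate reports that the upgrade must wait.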
- Ensure that no component is in a failed or degraded state.
[root@tucmgen2 home]# oc rsh compute-1-ru6 mmhealth cluster show
Defaulted container "gpfs" out of: gpfs, logs, mmbuildgpl (init), config (init)
Component       Total   Failed  Degraded    Healthy    Other
--------------------------------------------------------------------------------------
NODE            8       0       0           2          6
GPFS            8       0       0           2          6
NETWORK         8       0       0           8          0
FILESYSTEM      1       0       0           1          0
DISK            36      0       0           36         0
AFM             0       0       0           0          0
FILESYSMGR      1       0       0           1          0
GUI             2       0       0           2          0
NATIVE_RAID     6       0       0           6          0
PERFMON         8       0       0           8          0
THRESHOLD       8       0       0           8          0
oc rsh compute-1-ru6 mmhealth node show -N all
Defaulted container "gpfs" out of: gpfs, logs, mmbuildgpl (init), config (init)

Node name:      compute-1-ru23.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    HEALTHY
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
----------------------------------------------------------------------------------------------------
GPFS         HEALTHY    1 day ago       -
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
AFM          TIPS       1 day ago       afm_sensors_inactive(GPFSAFM, GPFSAFMFS, GPFSAFMFSET)
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru24.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    HEALTHY
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
----------------------------------------------------------------------------------------------------
GPFS         HEALTHY    1 day ago       -
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
AFM          TIPS       1 day ago       afm_sensors_inactive(GPFSAFM, GPFSAFMFS, GPFSAFMFSET)
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru5.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru6.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      compute-1-ru7.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      control-1-ru2.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
GUI          HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      control-1-ru3.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       callhome_not_enabled, numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
GUI          HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -

Node name:      control-1-ru4.daemon.ibm-spectrum-scale.stg.rackc-fusion.tuc.stglabs.ibm.com.
Node status:    TIPS
Status Change:  1 day ago

Component    Status     Status Change   Reasons & Notices
-------------------------------------------------------------------------------
GPFS         TIPS       1 day ago       numactl_not_installed
NETWORK      HEALTHY    1 day ago       -
FILESYSTEM   HEALTHY    1 day ago       -
DISK         HEALTHY    1 day ago       -
FILESYSMGR   HEALTHY    1 day ago       -
NATIVE_RAID  HEALTHY    1 day ago       -
PERFMON      HEALTHY    1 day ago       -
THRESHOLD    HEALTHY    1 day ago       -
- Run the following command to check the Scale pods:
  oc describe daemon
- Run the following command to check the status of the Storage Scale cluster:
  mmhealth
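As a sketch, the Failed and Degraded columns of the mmhealth cluster show table can be screened mechanically. The rows below are copied from the sample output earlier in this section; the live-cluster line in the comment (including its row filter) is an assumption about the command's output layout:

```shell
# Columns: Component Total Failed Degraded Healthy Other (from the sample table above)
summary='NODE 8 0 0 2 6
GPFS 8 0 0 2 6
NETWORK 8 0 0 8 0'
# Live cluster (assumed layout: skip the header lines, keep table rows):
#   summary=$(oc rsh compute-1-ru6 mmhealth cluster show | awk 'NR>3 && NF>=5')
# Count components with a nonzero Failed ($3) or Degraded ($4) column.
bad=$(echo "$summary" | awk '$3 != "0" || $4 != "0"' | wc -l)
echo "components failed or degraded: $bad"
```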
- Set up enterprise registry: If you installed the earlier version of IBM Storage Fusion HCI System by using your enterprise registry, then follow these steps to mirror images in your enterprise registry.
- Mirror IBM Storage Fusion HCI System 2.5.2 images, IBM Spectrum Scale images, and IBM Spectrum Protect Plus images. For steps to mirror, see Mirroring your images to the enterprise registry.
- Update the global pull secret with the mirror registry credentials to which the current version images are mirrored. If you are mirroring to the same enterprise registry that you used for the previous version, then ignore this step.
- Modify the image content source policy isf-operator-index to add the new mirror that points to the new registry for each source defined in the image content source policy. If you are mirroring to the same enterprise registry that you used for the previous version, then ignore this step.
  Note: After the IBM Storage Fusion HCI System is upgraded, you can see all the new IBM Storage Fusion services introduced in 2.5.2. If you want to install the new services, add the image content source policy and the related image. For more information, see Installing IBM Storage Fusion On-premises.
  See the following sample image content source policy:
  apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    name: isf-operator-index
  spec:
    repositoryDigestMirrors:
    # for scale
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>
      source: cp.icr.io/cp/spectrum/scale
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>
      source: icr.io/cpopen
    # for IBM Spectrum Fusion operator
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>
      source: cp.icr.io/cp/isf
    # for spp agent
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/sppc
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc
      source: cp.icr.io/cp/sppc
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/sppc
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc
      source: registry.redhat.io/amq7
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/sppc
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc
      source: registry.redhat.io/oadp
    # for ose-kube-rbac-proxy
    - mirrors:
      - <ISF 2.4.0 enterprise registry>/<ISF 2.4.0 target-path>/openshift4/ose-kube-rbac-proxy
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/openshift4/ose-kube-rbac-proxy
      source: registry.redhat.io/openshift4/ose-kube-rbac-proxy
    - mirrors:
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc/amq-streams-operator-bundle
      source: registry.redhat.io/amq7/amq-streams-operator-bundle
    - mirrors:
      - <ISF 2.5.2 enterprise registry>/<ISF 2.5.2 target-path>/sppc/oadp-operator-bundle
      source: registry.redhat.io/oadp/oadp-operator-bundle
Prechecks
- User interface checks:
- Ensure that all compute nodes are in ready state on the OpenShift user interface as well as on the Nodes page of IBM Storage Fusion HCI System user interface.
- In the IBM Storage Fusion HCI System user interface, check that no nodes, disks, or switches are in a critical state. Go to the Switches, VLAN, and Links tabs to check their statuses.
- Go to Events page in the IBM Storage Fusion HCI System user interface and check whether there are any critical events.
- Ensure that you collect the logs before you upgrade. For more information, see Collecting log packages for IBM Storage Fusion HCI System.
- Collect the system health check logs before you upgrade the IBM Storage Fusion HCI System.
- Collect the Backup & Restore (Legacy) logs before you upgrade the Backup & Restore (Legacy).
- Collect the storage logs before you upgrade the Global Data Platform.
- Run the following command to check whether all nodes are in Ready state with no unschedulable taint:
  oc get nodes
  Sample output:
  NAME                                STATUS   ROLES    AGE   VERSION
  compute-1-ru23.example.domain.com   Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru24.example.domain.com   Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru5.example.domain.com    Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru6.example.domain.com    Ready    worker   34d   v1.23.17+16bcd69
  compute-1-ru7.example.domain.com    Ready    worker   34d   v1.23.17+16bcd69
  control-1-ru2.example.domain.com    Ready    master   34d   v1.23.17+16bcd69
  control-1-ru3.example.domain.com    Ready    master   34d   v1.23.17+16bcd69
  control-1-ru4.example.domain.com    Ready    master   34d   v1.23.17+16bcd69
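The Ready check can be scripted: this sketch filters the STATUS column of oc get nodes, using two rows from the sample output above as stand-in data (the jsonpath query for inspecting taints in the comment is a standard oc idiom, not from this page):

```shell
nodes='compute-1-ru5.example.domain.com Ready worker 34d v1.23.17+16bcd69
control-1-ru2.example.domain.com Ready master 34d v1.23.17+16bcd69'
# Live: nodes=$(oc get nodes --no-headers)
# Any status other than exactly "Ready" (for example NotReady or Ready,SchedulingDisabled) is flagged.
not_ready=$(echo "$nodes" | awk '$2 != "Ready"' | wc -l)
echo "nodes not Ready: $not_ready"
# Taints can be listed with:
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.taints}{"\n"}{end}'
```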
- Run the following command to confirm that the nodes in the machine config pool are not in a degraded state:
  oc get mcp
  Example output:
  NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
  master   rendered-master-9bfe23f117352384b87e460ac8371323   True      False      False      3              3                   3                     0                      6d6h
  worker   rendered-worker-d0fdbacd90381d7ebc7d44adaf7c8907   True      False      False      3              3                   3                     0                      6d6h
  In the output, check for the following values:
  - The values of READYMACHINECOUNT and UPDATEDMACHINECOUNT must be the same as MACHINECOUNT.
  - The value of DEGRADED must be False.
  - The value of UPDATED must be True and UPDATING must be False.
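The machine config pool conditions above can be verified in one awk pass over the oc get mcp output; the sample rows here are taken from the example output in this section:

```shell
# Column order: NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
mcp='master rendered-master-9bfe23f117352384b87e460ac8371323 True False False 3 3 3 0 6d6h
worker rendered-worker-d0fdbacd90381d7ebc7d44adaf7c8907 True False False 3 3 3 0 6d6h'
# Live: mcp=$(oc get mcp --no-headers)
# Flag a pool if UPDATED!=True, UPDATING!=False, DEGRADED!=False,
# READYMACHINECOUNT or UPDATEDMACHINECOUNT differ from MACHINECOUNT, or DEGRADEDMACHINECOUNT!=0.
bad=$(echo "$mcp" | awk '$3!="True" || $4!="False" || $5!="False" || $6!=$7 || $6!=$8 || $9!="0" {print $1}')
echo "pools needing attention: ${bad:-none}"
```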
- Verify whether the pods in the Fusion namespace and the namespaces of its services are healthy
and in a running state. To verify, run the following command for each namespace:
oc get po -n <name of the namespace>
List of namespaces:
  - ibm-spectrum-fusion-ns
  - ibm-spectrum-scale
  - ibm-spectrum-scale-operator
  - ibm-spectrum-scale-dns
  - ibm-spectrum-scale-csi
  - ibm-spectrum-protect-plus-ns
  - baas
  - ibm-data-cataloging
  - ibm-backup-restore
  Note: The namespaces list varies based on the services that you have installed.
- Check for pods in each namespace by using the following commands:
oc -n ibm-spectrum-fusion-ns get po
oc -n ibm-spectrum-scale get po
oc -n ibm-spectrum-scale-operator get po
oc -n ibm-spectrum-scale-dns get po
oc -n ibm-spectrum-scale-csi get po
oc -n ibm-spectrum-protect-plus-ns get po
oc -n baas get po
Sample output:
NAME                                                              READY   STATUS    RESTARTS      AGE
callhomeclient-68887645b8-78xc4                                   1/1     Running   0             10h
callhomeclient-68887645b8-gzhfr                                   1/1     Running   0             6d3h
eventmanager-5f6d458cf9-hsp7r                                     1/1     Running   0             10h
eventmanager-5f6d458cf9-z6g8r                                     1/1     Running   0             6d3h
grafana-deployment-6dcff5fd67-7dj42                               1/1     Running   0             6d3h
grafana-operator-controller-manager-649f7bbcbc-s699p              2/2     Running   0             6d3h
isf-application-operator-controller-manager-69589f8f8c-8fvrw      2/2     Running   0             6d3h
isf-bkprstr-operator-controller-manager-74c7757bf6-nbl5p          2/2     Running   3 (10h ago)   6d3h
isf-compute-operator-controller-manager-68cd6f658b-7kdg2          2/2     Running   2 (10h ago)   6d3h
isf-data-protection-operator-controller-manager-6b8dddf66-7g68g   2/2     Running   0             6d2h
isf-ics-operator-controller-manager-7cb7d6dc74-w5rhz              2/2     Running   0             10h
isf-metrodr-operator-controller-manager-78bd5dbf57-n72rl          2/2     Running   0             10h
isf-network-operator-controller-manager-947bc9f5c-f2zqk           2/2     Running   0             10h
isf-prereq-operator-controller-manager-7c4ff6c86-vdbkl            2/2     Running   2 (10h ago)   6d3h
isf-proxy-7b6dc8bf98-2xpk5                                        1/1     Running   0             6d3h
isf-proxy-7b6dc8bf98-4kkcc                                        1/1     Running   0             10h
isf-serviceability-operator-controller-manager-5578dc7d84-g2knl   2/2     Running   1 (11h ago)   6d3h
isf-storage-operator-controller-manager-777d465fbb-m4wvs          2/2     Running   4 (10h ago)   6d3h
isf-storage-service-dep-85c68d7b54-jppz9                          1/1     Running   0             6d3h
isf-ui-dep-845dd8554-bxshd                                        1/1     Running   0             6d3h
isf-ui-dep-845dd8554-st6s5                                        1/1     Running   0             10h
isf-ui-operator-controller-manager-77fb8fc9c4-w66qx               2/2     Running   2 (11h ago)   6d3h
isf-update-operator-controller-manager-fb8c656c9-qv5g4            2/2     Running   0             10h
logcollector-5b8659b8c7-6psvt                                     1/1     Running   0             6d3h
logcollector-5b8659b8c7-9rjqq                                     1/1     Running   0             10h
spp-dp-controller-manager-59f6dcdbdc-tbtmf                        2/2     Running   0             6d2h
trapserver-0                                                      1/1     Running   0             6d3h
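A hedged sketch of screening one namespace for unhealthy pods, using two rows from the sample output above as stand-in data; on a live cluster, loop the commented command over the namespace list given earlier:

```shell
pods='callhomeclient-68887645b8-78xc4 1/1 Running 0 10h
isf-proxy-7b6dc8bf98-2xpk5 1/1 Running 0 6d3h'
# Live, per namespace:
#   pods=$(oc -n ibm-spectrum-fusion-ns get po --no-headers)
# Flag any pod whose STATUS column is neither Running nor Completed.
unhealthy=$(echo "$pods" | awk '$3 != "Running" && $3 != "Completed"' | wc -l)
echo "pods not Running/Completed: $unhealthy"
```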
- Ensure that there are no catalogsource pod errors in openshift-marketplace. If errors are found, then fix them before you start the upgrade process. For more information, see Troubleshooting issues in IBM Storage Fusion HCI System.
- Ensure that all persistent volumes and persistent volume claims are in a Bound state.
  Run the following command for persistent volumes:
  oc get pv -n <namespace>
  Run the following command for persistent volume claims:
  oc get pvc -n <namespace>
  For the namespaces, see the namespace list.
  Sample command and output:
  oc get pvc -n ibm-spectrum-fusion-ns
  NAMESPACE                NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
  ibm-spectrum-fusion-ns   isf-bkprstr-claim   Bound    pvc-9c25f079-2a5a-4fe1-b1fe-782a180e8a71   5Gi        RWX            ibm-spectrum-fusion-mgmt-sc   31d
  ibm-spectrum-fusion-ns   logcollector        Bound    pvc-527a53f3-c5a9-4ab3-936c-196f34e94724   25Gi       RWX            ibm-spectrum-fusion-mgmt-sc   34d
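The Bound check can be automated by filtering the STATUS column; the sample rows here are from the output above:

```shell
pvcs='ibm-spectrum-fusion-ns isf-bkprstr-claim Bound pvc-9c25f079-2a5a-4fe1-b1fe-782a180e8a71 5Gi RWX ibm-spectrum-fusion-mgmt-sc 31d
ibm-spectrum-fusion-ns logcollector Bound pvc-527a53f3-c5a9-4ab3-936c-196f34e94724 25Gi RWX ibm-spectrum-fusion-mgmt-sc 34d'
# Live: pvcs=$(oc get pvc -A --no-headers)   # STATUS is the third column in the -A listing
unbound=$(echo "$pvcs" | awk '$3 != "Bound"' | wc -l)
echo "claims not Bound: $unbound"
```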
- Run the following command to check whether all cluster operators are available, with AVAILABLE as True, PROGRESSING as False, and DEGRADED as False:
oc get co
- Check whether all operators are in Succeeded state:
oc get csv -A | grep -v elastic
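Both operator checks above can be screened mechanically. The cluster-operator rows below are illustrative stand-ins (the operator names and version are placeholders, not from this page); the CSV screen in the comment mirrors the grep given above:

```shell
# Illustrative rows in `oc get co` layout: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
co='authentication 4.10.67 True False False 34d
kube-apiserver 4.10.67 True False False 34d'
# Live: co=$(oc get co --no-headers)
bad_co=$(echo "$co" | awk '$3!="True" || $4!="False" || $5!="False"' | wc -l)
echo "cluster operators needing attention: $bad_co"
# CSV phases can be screened similarly on a live cluster:
#   oc get csv -A | grep -v elastic | grep -v Succeeded
```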
- Ensure that the catalog sources are in Ready state.
  Check the health of the catalog sources in the OpenShift cluster. Run the following command to check all the catalog sources together:
  oc get catsrc -A -o yaml | grep lastObservedState
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
  f:lastObservedState: {}
  lastObservedState: READY
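A sketch of counting non-READY catalog sources: the first two lines mirror the grep output above, and the TRANSIENT_FAILURE line is an illustrative failure value added for the demo. The jsonpath form in the comment is a standard oc idiom that avoids the f:lastObservedState managed-fields noise:

```shell
states='lastObservedState: READY
lastObservedState: READY
lastObservedState: TRANSIENT_FAILURE'
# The TRANSIENT_FAILURE line above is illustrative, not from this cluster.
# Live: states=$(oc get catsrc -A -o jsonpath='{range .items[*]}{.status.connectionState.lastObservedState}{"\n"}{end}')
not_ready=$(echo "$states" | grep -cv 'READY$')
echo "catalog sources not READY: $not_ready"
```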
- For offline mode, check whether the ibm-opencloud and ibm-operator-catalog catalog sources are in an error state. If they are in an error state and not in use, then delete them.
  - To validate whether the catalog sources are in use, run the following command to list all the subscriptions in the IBM Storage Fusion HCI System cluster:
    oc get sub -A
    If the catalog source name under the source field does not show ibm-opencloud or ibm-operator-catalog in the output, then it is not in use.
  - Run the oc get catsrc -A command to ensure that the following catalog sources are present and in Ready state:
    - community-operators
    - isf-catalog
    - redhat-operators
  - Run the following command to check whether the OLM pod catalog-operator is running in the openshift-operator-lifecycle-manager project:
    oc -n openshift-operator-lifecycle-manager get po
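A sketch of the in-use check: extract the source field of every subscription and count references to the old catalogs. The sample sources are illustrative, and the jsonpath query in the comment is an assumption about a standard oc invocation:

```shell
sources='isf-catalog
redhat-operators
community-operators'
# Live: sources=$(oc get sub -A -o jsonpath='{range .items[*]}{.spec.source}{"\n"}{end}')
# Count subscriptions that still point at ibm-opencloud or ibm-operator-catalog.
in_use=$(echo "$sources" | grep -cE '^(ibm-opencloud|ibm-operator-catalog)$')
echo "subscriptions still using the old catalogs: $in_use"
```

If the count is 0, the old catalog sources are unused and can be deleted safely.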
- Ensure that IBM Spectrum Scale is healthy. Run the following command to check whether the Scale CR (storagemanager) reports the cluster as healthy:
  oc -n ibm-spectrum-fusion-ns get scales storagemanager -oyaml | grep storageClusterStatus
  The output of the command must be healthy. If it shows DEGRADED, then resolve the issue before you proceed with the upgrade.
  Note: If IBM Spectrum Scale is not healthy, do not initiate the upgrade. Check events to get further details about the problem. Contact IBM support if the issue cannot be resolved.
  Run the following command to switch to the Scale project:
  oc project ibm-spectrum-scale
  Then, open a shell on a core pod and run the following commands:
  oc rsh control-0
  mmlsmount all
  mmlscluster
  In this example output, the file system ibmspectrum-fs is mounted on 6 nodes. The number depends on the nodes in your cluster.
  GPFS cluster information
  ========================
    GPFS cluster name:         ibm-spectrum-scale.example.domain.com
    GPFS cluster id:           6734170828145876673
    GPFS UID domain:           ibm-spectrum-scale.example.domain.com
    Remote shell command:      /usr/bin/ssh
    Remote file copy command:  /usr/bin/scp
    Repository type:           CCR

   Node  Daemon node name                                             IP address   Admin node name                                             Designation
  ------------------------------------------------------------------------------------------------------------------------------------------------------
      1  control-1-ru4.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.1    control-1-ru4.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      2  compute-1-ru5.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.2    compute-1-ru5.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      3  compute-1-ru6.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.3    compute-1-ru6.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      4  control-1-ru3.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.4    control-1-ru3.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      5  control-1-ru2.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.5    control-1-ru2.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
      6  compute-1-ru7.daemon.ibm-spectrum-scale.example.domain.com.  192.1.1.6    compute-1-ru7.admin.ibm-spectrum-scale.example.domain.com.  quorum-manager-perfmon
  Run the following command to ensure that IBM Spectrum Scale is healthy. Ensure that all node states are active. If any node state is down, then bring it to an active state:
  mmgetstate -a
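The storageClusterStatus gate can be scripted. The jsonpath path below is inferred from the grep check above and should be treated as an assumption; the sample value is hard-coded here for illustration:

```shell
# Live (assumed field path, matching the grep on storageClusterStatus above):
#   status=$(oc -n ibm-spectrum-fusion-ns get scales storagemanager -o jsonpath='{.status.storageClusterStatus}')
status=healthy   # sample value; the precheck requires exactly this
if [ "$status" = "healthy" ]; then
  verdict="proceed with upgrade"
else
  verdict="resolve Scale issues first"
fi
echo "$verdict"
```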
- Ensure that node upsize and disk scale out are not initiated until the upgrade is complete.
- Make sure that no operation is performed on the OpenShift Container Platform cluster that causes a machine configuration rollout, for example:
- Node maintenance
- Node reboot
- Old firmware upgrade
- Image content source policy update
- Pull secret updates
- Logs that you collected by using the IBM Storage Fusion Collect logs user interface page are deleted after the upgrade process completes. Download the needed logs before you begin the upgrade. In addition, check the system health from the log collections.