IBM Support

IBM Storage Fusion 2.8.0 Hotfix for the Metro-DR tiebreaker issues

General Page

In the latest IBM Storage Scale 5.2.0.0 build, the installation or upgrade operations on the Metro-DR setup resulted in an unknown tiebreaker status despite its successful addition to the cluster. As a result, the tiebreaker status populated incorrectly in the Metro-DR scale CR.

The IBM Storage Scale 5.2.0.0 packages impact the IBM Storage Fusion 2.8.0 installation and upgrade operations.

How to identify the probelm

Do the following the steps to identify the problem:

  1. Install IBM Storage Fusion 2.8.0 or upgrade to 2.8.0 version.
  2. Configure the tiebreaker.
  3. In the ibm-spectrum-scale namespace, check whether the tiebreaker is reachable from the scale core pods.
  4. Check the condition status in the Metro-DR scale CR. It shows the status of the tiebreaker as healthy or unknown.

Resolution

  1. Do the following steps to recover from the failure state:

    • Add the field enableManualInstallation: true to the scalemanager CR spec.

    • Replace the isf-storage-operator-controller-manager image with a new image in the installed operators.
      isf-storage-operator (HCI only) - cp.icr.io/cp/isf/isf-storage-operator@sha256:df694029549fce04cec5236832970c4a537b85eec6a7289e27a9ae21c9d4f5de

    • Replace the if-cns-operator image in the cr-version-cm config map of the ibm-spectrum-fuison-ns namespace.
      isf-cns-operator (HCI) - cp.icr.io/cp/isf/isf-cns-operator@sha256:5023fc0c078a832d8c14b5ab5c6a02e7c1f6370f9d39cb6567db7557fbe05a27
      isf-cns-operator
      (SDS) - cp.icr.io/cp/isf-sds/isf-cns-operator@sha256:734d34758f579fcbb22a15500fc40cbcf84d42fba32e0d9f233c263a7df3a37d

    • Replace isf-storage-service image in cr-version-cm config map of ibm-spectrum-fuison-ns namespace.
      isf-storage-services (HCI only) - cp.icr.io/cp/isf/isf-storage-services@sha256:5d980a0d311b6eccfbd9fddb5af07395bbe52c066c925e4542e281f5045b9c53

    • Run the following commands to make sure that all the pods are running.
      oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-service-dep
      oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-operator-controller-manager
      oc get pods -n ibm-spectrum-fusion-ns | grep isf-cns-operator-controller-manager  


      Example ouput:
      ~ % oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-service dep
      isf-storage-service-dep-6c874667f-km4xv                           1/1     Running   0          86m
      ~ % oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-operator-controller-manager
      isf-storage-operator-controller-manager-86ccc69c4d-hjh65          2/2     Running   0          87m
      ~ % oc get pods -n ibm-spectrum-fusion-ns | grep isf-cns-operator-controller-manager    
      isf-cns-operator-controller-manager-6748dcd68f-ndrjt              2/2     Running   0          87m
      
  2. Run the following command to set the trigger update to true in the scale CR.

    oc patch Scale storagemanager -n ibm-spectrum-fusion-ns --type='json' -p='[{"op": "replace", "path": "/spec/triggerUpdate", "value": true}]'
     
  3. After a new isf-storage-service-dep pod comes up with the latest image, run the command curl -k https://isf-scale-svc/api/v1/upgradeWithOperator in the isf-storage-service-dep pod terminal.

    ~ % oc project ibm-spectrum-fusion-ns
    Now using project "ibm-spectrum-fusion-ns"
    ~ % oc rsh <isf-storage-service-dep pod name>
    sh-4.4# curl -k https://isf-scale-svc/api/v1/upgradeWithOperator
    {"status":"Deployed ECE and CSI upgrade yaml files on OCP cluster successfully"}sh-4.4# exit
    exit
    
  4. Verify the upgrade status as follows:

    • In the ibm-spectrum-scale-operator namespace, verify whether the coreECE image is updated in the ibm-spectrum-scale-manager-config configmap.

      coreECE: cp.icr.io/cp/spectrum/scale/erasure-code/ibm-spectrum-scale-daemon@sha256:9adcab69b470572b1dd3ef2d965d9e3873165612ac3b8e089374d8d53d979841
    • In the ibm-spectrum-scale-operator namespace, verify whether the new pod is in running state.

    • In the ibm-spectrum-scale-csi namespace, verify whether the new pod is in running state.

  5. Monitor the upgrade status as follows:

    • Nodes start rebooting one by one after a successful patching.

    • The new scale pods come up in the ibm-spectrum-scale namespace after all the nodes are restarted.

    • Run the following command to get the pods details in the ibm-spectrum-scale namespace.

      oc get pods -n ibm-spectrum-scale
  6. Run the following command to login to any scale core pod.

    oc rsh <podname>
  7. Run the following command to check the scale service state on all scale core pods.

    mmgetstate -a
  8. Run the following command to check whether the filesystem is mounted on all the nodes.

    mmlsmount all

    Note: 
    This Hotfix needs to be applied on both the racks one after another with the same instructions.
  9. Run the following command to verify tie breaker healthy condition in the status of the Metro-DR scale CR.

    oc get stretchcluster metrodr-scale -o yaml > stretchcluster.yaml

    For offline mirroring, add the above required images to your offline registry along with the IBM Storage Scale 5.2.0.1 images. For IBM Storage Scale 5.2.0.1 images, contact IBM support.

[{"Type":"MASTER","Line of Business":{"code":"LOB69","label":"Storage TPS"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSXEFDS","label":"IBM Storage Fusion HCI Appliance Software"},"ARM Category":[{"code":"a8m3p0000000rX7AAI","label":"HW"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
27 June 2024

UID

ibm17157093