Fix Readme
Abstract
Global Data Platform (GDP) upgrade is stuck at 95% for more than 15 minutes, or Global Data Platform (GDP) storage configuration is stuck at 60% for more than 20 minutes, in a standalone cluster that contains both storage nodes and non-storage nodes.
Content
How to identify the problem:
- Retrieve daemon CR details using the following command to get the status of roles and versions in the cluster.
oc get daemon ibm-spectrum-scale -n ibm-spectrum-scale -o yaml
- Check role status from the daemon CR and ensure that the runningCount and podCount of the storage role match the nodeCount.
Sample output:
roles:
- name: storage
  nodeCount: "6"
  nodes: compute-1-ru5.hciocp1.ssclab.ibm.si, compute-1-ru6.hciocp1.ssclab.ibm.si, compute-1-ru7.hciocp1.ssclab.ibm.si, control-1-ru2.hciocp1.ssclab.ibm.si, control-1-ru3.hciocp1.ssclab.ibm.si, control-1-ru4.hciocp1.ssclab.ibm.si
  podCount: "6"
  pods: compute-1-ru5, compute-1-ru6, compute-1-ru7, control-1-ru2, control-1-ru3, control-1-ru4
  runningCount: "6"
- name: client
  nodeCount: "2"
  nodes: compute-1-ru8.hciocp1.ssclab.ibm.si, compute-1-ru9.hciocp1.ssclab.ibm.si
  podCount: "2"
  pods: compute-1-ru8, compute-1-ru9
  runningCount: "2"
- name: afm
  nodeCount: "2"
  nodes: compute-1-ru23.hciocp1.ssclab.ibm.si, compute-1-ru24.hciocp1.ssclab.ibm.si
  podCount: "2"
  pods: compute-1-ru23, compute-1-ru24
  runningCount: "2"
- Check the versions in the daemon CR.
Sample output:
versions:
- count: "10"
  pods: control-1-ru2, control-1-ru3, control-1-ru4, compute-1-ru23, compute-1-ru8, compute-1-ru9, compute-1-ru5, compute-1-ru6, compute-1-ru7, compute-1-ru24
  version: 5.2.1.1
- For a standalone cluster, confirm that all of the following conditions hold:
  - The versions array has only one entry (versions[0]), confirming that all pods are updated and running the same version.
  - status.versions[0].count is greater than the runningCount of the storage role.
  - status.versions[0].pods contains both storage and non-storage pods, that is, every pod listed under roles (storage and all other roles) is included.
- If status.versions[0].count is greater than the runningCount of the storage role, this can result in:
  - The GDP upgrade process getting stuck at 95%.
  - The GDP configuration process stalling at 60%.
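The checks above can be sketched as a small shell helper. This is an illustration only, not part of the documented procedure: the function names diagnose_daemon, daemon_field, and collect_and_diagnose are hypothetical, and the jsonpath expressions assume that roles sits under .status next to the status.versions field named above.

```shell
# Compare the counts described in the identification steps.
# Args: nodeCount podCount runningCount (storage role),
#       number of entries in versions, versions[0].count
diagnose_daemon() {
  local node_count=$1 pod_count=$2 running_count=$3 entries=$4 v0_count=$5
  if [ "$pod_count" -ne "$node_count" ] || [ "$running_count" -ne "$node_count" ]; then
    echo "storage role not fully running (nodeCount=$node_count podCount=$pod_count runningCount=$running_count)"
  elif [ "$entries" -ne 1 ]; then
    echo "pods report $entries versions; expected 1"
  elif [ "$v0_count" -gt "$running_count" ]; then
    echo "match: versions[0].count ($v0_count) exceeds storage runningCount ($running_count)"
  else
    echo "no match: versions[0].count ($v0_count) does not exceed storage runningCount ($running_count)"
  fi
}

# Read one field of the daemon CR (requires a logged-in oc session).
daemon_field() {
  oc get daemon ibm-spectrum-scale -n ibm-spectrum-scale -o jsonpath="$1"
}

# Assumption: roles and versions are both under .status in the daemon CR.
collect_and_diagnose() {
  diagnose_daemon \
    "$(daemon_field '{.status.roles[?(@.name=="storage")].nodeCount}')" \
    "$(daemon_field '{.status.roles[?(@.name=="storage")].podCount}')" \
    "$(daemon_field '{.status.roles[?(@.name=="storage")].runningCount}')" \
    "$(daemon_field '{.status.versions[*].version}' | wc -w | tr -d ' ')" \
    "$(daemon_field '{.status.versions[0].count}')"
}
```

With the values from the sample output (diagnose_daemon 6 6 6 1 10), the helper reports a match, because a count of 10 exceeds the storage runningCount of 6. Running collect_and_diagnose against a live cluster reads the same fields from the daemon CR.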
Resolution:
- Replace the isf-storage-operator-controller-manager image with a new image in the installed operator CSV.
isf-storage-operator - cp.icr.io/cp/fusion-hci/isf-storage-operator@sha256:d83031d644a7830d3b37e10542f200e3a353a23b411ae6751af5d83fc95d22d6
- Run the following command to make sure that all the pods are running.
oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-operator-controller-manager
Sample output:
oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-operator-controller-manager
isf-storage-operator-controller-manager-86ccc69c4d-hjh65 2/2 Running 0 87m
- If the issue is with configuring the GDP, then monitor the GDP configuration percentage using the following command.
oc get Scale storagemanager -n ibm-spectrum-fusion-ns -o jsonpath='{.status.installProgressStatus}'
- If the issue is with upgrading the GDP, then monitor the GDP upgrade percentage on the IBM Fusion user interface.
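To avoid rerunning the configuration monitoring command by hand, it can be wrapped in a polling loop. The following is a minimal sketch: fetch_progress and watch_progress are hypothetical helper names, the interval and poll limit are arbitrary defaults, and the format of installProgressStatus is not assumed, so the loop only prints the value whenever it changes.

```shell
# Query the configuration progress with the command from the step above.
fetch_progress() {
  oc get Scale storagemanager -n ibm-spectrum-fusion-ns \
    -o jsonpath='{.status.installProgressStatus}'
}

# watch_progress INTERVAL_SECONDS MAX_POLLS
# Polls fetch_progress and prints a timestamped line each time the value changes.
watch_progress() {
  local interval=${1:-30} max_polls=${2:-40}
  local last="" current="" i=0
  while [ "$i" -lt "$max_polls" ]; do
    current=$(fetch_progress) || current="<query failed>"
    if [ "$current" != "$last" ]; then
      printf '%s %s\n' "$(date +%H:%M:%S)" "$current"
      last=$current
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}
```

For example, watch_progress 60 30 polls once a minute for about 30 minutes. If the printed value stops changing for longer than the thresholds described in the abstract, the configuration is likely still stuck.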
Document Information
Modified date:
03 March 2025
UID
ibm17184374