[OpenShift Container Platform][MQ 9.4.0 Jun 2024][Amazon EKS][MQ 9.4.0 Jun 2024]

Expanding persistent volumes

If your storage provider supports volume expansion, use this task to expand a persistent volume. Depending on the storage provider, expansion might occur online or offline.

Before you begin

Successful volume expansion relies on your storage provider to fulfill the expansion request. Refer to your storage provider's documentation to determine whether online resizing is supported, and for information about offline resizing procedures.

If your storage provider cannot fulfill the expansion request, your Persistent Volume Claim might enter a state with warnings or errors. If expansion fails, an administrator can manually recover the Persistent Volume Claim state and cancel the expansion.

About this task

To help with managing persistent storage, Kubernetes defines two API resources:
  • A PersistentVolume (PV), which is a piece of storage in the cluster that has been provisioned either statically by an administrator or dynamically using Storage Classes.
  • A PersistentVolumeClaim (PVC), which is a request for storage by a user. It also acts as a claim check to the resource.
For more information, see Persistent Volumes in the Kubernetes documentation.
Warning:
  • If the storage class used to create queue manager PVCs does not support online resizing, offline resizing takes place. During offline resizing, user intervention is required to complete volume expansion, so queue managers experience downtime.
  • For offline resizing of shared volumes for multi-instance queue managers, both active and standby pods must be brought down at the same time when performing the user intervention.
  • Kubernetes platforms, including Red Hat OpenShift, do not support reducing the size of PVCs. Attempting to reduce the size of persistent volumes will put the queue manager into a 'Failed' state.
  • This procedure does not apply to ephemeral volumes.

To expand a PV that is used by the IBM® MQ container, complete the following steps.

Procedure

  1. Prepare to expand volumes
    1. Decide which volumes to expand.
    2. Determine the storage class or classes being used by your volumes.
      For example:
      spec:
        queueManager:
          storage:
            persistedData:
              enabled: true
              type: persistent-claim
              class: ocs-storagecluster-cephfs (1)
            queueManager:
              type: persistent-claim
            recoveryLogs:
              enabled: true
              type: persistent-claim
            defaultClass: ocs-storagecluster-ceph-rbd (2)
      Notes:
      • (1) If the volume defines a specific storage class, then this is used by PVCs of this type.
      • (2) If defaultClass is set, this storage class is used for all volumes without a specific storage class. If defaultClass is not set, and a volume type has not specified a class, then the default storage class for the cluster is used.
      You can also confirm the storage class in use by describing the underlying PVCs. For example:
      • For deployments on the Red Hat OpenShift Container Platform:
        oc describe pvc <PVC_NAME>
        
      • For deployments on Amazon EKS:
        kubectl describe pvc <PVC_NAME>
    3. Validate that your storage class supports volume expansion.
      A storage class might have the property .allowVolumeExpansion defined:
      • If this property is set to true, then volume expansion is supported.
      • If this property is set to false, or this property is not defined, then the storage class does not allow volume expansion. In this case, refer to your storage provider documentation to see if this feature can be enabled.
      You can also describe a storage class to determine if it supports volume expansion. For example:
      • For deployments on the Red Hat OpenShift Container Platform:
        oc describe sc <STORAGE_CLASS_NAME>
        
      • For deployments on Amazon EKS:
        kubectl describe sc <STORAGE_CLASS_NAME>
    4. Refer to your storage provider documentation to see if an online or offline procedure is used for volume expansion.

      An offline procedure requires queue manager pods to be manually restarted, whereas an online procedure does not. Refer to your storage provider documentation for offline resizing procedures.

    5. Check if your queue manager has a status condition with the reason 'StorageMismatch'.

      If your queue manager has this status condition, the volumes listed in the condition are expanded if you enable volume expansion. If you do not want this to happen, change the size fields associated with each volume type in your queue manager definition to match the provisioned PVCs. The status condition is removed when this is done for all mismatched volumes.
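
The storage class check in this step can be scripted. The following is a minimal sketch, not a definitive implementation: the helper names expansion_supported and check_storage_class and the KUBE_CLI variable are hypothetical, and it assumes that the .allowVolumeExpansion field can be read with the jsonpath output format of oc or kubectl.

```shell
# CLI to use: kubectl on Amazon EKS, oc on Red Hat OpenShift (assumed to
# accept the same arguments for this command).
KUBE_CLI="${KUBE_CLI:-kubectl}"

# Hypothetical helper: interpret the value of the storage class field
# .allowVolumeExpansion. An empty value means the field is not defined,
# which is treated the same as false.
expansion_supported() {
  case "$1" in
    true) echo "supported" ;;
    *)    echo "not supported" ;;
  esac
}

# Hypothetical helper: read the field from the cluster and report the result.
check_storage_class() {
  flag=$("$KUBE_CLI" get sc "$1" -o jsonpath='{.allowVolumeExpansion}') || return 1
  echo "$1: volume expansion $(expansion_supported "$flag")"
}

# Usage (requires cluster access):
#   check_storage_class <STORAGE_CLASS_NAME>
```

A result of "not supported" only reflects the storage class definition. Your storage provider documentation remains the reference for whether the feature can be enabled.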

  2. Expand volumes
    Warning:
    • If you have previously modified any of the volume size fields in your queue manager definition, volumes begin expanding when .allowVolumeExpansion is set to true in your queue manager definition.
    • Your storage provider might have restrictions on the maximum size of a volume because of file system limitations or availability of local hardware. To avoid failures, validate these limitations in your storage provider documentation before you expand volumes.
    • Reductions in PVC size are not supported by Kubernetes platforms, including Red Hat OpenShift. After you expand a volume, you cannot reduce its size. If an expansion attempt fails, the IBM MQ Operator cannot return the PVC to its original state.

    Example queue manager definition illustrating volume expansion:

    spec:
      queueManager:
        storage:
          allowVolumeExpansion: true (A)
          persistedData:
            enabled: true
            type: persistent-claim
            size: 3Gi (B)
          queueManager:
            type: persistent-claim
            size: 4Gi (B)
          recoveryLogs:
            enabled: true
            type: persistent-claim
            size: 3Gi (B)
    1. To allow volume expansion for the queue manager, set the field .spec.queueManager.storage.allowVolumeExpansion (A) on your queue manager to true.
    2. You can now increase the size fields (B) for any of your enabled volume types. Applying these changes will start volume expansion.
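
As an alternative to editing the queue manager definition by hand, both changes can be applied as a single merge patch. This is a sketch under stated assumptions: the helper names build_expansion_patch and expand_volume are hypothetical, the resource name queuemanager is assumed to resolve to the QueueManager custom resource in your cluster, and the command acts on the current namespace.

```shell
# CLI to use: kubectl on Amazon EKS, oc on Red Hat OpenShift.
KUBE_CLI="${KUBE_CLI:-kubectl}"

# Hypothetical helper: build a JSON merge patch that sets
# allowVolumeExpansion (A) to true and a new size (B) for one volume type.
# $1: volume type (persistedData, queueManager, or recoveryLogs)
# $2: new size, for example 4Gi
build_expansion_patch() {
  printf '{"spec":{"queueManager":{"storage":{"allowVolumeExpansion":true,"%s":{"size":"%s"}}}}}\n' "$1" "$2"
}

# Hypothetical helper: apply the patch to a queue manager.
# $1: queue manager name, $2: volume type, $3: new size
expand_volume() {
  "$KUBE_CLI" patch queuemanager "$1" --type merge -p "$(build_expansion_patch "$2" "$3")"
}

# Usage (requires cluster access; starts volume expansion):
#   expand_volume <QUEUE_MANAGER_NAME> recoveryLogs 4Gi
```

Because sizes cannot be reduced afterward, it is worth printing the patch with build_expansion_patch and reviewing it before applying it.
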
  3. Validate that your PVCs have been resized.
    Notes:
    • Volume expansion can take some time. If validation is not successful the first time, consider waiting a few minutes and validating again.
    • Volume expansion only completes without user action when an online resize is performed.
    • Some storage providers round up the storage size you have requested. The expanded volume should have the same or greater size than your request.
    1. Check your queue manager for status conditions. Refer to the following table for conditions, explanations, and suggested actions.
      Table 1. Status conditions for storage
      Condition: StorageMismatch
        Message: Storage sizes defined in the QueueManager resource do not match the capacity of one or more provisioned PVCs [pvc-list]. AllowVolumeExpansion is set to false in the QueueManager resource so the MQ Operator will not attempt to reconcile these differences.
        Explanation: Volume expansion does not occur because .allowVolumeExpansion has not been set to true in the queue manager definition.
      Condition: StorageExpansionPending
        Message: Volume expansion is pending for the following PVCs [pvc-list]
        Explanation: Volume expansion is still taking place. If this status condition persists for an extended period of time, follow the steps below to gather more information, because an offline resize, or a failure to resize, might be taking place.
      Condition: Failed
        Message: There are many possible storage-related messages that can create a 'Failed' status condition. For example: 'MQ Queue Manager failed to deploy: persistentvolumeclaims "<pvc>" is forbidden: only dynamically provisioned pvc can be resized and the storageclass that provisions the pvc must support resize.'
        Explanation: If the queue manager has 'Failed' status conditions with text that refers to storage, refer to the message within the status condition. The example message given here is caused by using a storage class that does not support expansion.
    2. For each PVC that you have expanded, check that the capacity has increased to match or be greater than the value specified in the queue manager definition.

      HA queue managers might have multiple PVCs of each type. To get the capacity of a PVC, run the following command:

      • For deployments on the Red Hat OpenShift Container Platform:
        oc get pvc <PVC_NAME> -o template --template '{{.status.capacity.storage}}'
        
      • For deployments on Amazon EKS:
        kubectl get pvc <PVC_NAME> -o template --template '{{.status.capacity.storage}}'

    3. Check that the PVC does not have any status conditions or events that suggest a failed resize:
      • For deployments on the Red Hat OpenShift Container Platform:
        oc describe pvc <PVC_NAME>
        
      • For deployments on Amazon EKS:
        kubectl describe pvc <PVC_NAME>
      • Your PVC might have the status condition FileSystemResizePending with message 'Waiting for user to (re-)start a pod to finish file system resize of volume on node'. This status condition is raised for both online and offline resizes. For an online resize, this status condition disappears without user action after the online resize completes.
      • If your PVC has an event or status condition that indicates a failed resize, see Recovering from failure when expanding volumes in the Red Hat OpenShift documentation.
    4. Check that the queue manager pods do not have any status conditions or events that suggest a failed resize. For HA deployments, check each replica.
      • For deployments on the Red Hat OpenShift Container Platform:
        oc describe pod <QUEUE_MANAGER_POD>
        
      • For deployments on Amazon EKS:
        kubectl describe pod <QUEUE_MANAGER_POD>
      If your pod has an event or status condition that indicates a failed resize, you can attempt to recover from that failure. The error text might help you resolve the problem, or prevent the same problem occurring if you try to resize again after recovery.
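
The capacity comparison in this step can be sketched as a small script. The helper names to_bytes, capacity_satisfied, and check_pvc_capacity are hypothetical, and the sketch assumes that sizes are whole numbers with no suffix or with a Ki, Mi, Gi, or Ti suffix. Other quantity formats, such as decimal values, are rejected rather than guessed.

```shell
# CLI to use: kubectl on Amazon EKS, oc on Red Hat OpenShift.
KUBE_CLI="${KUBE_CLI:-kubectl}"

# Hypothetical helper: convert a size such as 3Gi to bytes.
# Only whole numbers with no suffix or a Ki/Mi/Gi/Ti suffix are handled.
to_bytes() {
  num=${1%%[!0-9]*}          # leading digits
  unit=${1#"$num"}           # whatever follows the digits
  [ -n "$num" ] || return 1
  case "$unit" in
    "") echo "$num" ;;
    Ki) echo $((num * 1024)) ;;
    Mi) echo $((num * 1024 * 1024)) ;;
    Gi) echo $((num * 1024 * 1024 * 1024)) ;;
    Ti) echo $((num * 1024 * 1024 * 1024 * 1024)) ;;
    *)  return 1 ;;
  esac
}

# Hypothetical helper: succeed when the actual capacity ($1) is the same as
# or greater than the requested size ($2). Returns 2 on unparsable input.
capacity_satisfied() {
  actual=$(to_bytes "$1") || return 2
  wanted=$(to_bytes "$2") || return 2
  [ "$actual" -ge "$wanted" ]
}

# Hypothetical helper: compare a PVC's reported capacity with a requested size.
# $1: PVC name, $2: size from the queue manager definition
check_pvc_capacity() {
  cap=$("$KUBE_CLI" get pvc "$1" -o jsonpath='{.status.capacity.storage}') || return 2
  capacity_satisfied "$cap" "$2"
}

# Usage (requires cluster access):
#   check_pvc_capacity <PVC_NAME> 4Gi && echo "resized"
```

The greater-than-or-equal comparison reflects the note that some storage providers round up the requested size.
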
  4. Restart pods when resizing offline

    If your storage provider uses an offline resizing procedure when expanding volumes, you must restart the queue manager pods that mount the volumes being resized so that volume expansion can complete.

    For multi-instance queue managers the recovery logs and persisted data volumes are shared between both the active and standby pods. For resizing of these volumes to complete, bring down both pods at the same time.

    Refer to your storage provider documentation for their offline resizing procedure.
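
Before restarting pods, it can help to confirm which PVCs still report the FileSystemResizePending status condition. The following sketch uses hypothetical helper names (has_condition, pvc_resize_pending) and assumes that the jsonpath output format lists condition types separated by spaces. Because this condition is also raised during online resizes, treat it as a hint only when it persists. The restart itself is left to your storage provider's procedure.

```shell
# CLI to use: kubectl on Amazon EKS, oc on Red Hat OpenShift.
KUBE_CLI="${KUBE_CLI:-kubectl}"

# Hypothetical helper: succeed when a space-separated list of condition
# types ($1) contains the given type ($2) as a whole word.
has_condition() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Hypothetical helper: succeed when the PVC still reports
# FileSystemResizePending. Returns 2 if the PVC cannot be read.
pvc_resize_pending() {
  types=$("$KUBE_CLI" get pvc "$1" -o jsonpath='{.status.conditions[*].type}') || return 2
  has_condition "$types" FileSystemResizePending
}

# Usage (requires cluster access):
#   pvc_resize_pending <PVC_NAME> && echo "file system resize still pending"
```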