IBM Storage Fusion Backup & Restore (Legacy) issues

List of known Backup & Restore (Legacy) issues and limitations in IBM Storage Fusion.

  • Service protection can be configured on one cluster with both application and service backups. You can use the same bucket on cloud storage to configure service protection on a second cluster and restore the service and application backups from the first cluster. In this setup, backups that no longer exist on cloud storage still appear in the user interface, and restore attempts of those backups fail.
    1. If the first cluster remains as is, the retention period on the original backups may expire, and the backups are removed from cloud storage. Because the second cluster is unaware of the removal, it can attempt to remove the restored backups. The attempt fails because the backup on cloud storage no longer exists.
    2. If you uninstall the Backup & Restore service from the first cluster, use the -s option to prevent DeleteBackupRequest CRs from being created. If you do not set this option, the backups on cloud storage are removed, and the second cluster is again unaware that they no longer exist on cloud storage.

    The first deployment must not exist during the configuration of the second cluster.
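
    After you uninstall the service from the first cluster with the -s option, you can verify that no DeleteBackupRequest CRs were created. This is a minimal check, and it assumes that the DeleteBackupRequest CRD is still registered on that cluster:
      oc get DeleteBackupRequest --all-namespaces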

  • Whenever the "IBM Spectrum Protect Plus license expired" error occurs, do the following steps to fix the license issue:
    1. Log in to IBM Spectrum Protect Plus by using your spp-connection secret values. For the procedure to login, see Logging into IBM Spectrum Protect Plus.
    2. If you get a license expired error, retrieve the license file /spp/server/SPP.lic from the isf-bkprstr operator pod by using the oc command.
      See the following sample oc command:
      oc cp isf-bkprstr-operator-controller-manager-<podname>:/spp/server/SPP.lic SPP.lic
      Replace <podname> with the pod suffix from your cluster. For example, for the pod isf-bkprstr-operator-controller-manager-599dc5b756-vcjd6, <podname> is 599dc5b756-vcjd6. To list the operator pods and find the full pod name, see the command after these steps.
    3. Copy the license and upload it from the user interface. For more details, see Uploading the product key.
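    To find the full name of the operator pod for step 2, you can list the pods in the namespace where the isf-bkprstr operator runs. The namespace placeholder is an assumption; substitute the namespace of your IBM Storage Fusion installation:
      oc get pods -n <fusion-namespace> | grep isf-bkprstr-operator-controller-manager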
  • If you restore an application to a new namespace and the original application is still running, then some of your pods may not come up.
    Cause
    Check whether a resource conflict exists with the original application, for example, an IP address or port that is already in use.
    Resolution
    To resolve the issue, reconfigure the restored application so that the pods can come up without any conflicts.
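    To spot such a conflict quickly, you can compare the Services and Routes of the original and restored namespaces and look for duplicate node ports, external IP addresses, or route hosts. The namespace names are placeholders:
      oc get svc,route -n <original-namespace>
      oc get svc,route -n <restored-namespace>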
  • If the IBM Spectrum Protect Plus agent (baas) upgrade fails, run the following command from the OpenShift® Container Platform command line to delete the baas Kafka resource:
    oc delete Kafka baas -n baas
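    You can then check whether the resource is re-created by the operator; whether it is re-created automatically is an assumption and depends on your agent version:
      oc get Kafka -n baas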
  • Application becomes unresponsive when you create multiple locations
    This issue occurs whenever the container reaches its CPU and memory limits. Increase the CPU and memory limits and check whether the isf-ui-dep pod still crashes. To change the CPU and memory limits, update the UI operator code to increase them.

    For example, cpu: 500m and memory: 500Mi.
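
    The following is a minimal sketch of the corresponding resources limits stanza with those example values; where this stanza appears in the UI operator code is not specified here and depends on your installation:
      resources:
        limits:
          cpu: 500m
          memory: 500Mi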

  • If a Backup & Restore (Legacy) job fails to start and goes into the aborted status, then as a resolution, restart the IBM Spectrum Protect Plus Virgo pod:
    1. Log in to the OpenShift Container Platform web console.
    2. Go to Workloads > Pods.
    3. Select ibm-spectrum-protect-plus-ns project.
    4. Search for the sppvirgo pod.
    5. From the Actions menu, click Delete pod to re-create it.
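    Alternatively, you can delete the pod with the oc CLI. The pod name varies by installation, so find it first:
      oc get pods -n ibm-spectrum-protect-plus-ns | grep sppvirgo
      oc delete pod <sppvirgo-pod-name> -n ibm-spectrum-protect-plus-ns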
  • If IBM Storage Fusion is configured in an HTTP proxy environment, then defining an Object Storage Backup Storage Location that requires a proxy fails.
    Cause
    IBM Spectrum Protect Plus does not support an HTTP proxy.
    Resolution
    As a workaround, define a backup storage location in a transparent proxy mode.
  • If the retention period for a backup expires and the backup is not deleted from the object storage in the subsequent maintenance cycle, delete it manually.
  • If you delete a backup policy that is still assigned to an application, the policy is unassigned from the application but is not deleted. To delete such a policy, first remove the backup policy assignment from the application, and then delete the policy.
  • Backup & Restore (Legacy) backup jobs fail to retrieve the output files from baas-rest-spp-agent.baas.svc.
    Cause
    Operations start to fail in the inventory phase when the baas-spp-agent pod memory usage goes above 2450 MiB, which is close to the default pod limit of 2500 MiB.
    Workaround
    Increase the memory available to the baas-spp-agent pod from 2500 MiB to 5000 MiB by adding the sppagent section to the IBMSPPC object:
    1. Use the following command to obtain the correct value for the sppagent.image digest:
      oc describe deployment.apps/baas-spp-agent -n baas | grep Image
    2. Edit IBMSPPC:
      oc edit IBMSPPC -n baas
      Sample YAML:
      
      sppagent:
          image:
            digest: sha256:3c32e1534118abe8f2b0ed7e058a81568d03c7cc5a3e07ddb6031c9de9c5bd3c
            name: baas-spp-agent
            pull_policy: Always
          replica_count: 1
          resources:
            limits:
              cpu: "3"
              ephemeral_storage: 20Gi
              memory: 5000Mi
            requests:
              cpu: "2"
              ephemeral_storage: 10Gi
              memory: 1250Mi
          rest_server_service:
            name: baas-rest-spp-agent
            port: 443
            port_name: rest-server
            target_port: 12345
          snapshot_restore_job_time_limit: 24
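
    After you save the edit, you can verify that the new memory limit is applied to the deployment. This is a minimal check, and it assumes that the operator rolls the change out to the baas-spp-agent deployment:
      oc get deployment baas-spp-agent -n baas -o jsonpath='{.spec.template.spec.containers[*].resources.limits.memory}'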
      
  • A "Failed restore snapshot" error occurs with applications using IBM Spectrum Scale storage PVCs.
    Cause
    The "disk quota exceeded" error occurs whenever you restore from an object storage location having applications that use IBM Spectrum Scale PVC with a size less than 5 GB.
    Resolution
    Increase the IBM Spectrum Scale PVC size to a minimum of 5 GB, and then run the backup and restore operation again.
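    To find PVCs that are smaller than 5 GB before you back up, you can list the requested sizes and storage classes and filter on your own IBM Spectrum Scale storage class. The namespace is a placeholder:
      oc get pvc -n <application-namespace> -o custom-columns=NAME:.metadata.name,SIZE:.spec.resources.requests.storage,STORAGECLASS:.spec.storageClassName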
  • Sometimes, you may observe the following error message:
    "exec <executable name>": exec format error
    For example:
    The pod log is empty except for this message: "exec /filebrowser": exec format error
    The error can be due to a container image that was built for the wrong architecture, for example, an amd64 container on s390x nodes or an s390x container on amd64 nodes. As a resolution, check whether the architecture of the container image that you want to restore matches the architecture of the local nodes.
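    To compare the two architectures, you can print the architecture label of each node and inspect the image manifest. The image reference is a placeholder, and the second command assumes that the registry is reachable from where you run oc:
      oc get nodes -L kubernetes.io/arch
      oc image info <image-reference>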
  • Backups that involve PVCs with the field spec.volumeMode set to Filesystem might fail.
    Cause
    A PVC with volumeMode Filesystem that has never been attached to a pod in a running state is not formatted with a file system (such as ext4, xfs, or btrfs), so the snapshot of the volume fails.
    The snapshot can fail for the following reasons:
    1. The PVC has never been attached to a running container of a pod. PVCs that were previously attached to a running container but are not attached at the time of backup are not affected.
    2. The PVC contains no data because it has never been used.
    Note: PVCs with volumeMode Block are not affected by this limitation.
    Resolution
    As a workaround, complete the following steps (example oc commands follow the steps):
    1. Delete the offending PVC.
    2. Create a Recipe for the application and exclude the PVC from the VolumeGroups. For more information about creating a Recipe, see Creating a Recipe.
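    The following example commands show how to confirm the volumeMode of the PVCs and delete the offending one; the names are placeholders:
      oc get pvc -n <application-namespace> -o custom-columns=NAME:.metadata.name,VOLUMEMODE:.spec.volumeMode
      oc delete pvc <offending-pvc> -n <application-namespace>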

Known issues

  • Whenever the Virgo pod is restarted, which usually takes a long time, restart all the other pods as well.
  • Backup policies with a custom frequency run only on the earliest scheduled date. This behavior applies both to new policies and to policies that were defined in previous versions before the upgrade.
  • When you restore backup to the same namespace, existing resources are not deleted or overwritten.
  • In the Storage tab of the application details page, the values of Used and Capacity might not display the correct values.
  • It is not possible to assign more than one policy to an application. However, if you upgraded from a previous version and had multiple policies assigned to the same application before the upgrade, then you might still be affected by this issue.
  • When you change the retention period for backups, the new value applies only to future backups. The expiration value for existing backups remains the same as the setting that was in effect during the backup operation.
  • The restore CR remains on the OpenShift cluster even after the retention period of the backups expires.
  • When you back up an application in a namespace that has high security context constraints (SCC) privileges, the restored namespace does not have the same SCC privileges, which results in restored pods in CrashLoopBackOff status.
    Resolution:
    • Restart the application pod.
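    To check which security context constraint was applied to a restored pod, you can read the openshift.io/scc annotation on the pod; the pod and namespace names are placeholders:
      oc get pod <restored-pod> -n <restored-namespace> -o jsonpath='{.metadata.annotations.openshift\.io/scc}'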
  • Sometimes, backups do not run as defined in the backup policies, especially when you set hourly policies. For example, if you set a policy to run every two hours and it does not run every two hours, there are gaps in the backup history. A possible reason is that when a pod crashed and restarted, the scheduled jobs did not account for the time zone, which causes gaps in the run intervals.
    The following are the observed symptoms:
    • Policies with a custom "every X hours at minute YY" schedule: the first scheduled run occurs at minute YY after X hours plus the time zone offset from UTC, instead of at minute YY after X hours.
    • Monthly and yearly policies run more frequently than scheduled.

    As a resolution, start backups manually until the next scheduled time.