Troubleshooting Hosted Control Plane cluster issues
Use this troubleshooting information to identify the problems and workarounds for the installation of spoke cluster and
Unable to delete internal and external bare metal Hosted Control Plane clusters using HCP CLI of Red Hat
- Problem statement
- When you use the HCP CLI, the hosted cluster does not get deleted. The cluster remains in a "destroying" state for over an hour.
- Resolution
- Use the Red Hat® Advanced Cluster Management for Kubernetes user interface to destroy a cluster, instead of the HCP CLI.
Hosted Control Plane in Red Hat documentation
For Hosted Control Plane troubleshooting documentation by Red Hat, see Red Hat Documentation.
PVCs on Hosted Control Plane clusters are stuck in terminating state
If PVCs on the Hosted Control Plane clusters are stuck in the terminating state, complete the following steps for every application that has PVCs associated with Fusion Data Foundation storage.
- Scale down the deployment of the application that uses the PVC to 0.
- Delete the application.
- Check the PVC status to confirm whether it is deleted.
Note: Remove all PVCs from the applications before you remove the Fusion Data Foundation label.
- Check whether the Fusion Data Foundation client on the Hosted Control Plane cluster is deleted.
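The steps above can be sketched with the oc CLI as follows. This is a minimal sketch, not a definitive procedure: the helper name `release_app_pvcs` is hypothetical, and it assumes that the application is a single Deployment. An application that consists of other resource kinds needs the matching `oc scale` and `oc delete` targets.

```shell
# Hypothetical helper (not part of IBM Fusion): scale an application down,
# delete it, and then list the PVCs that remain in its namespace.
# Usage: release_app_pvcs <namespace> <deployment>
release_app_pvcs() {
  ns="$1"
  deploy="$2"
  # Stop the pods so that nothing keeps the volumes mounted.
  oc scale deployment "$deploy" --replicas=0 -n "$ns" || return 1
  # Delete the application (assumed here to be only its Deployment).
  oc delete deployment "$deploy" -n "$ns" || return 1
  # List the remaining PVCs; any PVC still shown as Terminating needs a closer look.
  oc get pvc -n "$ns"
}
```

Run it once per application on the Hosted Control Plane cluster, for example `release_app_pvcs <namespace> <deployment>`.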
Issues in hosted cluster clean up
- Issues in Fusion base cleanup
- If the hosted cluster does not clean up after you remove the base label (isf.ibm.com/fusion-base), give it some time for the changes to take effect. If it still does not clean up, follow these steps to understand the cause of the issue:
- Log in to the hosted cluster.
- Go to pods in the open-cluster-management-agent-addon namespace.
- Get terminal access to the fusion-cleanup-agent pod.
- Run the following cleanup script:
bash /scripts/sds/sds-cleanup.sh
- If the script is stuck at any resource deletion, remove the finalizer from that resource.
- Issues in Fusion Data Foundation service clean up
- Before you begin:
- Before you proceed with Fusion Data Foundation deletion, stop the backup service and clean up the associated PVCs.
- If Fusion Data Foundation sizes are large and workloads are running continuously, such as Cloud Pak or Watsonx, then do the following steps:
- Scale the pods down to zero.
- Stop the application.
- Delete the associated PVCs.
If the Fusion Data Foundation service does not clean up after you remove its label (isf.ibm.com/fusion-fdf), give it some time for the changes to take effect. If it still does not clean up, follow these steps to understand the cause of the issue:
- Log in to the hosted cluster.
- Go to pods in the open-cluster-management-agent-addon namespace.
- Get terminal access to the fusion-odf-cleanup-agent pod.
- Run the following cleanup script:
bash /scripts/odf/delete-fusion-odf.sh
- If the script is stuck at any resource deletion, remove the finalizer from that resource.
- Issues in Backup & Restore service clean up
- If the Backup & Restore service does not clean up after you remove its label (isf.ibm.com/fusion-backup), give it some time for the changes to take effect. If it still does not clean up, follow these steps to understand the cause of the issue:
- Log in to the hosted cluster.
- Go to pods in the open-cluster-management-agent-addon namespace.
- Get terminal access to the fusion-bnr-cleanup-agent pod.
- Run the following cleanup script:
bash /scripts/backup-restore/uninstall-backup-restore.sh
- If the script is stuck at any resource deletion, remove the finalizer from that resource.
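The three cleanup procedures above share the same shape, so they can be sketched from the command line instead of the console terminal. This is a sketch under stated assumptions: the helper names `run_cleanup` and `clear_finalizers` are hypothetical, the pod is located by matching its name prefix, and the finalizers are removed with a generic merge patch rather than with anything specific to IBM Fusion. Clearing finalizers skips the cleanup logic that they guard, so use it only on the resource that the script is stuck on.

```shell
# Hypothetical helper: run a cleanup script inside the matching agent pod.
# Usage: run_cleanup <pod_name_prefix> <script_path>
run_cleanup() {
  prefix="$1"
  script="$2"
  ns="open-cluster-management-agent-addon"
  # Pick the first pod whose name starts with the given prefix.
  pod="$(oc get pods -n "$ns" -o name | grep "^pod/$prefix" | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no pod matching $prefix in $ns" >&2
    return 1
  fi
  # Run the cleanup script in that pod.
  oc exec -n "$ns" "$pod" -- bash "$script"
}

# Hypothetical helper: remove all finalizers from one stuck resource.
# Usage: clear_finalizers <kind> <name> [namespace]
clear_finalizers() {
  kind="$1"
  name="$2"
  ns="$3"
  if [ -n "$ns" ]; then
    oc patch "$kind" "$name" -n "$ns" --type=merge -p '{"metadata":{"finalizers":null}}'
  else
    # Cluster-scoped resources take no namespace.
    oc patch "$kind" "$name" --type=merge -p '{"metadata":{"finalizers":null}}'
  fi
}
```

For example, after you log in to the hosted cluster, `run_cleanup fusion-cleanup-agent /scripts/sds/sds-cleanup.sh` covers the Fusion base case. The same call with `fusion-odf-cleanup-agent` or `fusion-bnr-cleanup-agent` and the corresponding script path covers the other two services.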
Pull image issue occurs during the installation of IBM Fusion in a hosted cluster
- Problem statement
- If a pull image issue occurs, ensure that the pull-secret for cp.icr.io is available to the Hosted Control Plane cluster.
- Resolution
- To check and update the pull-secret, do the following steps:
- On the IBM Fusion HCI System hub cluster, go to and search for HostedCluster.
- Look for the instance that corresponds with the Hosted Control Plane cluster and open it in YAML view.
- Search for pullSecret and note down the name of the secret.
- In the clusters namespace, search for that secret and open it in edit mode.
- Check whether the correct value for cp.icr.io is available. If not, modify the secret with the right value. It can take a while to propagate. You might even have to approve the changes from the Hosted Control Plane cluster.
It is recommended to install a Hosted Control Plane cluster with the following credentials:
cloud.openshift.com
cp.icr.io
quay.io
registry.connect.redhat.com
registry.redhat.io
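The pull-secret check can also be sketched from the command line. The helper `missing_registries` is hypothetical: it reads a decoded pull-secret document and prints every recommended registry that has no entry in it. The usage comment assumes that the HostedCluster resource lives in the clusters namespace together with the secret, and that the secret stores its content under the `.dockerconfigjson` key.

```shell
# Registries for which credentials are recommended.
REQUIRED_REGISTRIES="cloud.openshift.com cp.icr.io quay.io registry.connect.redhat.com registry.redhat.io"

# Hypothetical helper: read a decoded pull-secret JSON document on stdin and
# print each required registry that does not appear in it as a quoted string.
missing_registries() {
  config="$(cat)"
  for reg in $REQUIRED_REGISTRIES; do
    printf '%s' "$config" | grep -qF "\"$reg\"" || echo "$reg"
  done
}

# Possible usage on the hub cluster (placeholders are not filled in):
#   name="$(oc get hostedcluster <hosted_cluster_name> -n clusters -o jsonpath='{.spec.pullSecret.name}')"
#   oc get secret "$name" -n clusters -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | missing_registries
```

An empty result means that every listed registry has an entry. It does not prove that the stored credentials are valid.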
Issues in installation of IBM Fusion in a hosted cluster
- Resolution
- Log in to the hub cluster with oc login, and check the status of the manifestwork fusion-install in the cluster namespace of the hub cluster:
oc get manifestwork fusion-install -n <spoke_cluster_name> -o yaml
- If no error exists in the status of the manifestwork, log in to the spoke cluster with oc login, and check the status of the IBM Fusion installation in the spoke cluster:
oc get csv -n ibm-spectrum-fusion-ns
Issues in the installation of Fusion Data Foundation in the hosted cluster
- Resolution
- As a resolution, log in to the hub cluster with oc login, and check the status of the manifestwork odfclient-install in the cluster namespace of the hub cluster:
oc get manifestwork odfclient-install -n <spoke_cluster_name> -o yaml
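Instead of reading the full YAML output, the manifestwork status can be reduced to its top-level conditions. The following sketch uses hypothetical helper names and assumes that the manifestwork reports conditions such as Applied and Available under `.status.conditions`. Per-resource status further down in the object is not inspected.

```shell
# Hypothetical helper: print the top-level conditions of a manifestwork
# as Type=Status lines.
# Usage: manifestwork_conditions <manifestwork_name> <spoke_cluster_name>
manifestwork_conditions() {
  oc get manifestwork "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Hypothetical helper: read Type=Status lines on stdin and succeed only if
# at least one line exists and every line ends in =True.
all_conditions_true() {
  conditions="$(cat)"
  [ -n "$conditions" ] || return 1
  ! printf '%s\n' "$conditions" | grep -qv '=True$'
}
```

On the hub cluster, `manifestwork_conditions odfclient-install <spoke_cluster_name> | all_conditions_true || echo "check the manifestwork status"` flags a manifestwork that needs a closer look. The same call works for fusion-install.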
Backup issues in Hosted Control Plane with Fusion Data Foundation
- Problem statement
- Concurrent backups fail during a Velero snapshot with the following error message:
'The operation has timed out because Velero has failed to report status.' However, the backup phase is updated as 'DataTransferfailed'.
The timeout for Velero to pick up a request is set to 30 minutes in the transaction manager.
- Resolution
- To resolve the issue, increase the timeout for Velero to pick up a request from 30 minutes to 60 minutes.
Hosted Control Plane cluster does not get created because of unavailability of IP addresses
- Resolution
- Check whether the installed load balancer has sufficient IP addresses for the Hosted Control Plane cluster. Check the IPAddressPool object for the IP range. Run the following command to check whether an IP is available:
oc get svc -A | grep LoadBalancer
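A service of type LoadBalancer that did not receive an address shows `<pending>` in its EXTERNAL-IP column, which is a typical sign of an exhausted pool. The following sketch filters the command output for such services. The helper name `pending_loadbalancers` is hypothetical, and the sketch assumes the default column layout of `oc get svc -A`.

```shell
# Hypothetical helper: read `oc get svc -A` output on stdin and print
# namespace/name for each LoadBalancer service without an external IP.
# Columns with -A: NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
pending_loadbalancers() {
  awk '$3 == "LoadBalancer" && $5 == "<pending>" { print $1 "/" $2 }'
}

# Possible usage on the hub cluster:
#   oc get svc -A | pending_loadbalancers
```

If the load balancer is MetalLB (an assumption, the source does not name it), the configured ranges can be compared against the assigned addresses with `oc get ipaddresspool -n metallb-system -o yaml`.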
Known issues and limitations
- Random "image pull" failures can occur on the Hosted Control Plane because of pull-secret issues.
- Sometimes, the hosted cluster status goes into offline mode after deletion. Contact IBM Support to resolve the issue.
- The hcp destroy command can cause the hosted cluster to get stuck indefinitely during cleanup. Contact IBM Support to resolve the issue.
- The YAML tab view on the OpenShift® Container Platform console does not work as expected during the installation of the multicluster engine operator.
- The hosted clusters expect the storageprofile of the used storage class, but it is not available during cluster creation.
- The following issues occur whenever you remove the disks and place them back in the rack:
- Disks are not reflected in the node.
- LVM cluster and Data Foundation go into a degraded state.