Troubleshooting Hosted Control Plane cluster issues

Use this troubleshooting information to identify the problem and workaround for the installation of spoke cluster and

Unable to delete internal and external bare metal Hosted Control Plane clusters using HCP CLI of Red Hat

Problem statement
When you use the HCP CLI, the hosted cluster does not get deleted. The cluster remains in a "destroying" state for over an hour.
Resolution
Use the Red Hat® Advanced Cluster Management for Kubernetes user interface to destroy a cluster, instead of the HCP CLI.

Hosted Control Plane in Red Hat documentation

For Hosted Control Plane troubleshooting documentation by Red Hat, see Red Hat Documentation.

PVCs on Hosted Control Plane clusters are stuck in terminating state

If PVCs on the Hosted Control Plane clusters are stuck in the terminating state, complete the following steps for every application that has PVCs associated with Fusion Data Foundation storage.

  1. Scale down the deployment of the application that uses the PVC to 0.
  2. Delete the application.
  3. Check the PVC status to confirm whether it is deleted.
    Note: Remove all PVCs from the applications before you remove the Fusion Data Foundation label.
  4. Check whether the Fusion Data Foundation client on the Hosted Control Plane cluster is deleted.
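The first three steps can be sketched with the oc CLI as follows. This is a minimal sketch, assuming the application consists of a single Deployment; the function name release_app_pvcs and its namespace and deployment arguments are hypothetical placeholders, not names from the product.

```shell
# Sketch only: assumes the application is one Deployment.
# Arguments (both placeholders): $1 = namespace, $2 = deployment name.
release_app_pvcs() {
  rap_ns="$1"
  rap_deploy="$2"
  # Step 1: scale the workload to 0 so that no pod keeps the PVC in use.
  oc scale deployment "$rap_deploy" --replicas=0 -n "$rap_ns" || return 1
  # Step 2: delete the application.
  oc delete deployment "$rap_deploy" -n "$rap_ns" || return 1
  # Step 3: list the PVCs; a PVC that is still listed as Terminating
  # has not been deleted yet.
  oc get pvc -n "$rap_ns"
}
```

For example, release_app_pvcs my-namespace my-app runs the three steps for one application. Applications that are made of several workloads need the same steps for each workload.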

Issues in hosted cluster clean up

Issues in Fusion base cleanup
If the hosted cluster does not clean up after you remove the base label (isf.ibm.com/fusion-base), then give it some time for the changes to take effect. If it still does not clean up, follow these steps to understand the cause of the issue:
  1. Log in to the hosted cluster.
  2. Go to pods in the open-cluster-management-agent-addon namespace.
  3. Get terminal access to the fusion-cleanup-agent pod.
  4. Run the following cleanup script:
    bash /scripts/sds/sds-cleanup.sh
  5. If the script gets stuck while deleting a resource, remove the finalizer from that resource.
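As an alternative to opening a terminal in the console, steps 2 to 4 can be sketched with the oc CLI. This is a minimal sketch under the assumption that the pod name contains the agent name; the helper name run_cleanup_script is hypothetical.

```shell
# Sketch only. Arguments: $1 = agent name that the pod name is assumed to
# contain, $2 = script path inside the pod.
run_cleanup_script() {
  rcs_ns="open-cluster-management-agent-addon"
  # Pick the first pod whose name contains the agent name.
  rcs_pod=$(oc get pods -n "$rcs_ns" -o name | grep -F -- "$1" | head -n 1)
  if [ -z "$rcs_pod" ]; then
    echo "no pod matching $1 in $rcs_ns" >&2
    return 1
  fi
  # Run the script in that pod instead of opening an interactive terminal.
  oc exec -n "$rcs_ns" "$rcs_pod" -- bash "$2"
}

# Example for the Fusion base cleanup:
# run_cleanup_script fusion-cleanup-agent /scripts/sds/sds-cleanup.sh
```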
Issues in Fusion Data Foundation service clean up
Before you begin:
  • Before you proceed with Fusion Data Foundation deletion, stop the backup service and clean up the associated PVCs.
  • If Fusion Data Foundation sizes are large and workloads are running continuously, such as Cloud Pak or Watsonx, then do the following steps:
    1. Scale the pods down to zero.
    2. Stop the application.
    3. Delete the associated PVCs.
If the Fusion Data Foundation service does not clean up after you remove its label (isf.ibm.com/fusion-fdf), then give it some time for the changes to take effect. If it still does not clean up, follow these steps to understand the cause of the issue:
  1. Log in to the hosted cluster.
  2. Go to pods in the open-cluster-management-agent-addon namespace.
  3. Get terminal access to the fusion-odf-cleanup-agent pod.
  4. Run the following cleanup script:
    bash /scripts/odf/delete-fusion-odf.sh
  5. If the script gets stuck while deleting a resource, remove the finalizer from that resource.
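Removing a finalizer, as the last step describes, can be sketched with a merge patch. This is a minimal sketch, not a product-specific procedure: the helper name remove_finalizers is hypothetical, and the resource kind, name, and namespace are placeholders that you take from the point where the script stopped.

```shell
# Sketch only. Arguments (all placeholders): $1 = resource kind,
# $2 = resource name, $3 = namespace.
remove_finalizers() {
  # Show the finalizers that currently block deletion.
  oc get "$1" "$2" -n "$3" -o jsonpath='{.metadata.finalizers}'
  echo
  # Clear the finalizer list with a merge patch so that deletion can complete.
  oc patch "$1" "$2" -n "$3" --type=merge -p '{"metadata":{"finalizers":null}}'
}
```

Clearing finalizers skips whatever cleanup the owning controller would have done, so use it only for the resource that blocks the script.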
Issues in Backup & Restore service clean up
If the Backup & Restore service does not clean up after you remove its label (isf.ibm.com/fusion-backup), then give it some time for the changes to take effect. If it still does not clean up, follow these steps to understand the cause of the issue:
  1. Log in to the hosted cluster.
  2. Go to pods in the open-cluster-management-agent-addon namespace.
  3. Get terminal access to the fusion-bnr-cleanup-agent pod.
  4. Run the following cleanup script:
    bash /scripts/backup-restore/uninstall-backup-restore.sh
  5. If the script gets stuck while deleting a resource, remove the finalizer from that resource.
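The three cleanup procedures differ only in the agent pod and the script path. The following sketch collects those pairs in one place and prints the matching command without running it. The short names base, fdf, and backup and the function name print_cleanup_command are hypothetical; the pod placeholder in angle brackets must be replaced with the actual pod name.

```shell
# Sketch only: prints the command for one service; it does not run it.
# Argument: base, fdf, or backup (hypothetical short names).
print_cleanup_command() {
  case "$1" in
    base)
      pcc_pod="fusion-cleanup-agent"
      pcc_script="/scripts/sds/sds-cleanup.sh" ;;
    fdf)
      pcc_pod="fusion-odf-cleanup-agent"
      pcc_script="/scripts/odf/delete-fusion-odf.sh" ;;
    backup)
      pcc_pod="fusion-bnr-cleanup-agent"
      pcc_script="/scripts/backup-restore/uninstall-backup-restore.sh" ;;
    *)
      echo "unknown service: $1" >&2
      return 1 ;;
  esac
  echo "oc exec -n open-cluster-management-agent-addon <$pcc_pod pod> -- bash $pcc_script"
}
```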

Pull image issue occurs during the installation of IBM Fusion in a hosted cluster

Problem statement
If a pull image issue occurs, ensure that the pull-secret for cp.icr.io is available to the Hosted Control Plane cluster.
Resolution
To check and update the pull-secret, do the following steps:
  1. On the IBM Fusion HCI System hub cluster, go to Administration > CustomResourceDefinition and search for HostedCluster.
  2. Look for the instance that corresponds with the Hosted Control Plane cluster and open it in YAML view.
  3. Search for pullSecret and note down the name of the secret.
  4. In the clusters namespace, search for that secret and open it in edit mode.
  5. Check whether the correct value for cp.icr.io is available. If not, modify the secret with the right value.

    The updated secret can take a while to propagate. You might also have to approve the changes from the Compute > Nodes page of the Hosted Control Plane cluster.

Install a Hosted Control Plane cluster with pull-secret credentials for the following registries:
  • cloud.openshift.com
  • cp.icr.io
  • quay.io
  • registry.connect.redhat.com
  • registry.redhat.io
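The check for these registries can be sketched as a small filter over the decoded pull-secret. This is a minimal sketch: the function name missing_registries is hypothetical, and the usage comment assumes that the secret stores its content under the .dockerconfigjson key, which is the usual layout for pull secrets but is not confirmed by this topic.

```shell
# Sketch only: reads a decoded pull-secret JSON document on standard input
# and prints every recommended registry that has no entry in it.
missing_registries() {
  mr_cfg=$(cat)
  for mr_reg in cloud.openshift.com cp.icr.io quay.io \
                registry.connect.redhat.com registry.redhat.io; do
    # A registry appears in the document as a quoted JSON key.
    printf '%s' "$mr_cfg" | grep -qF "\"$mr_reg\"" || echo "$mr_reg"
  done
}

# Assumed usage on the hub cluster (<pull_secret_name> is a placeholder):
# oc get secret <pull_secret_name> -n clusters \
#   -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | missing_registries
```

An empty result means that every registry in the list has an entry. It does not prove that the stored credentials are valid.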

Issues in installation of IBM Fusion in a hosted cluster

Resolution
  1. Check the status of the fusion-install manifestwork in the cluster namespace of the hub cluster. Log in to the hub cluster with oc login, and then run the following command:
    
    oc get manifestwork fusion-install -n <spoke_cluster_name> -o yaml
  2. If the status of the manifestwork shows no error, check the status of the IBM Fusion installation in the spoke cluster. Log in to the spoke cluster with oc login, and then run the following command:
    
    oc get csv -n ibm-spectrum-fusion-ns
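To spot a failed installation quickly, the ClusterServiceVersion list can be reduced to the entries that have not reached the Succeeded phase. This is a minimal sketch; the function name csv_not_succeeded is hypothetical, and the jsonpath expression in the usage comment is one possible way to produce its input.

```shell
# Sketch only: reads "name<TAB>phase" lines on standard input and prints
# the ClusterServiceVersions whose phase is not Succeeded.
csv_not_succeeded() {
  awk -F '\t' '$2 != "Succeeded" { print $1 " (" $2 ")" }'
}

# Assumed usage on the spoke cluster:
# oc get csv -n ibm-spectrum-fusion-ns \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
#   | csv_not_succeeded
```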

Issues in the installation of Fusion Data Foundation in the hosted cluster

Resolution
Check the status of the odfclient-install manifestwork in the cluster namespace of the hub cluster. Log in to the hub cluster with oc login, and then run the following command:

oc get manifestwork odfclient-install -n <spoke_cluster_name> -o yaml
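Instead of reading the full YAML, the manifestwork conditions can be reduced to the ones that need attention. This is a minimal sketch, assuming that the manifestwork reports the Applied and Available condition types; the function name conditions_not_true is hypothetical.

```shell
# Sketch only: reads "type=status" lines on standard input and prints the
# Applied or Available conditions that are not True.
conditions_not_true() {
  awk -F '=' '($1 == "Applied" || $1 == "Available") && $2 != "True" { print $1 }'
}

# Assumed usage on the hub cluster:
# oc get manifestwork odfclient-install -n <spoke_cluster_name> \
#   -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#   | conditions_not_true
```

If the filter prints a condition, the message field of that condition in the YAML output usually explains the failure.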

Backup issues in Hosted Control Plane with Fusion Data Foundation

Problem statement
Concurrent backups fail during a Velero snapshot with the following error message:
The operation has timed out because Velero has failed to report status.
However, the backup phase is updated as 'DataTransferfailed'.

The timeout for Velero to pick up the request is set to 30 minutes in the transaction manager.

Resolution
To resolve the issue, increase the timeout to 60 minutes.

Hosted Control Plane cluster does not get created because of unavailability of IP addresses

Resolution
Check whether the installed load balancer has sufficient IP addresses for the Hosted Control Plane cluster. Check the IPAddressPool object for the IP range. Run the following command to check whether an IP is available:
oc get svc -A | grep LoadBalancer
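A LoadBalancer service that shows <pending> in the EXTERNAL-IP column has not received an address, which usually points to an exhausted pool. The following minimal sketch filters for such services; the function name pending_loadbalancers is hypothetical, and the filter assumes the default column layout of oc get svc -A.

```shell
# Sketch only: reads the output of "oc get svc -A" on standard input and
# prints namespace/name of LoadBalancer services that have no external IP.
# Assumed columns: NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE.
pending_loadbalancers() {
  awk '$3 == "LoadBalancer" && $5 == "<pending>" { print $1 "/" $2 }'
}

# Usage: oc get svc -A | pending_loadbalancers
# To inspect the configured ranges: oc get ipaddresspool -A -o yaml
```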

Known issues and limitations

  • Random "image pull" failure can occur on the Hosted Control Plane due to secrets.
  • Sometimes, the hosted cluster status goes into offline mode after deletion. Contact IBM Support to resolve the issue.
  • The hcp destroy command can cause the hosted cluster to get stuck indefinitely during cleanup. Contact IBM Support to resolve the issue.
  • The YAML tab view on the OpenShift® Container Platform console does not work as expected during the installation of the multicluster engine operator.
  • The hosted clusters expect the storageprofile of the used storage class, but it is not available during cluster creation.
  • The following issues occur whenever you remove the disks and place them back in the rack:
    • Disks are not reflected in the node.
    • LVM cluster and Data Foundation go into a degraded state.
    As a resolution, restart the affected node.