Serviceability known issues

Troubleshooting information and known issues for Events, Log collection, and Call Home.

Unable to delete proxy config from IBM Fusion user interface

Problem statement
You cannot delete the proxy configurations that are available on the IBM Fusion user interface.
Resolution
Follow these steps to resolve the issue:
  1. Important: Ensure that you disable the remote support connection before you update or delete the proxy configuration.
    Disable remote support connection from the IBM Fusion HCI System user interface as follows:
    1. From the IBM Fusion HCI System user interface, go to Settings > Support.

      The Support settings page is displayed.

    2. In the Remote support section, click the toggle button to disable the remote support connection.
      Note: Disable the remote support connection only if it is enabled.
  2. Log in to the service node as a kni user.
  3. Remove the proxy credential from the pass tool on the service node.
    pass rm -f remoteSupport
  4. Clean up the proxy details related to remote support from /home/kni/isf-ui/conf/proxydata.json on the service node.
    jq 'del(.remoteSupport)' /home/kni/isf-ui/conf/proxydata.json > temp.json && mv temp.json /home/kni/isf-ui/conf/proxydata.json
  5. Refresh the IBM Fusion HCI System or install user interface.
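
Steps 3 and 4 edit the file in place, so a slightly more defensive variant can be useful. The following is a minimal sketch that keeps a backup of the proxy data file and confirms that the remoteSupport key is gone. The function name remove_remote_support_proxy and the .bak and .tmp suffixes are assumptions for illustration and are not part of IBM Fusion; the jq filter is the one shown in step 4.

```shell
# Hypothetical helper (not part of IBM Fusion): removes the remoteSupport
# entry from a proxy data file and keeps a backup copy beside it.
remove_remote_support_proxy() {
  file="$1"
  # Keep a copy so that the original can be restored if needed.
  cp -p "$file" "$file.bak" || return 1
  # Write the filtered JSON to a temporary file first, as in step 4.
  if jq 'del(.remoteSupport)' "$file" > "$file.tmp"; then
    mv "$file.tmp" "$file" || return 1
  else
    rm -f "$file.tmp"
    return 1
  fi
  # Returns 0 only if the remoteSupport key is no longer present.
  jq -e 'has("remoteSupport") | not' "$file" > /dev/null
}

# Example call on the service node, as the kni user:
# remove_remote_support_proxy /home/kni/isf-ui/conf/proxydata.json
```

If the function returns a nonzero status, the backup copy can be moved back into place before you retry.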

Events that go through the trap server do not get created in IBM Fusion

Problem statement
Sometimes, events that go through the trap server do not get created in IBM Fusion.
Resolution
If the trapserver logs contain entries similar to the following, apply the workaround steps that are listed after the log excerpt:
Listening for traps on 0.0.0.0:31620
 [fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:41493 byte array parsed in is not a sequence
 [fd8c:215d:178e:c0de:a94:efff:fef3:3561]:35748 failed to parse sequence length0
 2021/10/13 14:20:48.124 [D] Recovered in resetServer, r=runtime error: slice bounds out of range [:-2592903709665718705]
Listening for traps on 0.0.0.0:31620
 [fd8c:215d:178e:c0de:a94:efff:fef3:3555]:58314 failed to parse sequence length0
 [fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:34801 byte array parsed in is not a sequence
 [fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:52070 byte array parsed in is not a sequence
 [fd8c:215d:178e:c0de:a94:efff:fef3:3555]:47601 parse error
 [fd8c:215d:178e:c0de:a94:efff:fef3:3585]:56764 length parse error @ idx 2
 [fd8c:215d:178e:c0de:a94:efff:fef3:3399]:44497 failed to parse sequence length0
 [fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:57884 failed to parse sequence length0
 [fd8c:215d:178e:c0de:a94:efff:fef3:3585]:59094 length parse error @ idx 2
 [fd8c:215d:178e:c0de:a94:efff:fef3:3555]:40214 parse error
 [fd8c:215d:178e:c0de:a94:efff:fef3:3399]:37409 byte array parsed in is not a sequence
  • Restart the trapserver pod in the ibm-spectrum-fusion-ns namespace.
  • Run the following command to delete all the ComputeMonitoring CRs that are present in the ibm-spectrum-fusion-ns namespace.
    oc delete cmo --all -n ibm-spectrum-fusion-ns
    Wait for the custom resource instances to get recreated.
  • Restart the BMC of every compute node by running the resetsp command from the BMC command line.
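
To decide whether the workaround applies, you can search a saved copy of the trap server log for the failure messages that appear in the excerpt. The following is a minimal sketch; the function name has_trap_parse_errors and the trapserver.log file name are assumptions for illustration, and <trapserver-pod> is a placeholder for the actual pod name.

```shell
# Hypothetical helper: succeeds if a saved trap server log contains any of
# the failure messages that appear in the log excerpt.
has_trap_parse_errors() {
  grep -Eq 'is not a sequence|failed to parse sequence length|parse error|slice bounds out of range' "$1"
}

# Example, with <trapserver-pod> as a placeholder for the actual pod name:
# oc logs <trapserver-pod> -n ibm-spectrum-fusion-ns > trapserver.log
# has_trap_parse_errors trapserver.log && echo "workaround applies"
```

The patterns are plain substrings of the logged messages, so the check ignores the source address and port that precede each message.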

Log status shows complete for downloaded log file with 0-bytes size

Problem statement
The log status shows completed even though the downloaded log file is 0 bytes in size. When logs are not collected, the status should be failed.
Cause
  1. Pods get evicted because of a lack of storage space.
  2. Ongoing jobs wait for storage space to become available. If the wait takes too long, the jobs are marked as stale and are cleaned up automatically.
Resolution
There are two methods available to resolve this issue:
Method 1
  1. Delete the unnecessary logs through the IBM Fusion HCI System user interface to get storage space. For more information, see Delete a log package.
  2. Delete the ongoing jobs that are taking a long time, and then retry them.
Method 2
  • Increase the log collector PVC size as follows:
    1. Log in to the Red Hat® OpenShift® Container Platform web console.
    2. Go to Storage > PersistentVolumeClaims.

      The PersistentVolumeClaims page is displayed.

    3. Select the log collector PVC that you want to modify.
    4. Click the ellipsis icon and select Expand PVC.

      The Expand PersistentVolumeClaims page is displayed.

    5. Set the PVC value that you want and click Expand.
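
The console steps have a command-line counterpart: a PVC is expanded by raising spec.resources.requests.storage, which works only when the storage class allows volume expansion. The following minimal sketch builds only the patch document. The function name pvc_expand_patch, the <log-collector-pvc> placeholder, and the 50Gi size are assumptions for illustration, and the namespace is assumed to be ibm-spectrum-fusion-ns, as used elsewhere on this page.

```shell
# Hypothetical helper: prints a merge patch that requests the given PVC size.
pvc_expand_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Example, with <log-collector-pvc> as a placeholder for the actual PVC name:
# oc get pvc -n ibm-spectrum-fusion-ns
# oc patch pvc <log-collector-pvc> -n ibm-spectrum-fusion-ns --type merge -p "$(pvc_expand_patch 50Gi)"
```

A PVC can only grow, so choose a value that is larger than the current capacity reported by oc get pvc.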

Known issues

  • On the Logs page of the IBM Fusion user interface, if you select the System Health Check option during log collection, the collection takes longer than usual to complete. The log collection process has been observed to take 20 to 25 minutes. In some cases, the delay is caused by the large number of directories and the collection of multiple IMM log files.
  • For IBM Storage Scale warning events, the fixed status might be incorrect.
  • Sometimes, in the Events page, the Source column of the events list might be incorrect.
  • Events are not created for IBM Storage Scale events whose entity_name field does not conform to the URL path component pattern (^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$).
  • Call Home ticket creation can end in a failed state when the system is not entitled or when the Call Home server does not respond with the ticket number. Click Verify connection to test the connection.
  • To prevent a deadlock condition, the event manager must be restarted every 24 hours. Run the following command to restart the event manager:
    oc rollout restart deployment eventmanager
  • Sometimes, the automatic upload of logs does not happen and Call Home fails. In such cases, upload the logs manually.
  • If you power off a control node on which the trap server pod is running, the migration of the trap server pod to a different node fails. As a result, the trap server pod might get stuck in the terminating state, and some SNMP trap events might not be captured or displayed.
  • In rare scenarios, the Events page is empty because the backend, under load, returns a 504 Gateway error. The page usually comes back on its own as the system recovers, so try again after some time.
  • Log collection might fail because the log collector pod runs out of space. This can happen when too many logs are collected in a short period. To resolve the issue, delete collected logs that are no longer required from the IBM Fusion user interface.
    Workaround
    • From the title bar, click the help icon and select Support logs.
    • On the Support logs page, identify the logs that are no longer required and delete them from the ellipsis menu.
  • If a Data Foundation log collection request, or simultaneous requests for multiple log packages, fail or remain stuck for more than 6 hours, increase the pod memory limit from 6000MiB to 12000MiB in the log collector deployment at spec.containers[0].resources.limits.memory, and then retry the log collections one at a time.
  • Sometimes, an intermittent partial log collection error occurs for node log packages in a multi-rack setup.
    Workaround