Serviceability known issues
Troubleshooting information and known issues for Events, Log collection, and Call Home.
Unable to delete proxy config from IBM Fusion user interface
- Problem statement
- Unable to delete proxy configurations available on the IBM Fusion user interface.
- Resolution
- Follow the steps to resolve the issue:
- Important: Ensure that you disable the remote support connection before you update or delete the proxy configuration. Disable the remote support connection from the IBM Fusion HCI System user interface as follows:
- From the IBM Fusion HCI System user interface, go to
.
The Support settings page is displayed.
- In the Remote support section, click the toggle button to disable the remote support connection.
Note: Disable the remote support connection only if it is enabled.
- Log in to the service node as a kni user.
- Remove the proxy credential from the pass tool on the service node.
pass rm -f remoteSupport
- Clean up the proxy details related to remote support from /home/kni/isf-ui/conf/proxydata.json on the service node.
jq 'del(.remoteSupport)' /home/kni/isf-ui/conf/proxydata.json > temp.json && mv temp.json /home/kni/isf-ui/conf/proxydata.json
- Refresh the IBM Fusion HCI System or install user interface.
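The jq cleanup step above replaces the file only when jq succeeds. A slightly more defensive variant is sketched here, assuming jq is installed on the service node. The function name remove_remote_support and the .bak backup suffix are illustrative assumptions and not part of IBM Fusion; only the jq filter and the file path come from the steps above.

```shell
# Hypothetical helper: remove the "remoteSupport" key from a JSON file
# after saving a backup copy. Requires jq.
remove_remote_support() {
  local file tmp
  file="$1"
  tmp="$(mktemp)" || return 1
  # Keep a copy of the original file (the .bak suffix is an assumption).
  cp -p -- "$file" "$file.bak" || { rm -f -- "$tmp"; return 1; }
  # Same filter as the documented command; write to a temporary file first.
  if jq 'del(.remoteSupport)' "$file" > "$tmp"; then
    # Overwrite in place so the original owner and permissions are kept.
    cat -- "$tmp" > "$file"
    rm -f -- "$tmp"
  else
    # jq failed (for example, invalid JSON): leave the original untouched.
    rm -f -- "$tmp"
    return 1
  fi
}
```

On the service node, this would be called as remove_remote_support /home/kni/isf-ui/conf/proxydata.json. If the result is not as expected, the .bak copy can be restored.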
Events that go through the trap server do not get created in IBM Fusion
- Problem statement
- Sometimes, events that go through the trap server do not get created in IBM Fusion.
- Resolution
- If you see the following details in the trapserver logs, follow the steps
as a workaround:
Listening for traps on 0.0.0.0:31620
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:41493 byte array parsed in is not a sequence
[fd8c:215d:178e:c0de:a94:efff:fef3:3561]:35748 failed to parse sequence length0
2021/10/13 14:20:48.124 [D] Recovered in resetServer, r=runtime error: slice bounds out of range [:-2592903709665718705]
Listening for traps on 0.0.0.0:31620
[fd8c:215d:178e:c0de:a94:efff:fef3:3555]:58314 failed to parse sequence length0
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:34801 byte array parsed in is not a sequence
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:52070 byte array parsed in is not a sequence
[fd8c:215d:178e:c0de:a94:efff:fef3:3555]:47601 parse error
[fd8c:215d:178e:c0de:a94:efff:fef3:3585]:56764 length parse error @ idx 2
[fd8c:215d:178e:c0de:a94:efff:fef3:3399]:44497 failed to parse sequence length0
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:57884 failed to parse sequence length0
[fd8c:215d:178e:c0de:a94:efff:fef3:3585]:59094 length parse error @ idx 2
[fd8c:215d:178e:c0de:a94:efff:fef3:3555]:40214 parse error
[fd8c:215d:178e:c0de:a94:efff:fef3:3399]:37409 byte array parsed in is not a sequence
- Restart the trapserver pod in the ibm-spectrum-fusion-ns namespace.
- Run the following command to delete all the ComputeMonitoring CRs that are present in the ibm-spectrum-fusion-ns namespace.
oc delete cmo --all -n ibm-spectrum-fusion-ns
Wait for the custom resource instances to get recreated.
- Restart the BMC of all the compute nodes by running the resetsp command from the BMC command line.
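To check whether the trapserver logs contain the parse errors shown in the sample above, the matching lines can be counted with grep. This is a minimal sketch: the function name count_trap_parse_errors is hypothetical, and the patterns are copied from the sample log rather than from an official list of error messages.

```shell
# Hypothetical helper: read log text on stdin and print the number of lines
# that match the trap parse failures quoted in the sample log.
count_trap_parse_errors() {
  # grep -c prints 0 and exits nonzero when nothing matches, so "|| true"
  # keeps the function's exit status at 0 in that case.
  grep -c -E \
    -e 'byte array parsed in is not a sequence' \
    -e 'failed to parse sequence length' \
    -e 'parse error' \
    -e 'slice bounds out of range' || true
}
```

One way to feed it is oc logs <trapserver-pod> -n ibm-spectrum-fusion-ns | count_trap_parse_errors, where <trapserver-pod> is a placeholder for the actual pod name. A nonzero count suggests that the workaround steps apply.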
Log status shows complete for downloaded log file with 0-bytes size
- Problem statement
- The log status shows complete for a downloaded log file that is 0 bytes in size. The status should be failed if logs are not collected, but it shows completed with a 0-byte file.
- Cause
- Pods get evicted because of a lack of storage space.
- Ongoing jobs wait for storage space to become available. If the wait takes too long, the jobs are marked as stale and are cleaned up automatically.
- Resolution
- There are two methods available to resolve this issue:
Known issues
- In the Logs page of the IBM Fusion user interface, if you select the System Health Check option during log collection, then the collection takes longer than usual to complete. The log collection process has been observed to take 20 to 25 minutes. In some cases, the delay can be due to a large number of directories and the collection of multiple IMM log files.
- For IBM Storage Scale warning events, the fixed status might be incorrect.
- Sometimes, in the Events page, the Source column of the events list might be incorrect.
- Events are not created for IBM Storage Scale events with entity_name fields that do not conform to the URL path components (^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$).
- Call Home ticket creation can end in a failed state when the system is not entitled or the Call Home server does not respond with the ticket number. Click Verify connection to test the connection.
- To prevent a deadlock condition, the event manager must be restarted every 24 hours. Run the
following command to restart the event manager:
oc rollout restart deployment eventmanager
- Sometimes, the automatic upload of logs might not happen and Call Home fails. In such cases, manually upload the logs.
- If you power off a control node on which the trap server pod is running, then the migration of the trap server pod to a different node fails. As a result, the trap server pod might get stuck in the terminating state, and some SNMP trap events might not be captured and displayed.
- In rare scenarios, the Events page appears empty because a backend error under load returns a 504 Gateway error. The page usually comes back automatically after some time as the system recovers, so try again later.
- One possible reason is that the log collector pod runs out of space. This can happen when too many logs are collected in a short period. The solution is to delete already collected logs that are no longer required from the Fusion UI.
Workaround
- From the title bar, click the help icon and select Support logs.
- Identify and delete logs through the ellipsis menu in the Support logs page.
- If a Data Foundation log collection request, or requests for multiple log packages collected simultaneously, fail or are stuck for more than 6 hours, then update the pod memory to 12000MiB instead of 6000MiB in the log collector deployment at spec.containers[0].resources.limits.memory and then attempt the log collection again, one package at a time.
- Sometimes, an intermittent partial log collection error occurs for node log packages in a multi-rack setup.
Workaround
- Retry the node log collection. For steps, see Collecting log packages for IBM Fusion HCI System.
- If the issue persists, then contact the IBM team to get the commands (FFDC) to collect the individual node logs.
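For the entity_name restriction on IBM Storage Scale events noted in the known issues above, a name can be tested against the quoted pattern when investigating a missing event. The following is a minimal sketch: the function name is_valid_entity_name is hypothetical, and the pattern is copied verbatim from the known issue.

```shell
# Hypothetical helper: exit 0 when the given name matches the URL path
# component pattern quoted in the known issue, nonzero otherwise.
is_valid_entity_name() {
  # LC_ALL=C keeps the A-Z, a-z, and 0-9 ranges limited to ASCII.
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$'
}
```

For example, is_valid_entity_name "fs/data" fails because of the slash, and names that begin or end with a hyphen, underscore, or period also fail. Note that the pattern as written accepts an empty name.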