Serviceability known issues
Troubleshooting information and known issues for Events, Log collection, and Call Home.
Events that go through the trap server do not get created in IBM Storage Fusion
- Problem statement
- Sometimes, events that go through the trap server do not get created in IBM Storage Fusion.
- Resolution
- If the trapserver logs contain messages similar to the following excerpt, apply the workaround steps that are listed after the excerpt:
Listening for traps on 0.0.0.0:31620
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:41493 byte array parsed in is not a sequence
[fd8c:215d:178e:c0de:a94:efff:fef3:3561]:35748 failed to parse sequence length0
2021/10/13 14:20:48.124 [D] Recovered in resetServer, r=runtime error: slice bounds out of range [:-2592903709665718705]
Listening for traps on 0.0.0.0:31620
[fd8c:215d:178e:c0de:a94:efff:fef3:3555]:58314 failed to parse sequence length0
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:34801 byte array parsed in is not a sequence
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:52070 byte array parsed in is not a sequence
[fd8c:215d:178e:c0de:a94:efff:fef3:3555]:47601 parse error
[fd8c:215d:178e:c0de:a94:efff:fef3:3585]:56764 length parse error @ idx 2
[fd8c:215d:178e:c0de:a94:efff:fef3:3399]:44497 failed to parse sequence length0
[fd8c:215d:178e:c0de:a94:efff:fef3:35cd]:57884 failed to parse sequence length0
[fd8c:215d:178e:c0de:a94:efff:fef3:3585]:59094 length parse error @ idx 2
[fd8c:215d:178e:c0de:a94:efff:fef3:3555]:40214 parse error
[fd8c:215d:178e:c0de:a94:efff:fef3:3399]:37409 byte array parsed in is not a sequence
- Restart the trapserver pod in the ibm-spectrum-fusion-ns namespace.
- Run the following command to delete all the ComputeMonitoring CRs that are present in the ibm-spectrum-fusion-ns namespace:
oc delete cmo --all -n ibm-spectrum-fusion-ns
Wait for the custom resource instances to get recreated.
- Restart the BMC of all the compute nodes by running the resetsp command from the BMC command line.
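Before applying the workaround, it can help to confirm that the trapserver log really contains the parse failures shown in the excerpt. The following is a minimal sketch, not an official tool: the function name count_trap_parse_errors is hypothetical, and the message patterns are taken only from the log excerpt above, so other failure messages would not be counted.

```shell
# Hypothetical helper: counts log lines that carry one of the parse-failure
# messages quoted in the trapserver log excerpt. Reads log text on stdin.
count_trap_parse_errors() {
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the function usable under "set -e".
  grep -Ec 'byte array parsed in is not a sequence|failed to parse sequence length|parse error|slice bounds out of range' || true
}

# Example usage (the pod name is a placeholder, look it up with
# "oc get pods -n ibm-spectrum-fusion-ns"):
#   oc logs <trapserver-pod-name> -n ibm-spectrum-fusion-ns | count_trap_parse_errors
```

A non-zero count suggests that the symptom matches this issue; a count of 0 suggests that the missing events have a different cause.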
Log status shows complete for downloaded log file with 0-bytes size
- Problem statement
- The log status shows complete for a downloaded log file that is 0 bytes in size. If logs are not collected, the status should be failed, but it shows completed with a 0-byte file.
- Cause
- Pods get evicted because of a lack of storage space.
- Ongoing jobs wait for storage space to become available. If the wait takes too long, the jobs are marked as stale and are cleaned up automatically.
- Resolution
- There are two methods available to resolve this issue:
Log collection is not working
- Problem statement
- If you are not able to see the collected logs in the user interface, or log collection jobs are intermittently not visible on the user interface, then the cause might be incorrect seLinuxOptions set in the pods.
- Resolution
- Follow the steps to resolve this issue:
- Run the following command to update the seLinuxOptions of the log collector deployment from its namespace, along with fsGroupChangePolicy:
oc get namespace ibm-spectrum-fusion-ns -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.mcs}' | \
xargs -I {} oc patch deployment logcollector -n ibm-spectrum-fusion-ns --type='merge' -p "{\"spec\": {\"template\": {\"spec\": {\"securityContext\": {\"seLinuxOptions\": {\"level\": \"{}\"}, \"fsGroupChangePolicy\": \"OnRootMismatch\"}}}}}"
- Verify whether the log collector deployment is updated with the seLinuxOptions that are specified in the IBM Storage Fusion namespace annotation openshift.io/sa.scc.mcs.
- The pods get restarted and the log collector works as expected.
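The patch and verification steps above can be sketched as two small shell functions. This is an illustration under stated assumptions, not the product's own tooling: the function names selinux_patch_json and check_logcollector_selinux are hypothetical, and the deployment jsonpath is inferred from the structure of the patch body in the command above.

```shell
NS=ibm-spectrum-fusion-ns   # namespace used in the steps above

# Hypothetical helper: prints the merge-patch body for a given MCS level
# (the value of the openshift.io/sa.scc.mcs namespace annotation).
selinux_patch_json() {
  printf '{"spec":{"template":{"spec":{"securityContext":{"seLinuxOptions":{"level":"%s"},"fsGroupChangePolicy":"OnRootMismatch"}}}}}\n' "$1"
}

# Hypothetical helper: compares the namespace annotation with the level that
# is set on the logcollector deployment. Requires a logged-in oc session.
check_logcollector_selinux() {
  expected=$(oc get namespace "$NS" \
    -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.mcs}')
  # Assumption: the level sits at the pod-level securityContext, which is
  # where the patch in the steps above writes it.
  actual=$(oc get deployment logcollector -n "$NS" \
    -o jsonpath='{.spec.template.spec.securityContext.seLinuxOptions.level}')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "match: $actual"
  else
    echo "mismatch: namespace=$expected deployment=$actual"
    return 1
  fi
}
```

Printing the patch body first, for example with selinux_patch_json "s0:c26,c5" (an illustrative level value), makes it easier to review what would be sent to oc patch before any change is made to the cluster.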
Known issues
- In the Logs page of the IBM Storage Fusion user interface, if you select the System Health Check option during log collection, the collection takes longer than usual to complete. The log collection process has been observed to take 20 to 25 minutes. In some cases, the delay is due to the large number of directories and the collection of multiple IMM log files.
- For IBM Storage Scale warning events, the fixed status might be incorrect.
- Sometimes, in the Events page, the Source column of the events list might be incorrect.
- Events are not created for IBM Storage Scale events with entity_name fields that do not conform to the URL path components (^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$).
- Call Home ticket creation can end in a failed state when the system is not entitled or when the Call Home server does not respond with the ticket number. Click Verify connection to test the connection.
- To prevent a deadlock condition, the event manager must be restarted every 24 hours. Run the
following command to restart the event manager:
oc rollout restart deployment eventmanager
- Sometimes, the automatic upload of logs might not happen and Call Home fails. In such cases, manually upload the logs.
- If you power off a control node on which the trap server pod is running, then the migration of the trap server pod to a different node fails. As a result, the trap server pod might get stuck in the terminating state, and some SNMP trap events might not be captured and displayed.
- In rare scenarios, the Events page comes up empty because a backend error under load returns a 504 Gateway error. Generally, the page comes up automatically after some time as the system recovers, so try again after some time.
- One possible reason is that the log collector pod has run out of space. This can happen when too many logs are collected in a short period. The solution is to delete collected logs that are no longer required from the Fusion UI.
Workaround
- From the title bar, click the help icon and select Support logs.
- Identify and delete logs through the ellipsis menu in the Support logs page.
- If a Data Foundation log collection request, or requests for multiple log packages that are collected simultaneously, fail or are stuck for more than 6 hours, then update the pod memory to 12000MiB instead of 6000MiB in the log collector deployment at spec.containers[0].resources.limits.memory, and then attempt the log collections again one at a time.
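For the known issue above about IBM Storage Scale events whose entity_name does not conform to the URL path component pattern, a quick local check of a candidate name can be sketched as follows. The function name is_valid_entity_name is hypothetical; the regular expression is copied from the known issue, and note that it also accepts an empty name.

```shell
# Hypothetical helper: succeeds (exit 0) when the argument matches the
# entity_name pattern quoted in the known issues list, fails otherwise.
is_valid_entity_name() {
  # Reject names with embedded newlines, because grep matches line by line.
  case $1 in
    *"
"*) return 1 ;;
  esac
  printf '%s\n' "$1" | grep -Eq '^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$'
}

# Example usage:
#   is_valid_entity_name "node-1.example" && echo "conforms"
#   is_valid_entity_name "fs/data"        || echo "does not conform"
```

Names must start and end with an alphanumeric character, so a leading hyphen, a trailing underscore, or a slash anywhere in the name fails the check.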