IBM Spectrum Scale filesystem outage with a kernel panic
A kernel panic has occurred on a member host because of an IBM Spectrum Scale trigger. The trigger recurs sporadically.
Symptoms
ID   TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT  PARTITION_NUMBER  LOGICAL_PORT  NETNAME
--   ----    -----                 ---------  ------------  -----  ----------------  ------------  -------
0    MEMBER  WAITING_FOR_FAILBACK  hostA      hostB         NO     0                 1             hostB-ib0
1    MEMBER  STARTED               hostB      hostB         NO     0                 0             hostB-ib0
2    MEMBER  STARTED               hostC      hostC         NO     0                 0             hostC-ib0
128  CF      PRIMARY               hostD      hostD         NO     -                 0             hostD-ib0
129  CF      PEER                  hostE      hostE         NO     -                 0             hostE-ib0

HOSTNAME  STATE     INSTANCE_STOPPED  ALERT
--------  -----     ----------------  -----
hostA     INACTIVE  NO                YES
hostB     ACTIVE    NO                NO
hostC     ACTIVE    NO                NO
hostD     ACTIVE    NO                NO
hostE     ACTIVE    NO                NO
In the previous example, hostA has a state of INACTIVE and its ALERT field is YES. The db2instance -list command shows this output when hostA is offline or rebooting. Because hostA, the home host for member 0, is offline, member 0 has failed over to hostB. Member 0 is now waiting to fail back to its home host, as indicated by the WAITING_FOR_FAILBACK state. After hostA reboots following the panic, member 0 fails back to hostA.
Diagnosis
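One way to check a cluster for the condition described above is to scan the db2instance -list output for members in the WAITING_FOR_FAILBACK state and for hosts whose ALERT field is YES. The following Python sketch assumes that the output has been captured as text and that each table consists of a header row, a row of dashes, and whitespace-separated values with no embedded spaces, as in the example. The function names (parse_tables, failover_report) are hypothetical and are not part of any Db2 tool.

```python
# Minimal sketch (hypothetical helper names): scan captured `db2instance -list`
# output for members waiting to fail back and for hosts that have an alert.
# Assumes each table is a header row, a row of dashes, then whitespace-separated
# values with no embedded spaces, as in the example output.

SAMPLE = """\
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostB NO 0 1 hostB-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0

HOSTNAME STATE INSTANCE_STOPPED ALERT
-------- ----- ---------------- -----
hostA INACTIVE NO YES
hostB ACTIVE NO NO
hostC ACTIVE NO NO
hostD ACTIVE NO NO
hostE ACTIVE NO NO
"""


def is_separator(tokens):
    """True for a row such as '-- ---- -----' that underlines a header."""
    return bool(tokens) and all(set(tok) == {"-"} for tok in tokens)


def parse_tables(text):
    """Return every table in the output as a list of {column: value} dicts."""
    rows = [line.split() for line in text.splitlines()]
    tables, i = [], 0
    while i + 1 < len(rows):
        if rows[i] and is_separator(rows[i + 1]):
            header, table = rows[i], []
            i += 2
            # Data rows end at a blank line or where the next header begins.
            while i < len(rows) and rows[i] and not (
                i + 1 < len(rows) and is_separator(rows[i + 1])
            ):
                table.append(dict(zip(header, rows[i])))
                i += 1
            tables.append(table)
        else:
            i += 1
    return tables


def failover_report(text):
    """Return (members in WAITING_FOR_FAILBACK, hosts whose ALERT is YES)."""
    waiting, alerted = [], []
    for table in parse_tables(text):
        for row in table:
            if row.get("TYPE") == "MEMBER" and row.get("STATE") == "WAITING_FOR_FAILBACK":
                waiting.append((row["ID"], row["HOME_HOST"], row["CURRENT_HOST"]))
            if "HOSTNAME" in row and row.get("ALERT") == "YES":
                alerted.append(row["HOSTNAME"])
    return waiting, alerted


if __name__ == "__main__":
    waiting, alerted = failover_report(SAMPLE)
    for member, home, current in waiting:
        print(f"member {member} is on {current}, waiting to fail back to {home}")
    print("hosts with alerts:", ", ".join(alerted) or "none")
```

On a Db2 pureScale host, the text could instead come from subprocess.run(["db2instance", "-list"], capture_output=True, text=True).stdout. Splitting on whitespace rather than on fixed column offsets keeps the sketch independent of column widths, which might differ between Db2 levels.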
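On hostB, where member 0 is now running, the db2diag log file contains an entry similar to the following:
2009-08-27-23.37.52.416270-240 I6733A457 LEVEL: Event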
PID : 1093874 TID : 1 KTID : 2461779
PROC : db2star2
INSTANCE: NODE : 000
HOSTNAME: hostB
EDUID : 1
FUNCTION: Db2, base sys utilities, DB2StartMain, probe:3368
MESSAGE : Idle process taken over by member
DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
996
DATA #2 : Database Partition Number, PD_TYPE_NODE, 2 bytes
0
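When the db2diag log files are large, a short script can pick out takeover records like this one. The following Python sketch assumes that records are separated by blank lines and that the first token of each record is its timestamp; it matches on the MESSAGE text shown in the entry above. The names takeover_events and TAKEOVER_TEXT are hypothetical.

```python
import re

# Minimal sketch (hypothetical names): find db2diag records whose MESSAGE
# reports that an idle process was taken over by a member.
# Assumes records are separated by blank lines and that each record's first
# token is its timestamp, as in the example entry.

SAMPLE_RECORD = """\
2009-08-27-23.37.52.416270-240 I6733A457 LEVEL: Event
PID : 1093874 TID : 1 KTID : 2461779
PROC : db2star2
INSTANCE: NODE : 000
HOSTNAME: hostB
EDUID : 1
FUNCTION: Db2, base sys utilities, DB2StartMain, probe:3368
MESSAGE : Idle process taken over by member
DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
996
DATA #2 : Database Partition Number, PD_TYPE_NODE, 2 bytes
0
"""

# Only the two single-value fields that this check needs are extracted.
FIELD = re.compile(r"^(HOSTNAME|MESSAGE)[ \t]*:[ \t]*(.*)$", re.MULTILINE)
TAKEOVER_TEXT = "Idle process taken over by member"


def takeover_events(log_text):
    """Return (timestamp, hostname) for each idle-process takeover record."""
    events = []
    for record in re.split(r"\n[ \t]*\n", log_text):
        record = record.strip()
        if not record:
            continue
        fields = dict(FIELD.findall(record))
        if TAKEOVER_TEXT in fields.get("MESSAGE", ""):
            timestamp = record.split()[0]
            events.append((timestamp, fields.get("HOSTNAME", "unknown")))
    return events


if __name__ == "__main__":
    for timestamp, host in takeover_events(SAMPLE_RECORD):
        print(f"{timestamp}: idle process taken over on {host}")
```

The sketch reads only the HOSTNAME and MESSAGE fields, so it does not depend on how the DATA sections of a record are laid out.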
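On hostA, the AIX error log (displayed by the errpt -a command) contains an entry similar to the following:
LABEL: KERNEL_PANIC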
IDENTIFIER: 225E3B63
Date/Time: Mon May 26 08:02:03 EDT 2008
Sequence Number: 976
Machine Id: 0006DA8AD700
Node Id: hostA
Class: S
Type: TEMP
Resource Name: PANIC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ASSERT STRING
5.1: xmemout succeeded rc=d
PANIC STRING
kx.C:2024:0:0:04A53FA8::advObjP == ofP->advLkObjP
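To search a saved error log report for panics of this kind, the following Python sketch extracts the node, the time, and the ASSERT STRING and PANIC STRING values from every KERNEL_PANIC entry. It assumes that the errpt -a output is available as text, that entries are separated by lines of dashes, that fields have the form "Name: value", and that each detail heading is followed by its value on the next line. The helper names are illustrative and are not part of any IBM tool.

```python
import re

# Minimal sketch (hypothetical names): list KERNEL_PANIC entries from saved
# AIX `errpt -a` output. Assumes entries are separated by lines of dashes,
# fields look like "Name: value", and each detail heading (for example
# PANIC STRING) is followed by its value on the next line.

SAMPLE_ENTRY = """\
LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63
Date/Time: Mon May 26 08:02:03 EDT 2008
Sequence Number: 976
Machine Id: 0006DA8AD700
Node Id: hostA
Class: S
Type: TEMP
Resource Name: PANIC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ASSERT STRING
5.1: xmemout succeeded rc=d
PANIC STRING
kx.C:2024:0:0:04A53FA8::advObjP == ofP->advLkObjP
"""

ENTRY_SEPARATOR = re.compile(r"^-{10,}[ \t]*$", re.MULTILINE)
# A field name is letters, spaces, or "/" (for example "Date/Time", "Node Id").
FIELD = re.compile(r"^([A-Za-z][A-Za-z/ ]*?):[ \t]+(.*\S)[ \t]*$", re.MULTILINE)
DETAIL_HEADINGS = ("ASSERT STRING", "PANIC STRING")


def kernel_panics(errpt_text):
    """Return one dict per KERNEL_PANIC entry with node, time, and detail strings."""
    panics = []
    for entry in ENTRY_SEPARATOR.split(errpt_text):
        fields = dict(FIELD.findall(entry))
        if fields.get("LABEL") != "KERNEL_PANIC":
            continue
        panic = {"node": fields.get("Node Id"), "time": fields.get("Date/Time")}
        lines = [line.strip() for line in entry.splitlines()]
        # Pair each line with the line after it to read heading/value pairs.
        for heading, value in zip(lines, lines[1:]):
            if heading in DETAIL_HEADINGS:
                panic[heading.lower().replace(" ", "_")] = value
        panics.append(panic)
    return panics


if __name__ == "__main__":
    for panic in kernel_panics(SAMPLE_ENTRY):
        print(panic["node"], panic["time"], panic.get("panic_string"))
```

The PANIC STRING value is the most useful field to quote when you contact IBM Technical Support, because it identifies the source location of the failing assertion.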
If the error log on the rebooted host contains a KERNEL_PANIC entry like the one shown earlier, the reboot might have been caused by an operating system kernel panic that was triggered by a problem in the IBM Spectrum Scale subsystem. Such a panic, and the resulting reboot, can occur when excessive processor usage or heavy paging on the system prevents the IBM Spectrum Scale daemons from getting the system resources that they need to perform critical tasks. If you experience IBM Spectrum Scale file system outages that are related to kernel panics, resolve the underlying processor usage or paging issues first. If you cannot resolve those issues, run the db2support command for the database with the -s parameter to collect diagnostic information, and then contact IBM Technical Support.