SystemAdmin

Pinned topic AIX kernel crash due to user space process

‏2012-11-03T09:48:07Z |
Hello,

Our AIX 6.1 server crashed with the stack trace below:

=====================================================================
-bash-3.2# kdb vmcore.22
vmcore.22 mapped from @ 700000000000000 to @ 700000058c3836a
START END <name>
0000000000001000 0000000004090000 start+000FD8
F00000002FF47600 F00000002FFDF9C0 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
Dump analysis on CHRP_SMP_PCI POWER_PC POWER_5 machine with 2 available CPU(s) (64-bit registers)
Processing symbol table...
.......................done
read vscsi_scsi_ptrs OK, ptr = 0x0
(1)> stat
SYSTEM_CONFIGURATION:
CHRP_SMP_PCI POWER_PC POWER_5 machine with 2 available CPU(s) (64-bit registers)

SYSTEM STATUS:
sysname... AIX
nodename.. aix112
release... 1
version... 6
build date Sep 29 2011
build time 17:43:32
label..... 1139A_61Q
machine... 00CD159C4C00
nid....... CD159C4C
time of crash: Thu Nov 1 23:28:45 2012
age of system: 1 day, 11 hr., 19 min., 13 sec.
xmalloc debug: enabled
FRRs active... 0
FRRs started.. 0

CRASH INFORMATION:
CPU 1 CSA F000000030AC3600 at time of crash, error code for LEDs: 70000000
pvthread+042E00 STACK:
0001BF20abend_trap+000000 ()
000DEC60thread_terminate+000860 ()
000DE038thread_terminate_unlock+000018 (??)
00003850ovlya_addr_sc_flih_main+000130 ()
kdb_get_virtual_memory no real storage @ 11196E6E0
900000000687D1C0900000000687D1C ()
kdb_read_mem no real storage @ FFFFFFFFFFF6680

(1)> status
CPU TID TSLOT PID PSLOT PROC_NAME
0 20005 2 20004 2 wait
1 42E00AF 1070 1170094 279 s2

=====================================================================

If we examine thread slot 1070 of the s2 process, we get the stack below:

(1)> sw 1070
Switch to initial thread: <pvthread+042E00>

(1)> f
pvthread+042E00 STACK:
0001BF20abend_trap+000000 ()
000DEC60thread_terminate+000860 ()
000DE038thread_terminate_unlock+000018 (??)
00003850ovlya_addr_sc_flih_main+000130 ()
kdb_get_virtual_memory no real storage @ 11196E6E0
900000000687D1C0900000000687D1C ()
kdb_read_mem no real storage @ FFFFFFFFFFF9860

(1)>
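
As a cross-check that slot 1070 really is the thread shown as pvthread+042E00, the numbers in the status output can be related with a little shell arithmetic. This is only a sketch based on the values in this dump: it assumes the TID and PID columns are hexadecimal, and the 16-bit shift and the resulting entry size are inferred from these numbers, not taken from AIX documentation.

```shell
#!/bin/sh
# Values copied from the "status" output and the stack header in this dump.
tid=$((0x42E00AF))      # TID column for CPU 1 (assumed to be hex)
pid=$((0x1170094))      # PID column for CPU 1 (assumed to be hex)
offset=$((0x042E00))    # offset in "pvthread+042E00"

# Assumption inferred from this dump: the slot number sits above the low 16 bits.
tslot=$((tid >> 16))
pslot=$((pid >> 16))

# Bytes per pvthread entry implied by this one data point.
entry_size=$((offset / tslot))

echo "TSLOT=$tslot PSLOT=$pslot entry_size=$entry_size"
# prints: TSLOT=1070 PSLOT=279 entry_size=256
```

The results match the TSLOT (1070) and PSLOT (279) columns, so the stack shown by "sw 1070" and the one in the crash information belong to the same thread.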

We are unable to work out how a user-space process, "s2", could
trigger a kernel crash. Note that the s2 process has around 110 threads.
An internet search turned up the link below, which shows a similar stack trace:
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ89428
However, it does not mention the root cause of the issue or what the fix was.

Any suggestions on how to move forward?

Thanks and Regards,
Chintea
  • dukessd

    Re: AIX kernel crash due to user space process

    ‏2012-11-05T00:29:22Z  
    Um, have you got that APAR installed?

    IZ89428 is an APAR number; it may be a different number on your system, depending on your AIX version and release.

    The fix is in the fileset near the bottom of the page: devices.vtdev.scsi.rte.

    instfix can help you find the installed APARs, and lslpp can help you find the installed filesets and their levels.

    HTH.
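
The two checks mentioned above could be wrapped up roughly as follows. This is a minimal sketch, not an official procedure: `check_fix` is a hypothetical helper name, and the APAR number and fileset are simply the ones discussed in this thread, so substitute the APAR that applies to your AIX level.

```shell
#!/bin/sh
# Hypothetical helper: report whether an APAR and a fileset are installed.
check_fix() {
    apar="$1"
    fileset="$2"

    # instfix and lslpp ship with AIX; bail out early on other systems.
    if ! command -v instfix >/dev/null 2>&1; then
        echo "instfix not found; this check only works on AIX" >&2
        return 2
    fi

    # -i queries installed fixes, -k selects them by keyword (the APAR number).
    instfix -ik "$apar"

    # -l lists the installed level and state of the named fileset.
    lslpp -l "$fileset"
}
```

On the AIX host it would be called as `check_fix IZ89428 devices.vtdev.scsi.rte`. If lslpp reports that the fileset is not installed at all, the APAR is unlikely to be relevant to that machine.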
  • SystemAdmin

    Re: AIX kernel crash due to user space process

    ‏2012-11-05T14:27:34Z  
    Hello dukessd,

    Thank you for the suggestion. APAR IZ89428 mentions that the fix is in the
    "vio_daemon" code, which is not installed on our AIX server at all.
    Also, according to the crash dump, the "s2" process is causing the crash.

    We suspect the issue needs to be fixed in the "s2" process.

    Thanks and Regards,
    chintea.