Initial steps on kdb
For basic dump analysis, this article focuses on the dump image. We cover how to extract the appropriate files from the snap package, and then explain a methodical approach to examining the dump and finding the fundamental reason for a system crash. The dump file and the UNIX® file are in the dump subdirectory of the snap package.
Although we are primarily focused on the dump image, snap can also provide useful information when run with the appropriate options. This additional information is found in the general and kernel subdirectories of the snap package.
The general subdirectory includes information about the system runtime environment, for example:
- Copy of ODM data.
- All environment variables (e.g., PATH and TZ).
- Date and time the data was collected.
- Amount of real memory on the system (bootinfo -r).
- Listing of all defined paging spaces.
- Listing of all installed filesets and their levels.
- Listing of all installed APARs.
- Device attributes (lsattr -El).
- System VPD information (lscfg -pv).
- Status of last dump (sysdumpdev -L).
The kernel subdirectory contains useful kernel information (process and memory data), for example:
- Date and time the data was collected.
- vmstat output.
- VMM tunable information (vmo -L).
- Scheduling tunable information (schedo -L).
- I/O related tunable information (ioo -L).
- Environment variables.
- SRC information (lssrc -a).
- Process information (ps -ef and ps -leaf).
- Checksum of device drivers and methods.
Extracting the snap package
The pax command is used to extract files from the snap package.
- To view the contents of a snap package, type:
# zcat snap.pax.Z | pax -v
- To extract the entire contents of a package, type:
# zcat snap.pax.Z | pax -r
- To extract just the dump, general, and kernel subdirectories, type:
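# zcat snap.pax.Z | pax -r ./dump ./general ./kernel
Alternatively, uncompress the package first and then read the resulting archive:
# uncompress snap.pax.Z
# pax -r -f snap.pax ./dump ./general ./kernel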
What is kdb?
The kdb command examines the operating system image of the currently running system, or a system dump image, and is very tightly coupled with the IBM® AIX® kernel. This is because it requires knowledge of the structures used by the kernel to correctly format the information contained in the system dump image. The kdb command has many subcommands for viewing and formatting data structures.
KDB - The Kernel Debugger
KDB is an interactive kernel debugger. It allows the user to control execution of kernel code (including kernel extensions and device drivers), and to observe and modify variables and registers. It can only be used when the system is booted from a boot image in which the kernel debugger is enabled.
The kdb command, in contrast, is a tool for analyzing system dumps. It is used for post-mortem analysis of system dumps, or for monitoring the running kernel.
kdb is invoked with two arguments when examining a system dump. The first specifies the dump image, and the second specifies the UNIX file of the kernel that was running on the system at the time of the dump. The UNIX file must match the dump image (that is, it must be the one that was running at the time of the crash). If it does not, kdb displays an error message and exits.
# kdb [dump] [unix]
When invoked with no arguments, kdb examines the image of the currently running system.
Getting dump status
Upon invoking kdb on a dump image, you first retrieve the basic dump status using the stat subcommand:
Figure 1. Stat output
The stat subcommand of kdb provides information about the dump we are looking at. Along with date, time, version, and release information, we also find the dump reason code.
Dump reason code
If a dump was initiated because the system detected a problem, then the output of the stat subcommand contains a section titled CRASH INFORMATION.
- The first line of output in this section lists the CPU that detected the problem that caused the dump routine to be invoked (CPU 0 in the previous example).
- The dump reason code is the first three digits of the number displayed next to the text "error code for LEDs". In the previous example, the first line of the CRASH INFORMATION section contains the following text:
error code for LEDs: 30000000
This means that the dump reason code is 300, which indicates a DSI (Data Storage Interrupt).
The dump reason code indicates the fundamental reason for the crash. The most common reason codes are 300, 400, and 700.
|Reason code|Interrupt type|
|---|---|
|3nn|Data Storage Interrupt|
|400|Instruction Storage Interrupt|
|800|Floating Point Unavailable|
- 2nn-Machine check.
A machine check reason code usually indicates a hardware problem (for example, bad memory).
- 3nn-Data Storage Interrupt (DSI).
A DSI occurs whenever a data access references a virtual address that cannot be accessed as requested. Most commonly, the page is not currently loaded in physical memory (a page fault), but a protection violation also raises a DSI. Page faults occur all the time. They are normally handled by the VMM and do not normally result in a system crash. However, if a page fault cannot be resolved while in kernel mode, or if a page fault occurs when interrupts are disabled, then the DSI causes a system crash. The root cause is usually software, but may occasionally be hardware related.
- 400-Instruction Storage Interrupt (ISI).
An instruction storage interrupt is a page fault on an instruction fetch. This would normally be resolved by the VMM and would not normally result in a system crash. If it cannot be resolved by the VMM, or if the page fault occurs when interrupts are disabled, the system will crash.
- 5nn-External Interrupt.
The crash is caused by an interrupt from an external device, such as an I/O bus controller.
- 700-Program Interrupt.
This type of crash results when a kernel routine invokes a trap instruction. This is normally from a call to the panic kernel service or from a failed assert. A kernel routine will call panic when it has encountered a problem that it cannot resolve. The ultimate cause is frequently software, but can also be hardware related.
- 800-Floating point unavailable.
An attempt was made to execute a floating point instruction while the floating point available (FP) bit in the MSR (Machine State Register) was cleared.
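Reading the reason code is a mechanical step, so it can be sketched in a few lines. The following is a hypothetical helper, not part of kdb. It takes the string printed after "error code for LEDs", keeps the first three digits, and maps them to the classes listed above. As a simplification, it chooses the class from the first digit only.

```python
# Hypothetical helper, not part of kdb: derive the dump reason code from the
# "error code for LEDs" value printed by the stat subcommand.
# Simplification: the class is chosen from the first digit only (2nn, 3nn,
# 400, 5nn, 700, 800), mirroring the list above.

REASON_CLASSES = {
    "2": "Machine check",
    "3": "Data Storage Interrupt (DSI)",
    "4": "Instruction Storage Interrupt (ISI)",
    "5": "External interrupt",
    "7": "Program interrupt",
    "8": "Floating point unavailable",
}


def reason_code(led_error_code: str) -> str:
    """Return the first three digits of the LED error code, e.g. '300'."""
    digits = led_error_code.strip()
    if len(digits) < 3:
        raise ValueError(f"LED error code too short: {led_error_code!r}")
    return digits[:3]


def describe_reason(code: str) -> str:
    """Map a reason code such as '300' to the interrupt class listed above."""
    return REASON_CLASSES.get(code[:1], "Not covered by this list")


if __name__ == "__main__":
    code = reason_code("30000000")  # value from the stat example above
    print(code, "->", describe_reason(code))  # 300 -> Data Storage Interrupt (DSI)
```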
Initial CPU context
kdb always starts in the context of the CPU running the crashing thread.
When we invoke kdb, the prompt is shown as:
(n)>, where n is a number that indicates the current CPU.
To change the context to another CPU, use the cpu subcommand:
(0)> cpu 1
(1)>
MST (MACHINE STATE SAVE AREA)
The Machine State Save area, or MST, contains a saved image of the CPU's process context. The process context includes the general purpose registers, floating point registers, special purpose registers, and other information necessary to restart a thread when it is dispatched. Each processor in the system has its own CSA (current save area) pointer, which points to the MST to be used when a thread or interrupt handler is interrupted, or is swapped out due to a context switch. While a thread is active, the CSA for the processor running that thread points to the MST of the currently active thread.
- One of the fields in the MST is a pointer to the previous MST (prev in kdb). This field is filled in only in interrupt context.
Figure 2. Chain of MST structures represents an interrupt history
- The mst subcommand:
mst [ thread_slot | thread_table_address ]
provides a formatted display of the MST for the specified thread. If no address is specified, it displays the MST area for the current processor.
Figure 3. Output of the mst subcommand on 64-bit kernel
Figure 4. Output of the mst subcommand on 32-bit kernel
Data fields in mst subcommand output
- iar-The instruction address register.
Contains the location of the faulting instruction (for the faulting MST), that is, the instruction that was executing at the time of the crash.
- lr-Value of the link register.
Contains the return address of the function.
- Except-Displays the exception structure.
Gives information about the nature of the crash.
- r0-r31 shows the contents of general purpose registers.
- intpri-Indicates the interrupt priority level.
Indicates if the processor is running with interrupts disabled.
- prev-Pointer to the previous MST.
If the value is zero, the MST represents the base-level thread that was running on the processor. If the value is non-zero, the processor was handling an interrupt, and the prev field points to the next MST structure in the chain.
The values in the prev and intpri fields help in determining whether the processor is running a thread or an interrupt handler.
When a CPU is running a thread, the thread can be either in process context or interrupt context. Interrupt handlers always run in interrupt context.
Figure 5. Process-Interrupt context
The main difference between the process and interrupt environments is that no page faults are permitted while in the interrupt environment.
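The following sketch applies the prev and intpri rules described above. It walks a chain of MSTs from the one the CSA points to and classifies each entry. The `Mst` class and all addresses are hypothetical stand-ins for what kdb displays. The value 0x0B for "all interrupts enabled" is an assumption taken from the mst example later in this article.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: 0x0B is the intpri value meaning "all interrupts enabled",
# as in the mst example later in this article.
ALL_INTERRUPTS_ENABLED = 0x0B


@dataclass
class Mst:
    """Hypothetical, simplified MST: only the fields used in this sketch."""
    addr: int
    prev: int    # 0 means this is the base-level (thread) MST
    intpri: int  # interrupt priority the processor was running at


def mst_chain(csa: int, msts: Dict[int, Mst]) -> List[Mst]:
    """Follow prev pointers from the MST the CSA points to, newest first."""
    chain: List[Mst] = []
    seen = set()
    addr = csa
    while addr != 0 and addr not in seen:  # stop at prev == 0 or on a loop
        seen.add(addr)
        mst = msts[addr]
        chain.append(mst)
        addr = mst.prev
    return chain


def context_of(mst: Mst) -> str:
    """Classify what the processor was doing, using prev and intpri."""
    if mst.prev != 0:
        return "interrupt handler (interrupt environment)"
    if mst.intpri == ALL_INTERRUPTS_ENABLED:
        return "thread in process environment, interrupts enabled"
    return "thread running with interrupts disabled (no page faults allowed)"


if __name__ == "__main__":
    # Hypothetical addresses: an interrupt MST stacked on a thread MST.
    msts = {
        0x2000: Mst(addr=0x2000, prev=0x1000, intpri=0x05),
        0x1000: Mst(addr=0x1000, prev=0, intpri=0x0B),
    }
    for m in mst_chain(0x2000, msts):
        print(hex(m.addr), context_of(m))
```

The loop also stops if a prev pointer revisits an address. A corrupted dump could otherwise send a naive walk around in circles.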
The Machine State Register (MSR)
Each CPU has its own MSR, which indicates the status of the processor.
The dr subcommand dumps the contents of any register, for example the MSR:
(0)> dr msr
msr    : 0000F0B2      bit set: EE PR FP ME IR DR RI
Figure 6. MSR register
Evaluating MSR data
When examining the system dump, we are only interested in what was running on the CPU that caused the crash.
If the PR bit in MSR is set (Problem State), then the CPU was running in user mode, and is unlikely to be the cause of the crash. This is because user mode instructions do not have sufficient privileges to cause the system to crash.
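The "bit set" list that kdb prints can be reproduced by masking the MSR value. The following sketch assumes the standard PowerPC masks for the low-order 16 bits (EE = 0x8000, PR = 0x4000, and so on). These masks come from the PowerPC architecture, not from this article, so treat them as an assumption to check against your hardware documentation.

```python
from typing import List

# Assumption: standard PowerPC MSR masks for the low-order 16 bits,
# listed from most to least significant.
MSR_BITS = [
    ("EE", 0x8000),   # external interrupts enabled
    ("PR", 0x4000),   # problem state (user mode)
    ("FP", 0x2000),   # floating point available
    ("ME", 0x1000),   # machine check enabled
    ("FE0", 0x0800),  # floating point exception mode 0
    ("SE", 0x0400),   # single-step trace enabled
    ("BE", 0x0200),   # branch trace enabled
    ("FE1", 0x0100),  # floating point exception mode 1
    ("IR", 0x0020),   # instruction address translation
    ("DR", 0x0010),   # data address translation
    ("RI", 0x0002),   # recoverable interrupt
    ("LE", 0x0001),   # little-endian mode
]
MSR_PR = 0x4000


def msr_bits_set(msr: int) -> List[str]:
    """Names of the MSR bits that are set, most significant first."""
    return [name for name, mask in MSR_BITS if msr & mask]


def in_problem_state(msr: int) -> bool:
    """True if the PR bit is set, i.e. the CPU was running in user mode."""
    return bool(msr & MSR_PR)


if __name__ == "__main__":
    msr = 0x0000F0B2  # value from the dr msr example above
    print(" ".join(msr_bits_set(msr)), "| user mode:", in_problem_state(msr))
```

For the sample value 0000F0B2 shown earlier, the decoded list matches kdb's output (EE PR FP ME IR DR RI). PR is set, so by the rule above that CPU was running in user mode when the value was captured.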
In the case of dumps where the dump reason code is 300 or 400, we have to evaluate the exception structure.
Figure 7. Exception structure
- The dar (Data Address Register) contains the address that the processor tried to access, which then caused a page fault.
- The dsisr (Data Storage Interrupt Status Register) indicates why the page fault could not be resolved.
Figure 8. DSISR bits description
- The dsirr (Data Storage Interrupt Reason Register) indicates the type of page fault that has occurred.
Decoding the dsirr is done by converting the hexadecimal value to decimal.
Values from 1 to 127 are error numbers (defined in /usr/include/sys/errno.h), whereas values from 128 to 512 are exceptions (defined in /usr/include/sys/m_except.h).
Next, evaluate the IAR value shown in the mst output. In particular, examine the instruction referenced by the IAR to confirm that it could cause the type of crash reported by the stat subcommand.
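The dsirr decoding rule can be expressed as a small function. It converts the hexadecimal value to decimal and reports which header file to look the number up in. This is a hypothetical helper, not a kdb feature. The sample inputs are made up, and the sketch does not map numbers to symbolic names, because those live in the AIX headers.

```python
from typing import Tuple

ERRNO_HEADER = "/usr/include/sys/errno.h"
EXCEPT_HEADER = "/usr/include/sys/m_except.h"


def classify_dsirr(dsirr_hex: str) -> Tuple[str, int, str]:
    """Convert a hex dsirr value to decimal and say which header to check.

    Returns (kind, decimal_value, header); header is "" when the value falls
    outside the ranges described above.
    """
    value = int(dsirr_hex, 16)  # accepts "0000000E" as well as "0xE"
    if 1 <= value <= 127:
        return ("errno", value, ERRNO_HEADER)
    if 128 <= value <= 512:
        return ("exception", value, EXCEPT_HEADER)
    return ("unknown", value, "")


if __name__ == "__main__":
    for raw in ("0000000E", "00000086"):  # hypothetical dsirr values
        print(raw, "->", classify_dsirr(raw))
```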
VMM error log
Figure 9. Vmlog output
The vmlog subcommand provides additional information when the crash originated in the VMM. The Error ID shown for a code 300 dump is DSI_PROC, or ISI_PROC for a code 400 dump. The Exception DSISR/ISISR, srval, and virt addr fields contain the same information as the exception structure. The Exception value is an internal VMM error code.
The f subcommand:
f [threadtableslot | threadtableaddress]
displays the kernel mode stack trace of the specified thread. When given with no arguments, it displays the stack trace for the current CPU context.
Figure 10. Trace output
Figure 11. DSI example
Figure 11 shows the output of the stat subcommand. The error code it reports is 300, which means this is a DSI crash.
Figure 12. DSI trace
The stack trace in Figure 12 shows that the problem occurred while a thread running on CPU 0 was executing the function __memmove.
Figure 13. mst command output
The output of the mst subcommand shows that a thread was running (indicated by prev = 0) with all interrupts enabled (indicated by intpri = 0B). In other words, the thread was running in the process environment.
The dsisr field of the exception structure has two bits set: DSISR_PROT and DSISR_ST. The DSISR_PROT bit indicates a protection violation, and the DSISR_ST bit indicates that the failing access was a store operation. The data address that caused the page fault was 0x0 (dar 0000000000000000).
Because DSISR_ST is set, we expect the IAR to reference a store instruction of some form. The dc subcommand disassembles the instruction at the IAR:
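Checking which DSISR bits are set is again a matter of masking. The following sketch assumes the PowerPC architecture's DSISR bit positions (32-bit view). DSISR_PROT and DSISR_ST are the names used in this article. NO_TRANSLATION and DABR_MATCH are descriptive labels invented for the sketch, not AIX header names. The article does not show the raw dsisr value, so 0x0A000000 is only what these two bits would produce under the assumed masks.

```python
from typing import List

# Assumption: PowerPC DSISR masks (32-bit view). DSISR_PROT and DSISR_ST are
# the names used in this article; NO_TRANSLATION and DABR_MATCH are
# descriptive labels made up for this sketch, not AIX header names.
DSISR_BITS = [
    ("NO_TRANSLATION", 0x40000000),  # no page table entry found (page fault)
    ("DSISR_PROT", 0x08000000),      # protection violation
    ("DSISR_ST", 0x02000000),        # access was a store
    ("DABR_MATCH", 0x00400000),      # data address breakpoint match
]


def decode_dsisr(dsisr: int) -> List[str]:
    """Names of the DSISR bits (from the table above) that are set."""
    return [name for name, mask in DSISR_BITS if dsisr & mask]


if __name__ == "__main__":
    dsisr = 0x0A000000  # illustrative: DSISR_PROT | DSISR_ST
    bits = decode_dsisr(dsisr)
    print(bits, "store:", "DSISR_ST" in bits)
```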
(0)> dc @iar
___memmove64+000058    std    r7,8(r3)
The current contents of r7 will be stored at the memory address calculated by adding 0x08 to the current value of r3. From the output of the mst subcommand, we can see that the current value of r3 is FFFFFFFFFFFFFFF8, so by adding 0x08 to FFFFFFFFFFFFFFF8 we get:
(0)> hcal FFFFFFFFFFFFFFF8+0x08
Value hexa: 00000000          Value decimal: 0
The 64-bit addition wraps around, so the contents of r7 would be stored at address 0x0. This matches the value shown in the dar field of the exception structure.
The first page of virtual memory in the kernel address space can be accessed by kernel code, but is marked as read-only. Any attempt to write to this memory range results in a protection violation, which is consistent with the DSISR_PROT bit set in this dump.
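The address arithmetic and the read-only first page check can be reproduced outside kdb. The following sketch computes the effective address of `std r7,8(r3)` as (r3) + 8, truncated to 64 bits, just as hcal showed. It then tests whether a store to that address lands in the first page. The 4 KB size of that page is an assumption, because the article does not state it.

```python
MASK64 = (1 << 64) - 1
FIRST_PAGE_SIZE = 4096  # assumption: 4 KB pages


def ds_form_ea(ra_value: int, displacement: int) -> int:
    """Effective address of std rS,d(rA): (rA) + d, wrapped to 64 bits.

    Assumes rA is not r0 (for r0 the base would be 0, not the register value).
    """
    return (ra_value + displacement) & MASK64


def store_faults_on_first_page(ea: int) -> bool:
    """True if a store to ea targets the read-only first page."""
    return 0 <= ea < FIRST_PAGE_SIZE


if __name__ == "__main__":
    r3 = 0xFFFFFFFFFFFFFFF8       # r3 from the mst output
    ea = ds_form_ea(r3, 0x08)     # std r7,8(r3)
    print(f"ea = {ea:#018x}, protection fault expected: {store_faults_on_first_page(ea)}")
```

Masking with 2^64 - 1 models the 64-bit register width. Without it, Python's unbounded integers would give 0x10000000000000000 instead of 0.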
Processes and threads
The proc subcommand
When invoked with an asterisk (*), the proc subcommand displays a one-line summary of each active process in the process table.
When invoked with a process table address or slot number, the proc subcommand displays detailed formatted process information for the specified address or slot. (If the proc subcommand is invoked with no arguments, it displays detailed formatted process information for the current process.)
Figure 14. proc command output
Figure 15. pid output
The thread subcommand
When invoked with an asterisk (*), the thread subcommand displays a one-line summary for each active thread in the thread table.
When invoked with a thread table address or slot number, the thread subcommand displays detailed formatted thread information for the specified thread. (If the thread subcommand is invoked with no arguments, it displays detailed formatted thread information for the current thread.)
Figure 16. thread command output
Figure 17. tid output
Current thread and process
The status subcommand lists the current thread and process information for all CPUs.
Figure 18. status command output
Other kdb subcommands
- By default, kdb works in symbolic mode, and most addresses are shown in the output as symbol+offset.
Figure 19. kdb output
Here pvproc+000000 is in the Symbol+Offset format.
Using the ns subcommand, or set no_symbol, we can change this format:
Figure 20. ns command output
- Use the nm subcommand to get the address of a symbol and the Table Of Contents (TOC) section of the executable module that contains the symbol.
Figure 21. nm command output
- The lke subcommand, with no arguments, lists the currently loaded kernel extensions.
- Use the ts subcommand to translate an address to its symbolic representation.
- Calculators: use hcal for hexadecimal and dcal for decimal calculations.
- Logging the kdb session: set the log file and the log level, for example:
(0)> set logfile filename
(0)> set loglevel 2
The loglevel can be set to 0, 1, or 2. This determines the type of information that is written to the log file. The value 0 disables logging. The value 1 logs only the kdb commands typed at the prompt. The value 2 logs both the input and the output of the kdb session.
Finding the root cause of a system crash can be a tedious process, so it helps to know how kdb supports a methodical approach to analyzing crashes. Using kdb, you can track down system crash problems quickly. You also gain a valuable skill that not only saves time when debugging, but also provides insight into which coding practices help avoid system crashes in the future. Finally, resolving crash issues quickly and methodically builds customer confidence.