Lightweight Memory Trace

Question & Answer


Question

Lightweight Memory Trace

Answer

LMT overview
Disabling LMT
LMT memory consumption
Using LMT

This article describes how to use the lightweight memory trace (LMT), a key serviceability feature introduced in AIX 5300-03 (Maintenance Level 3 for AIX 5.3, hereafter referred to by its internal release name, AIX 5.3 ML3). LMT is an important first failure data capture tool that anyone doing AIX service should know about. It is an efficient, default-on, per-CPU, in-memory kernel trace. It is built upon the trace functionality that already existed in AIX kernel subsystems, and it is most useful to those who have AIX source access or a deep understanding of AIX internals. Because LMT is intended for use by IBM service personnel rather than end customers, not all of its commands are documented externally. LMT is documented primarily in the AIX 5L Version 5.3 Release Notes, and that documentation focuses on how to turn it off and how to resize its memory buffers. This article provides the additional details service personnel need to understand and take full advantage of LMT to rapidly diagnose problems. It also serves as a reference for cases where customers want more information about LMT.


LMT overview

LMT provides trace information for First Failure Data Capture (FFDC). It is a constantly running kernel trace mechanism that records software events occurring throughout the life of the system. The system activates LMT at initialization, and tracing then runs continuously. Recorded events are saved into per-processor memory trace buffers. There are two memory trace buffers for each processor: one to record common events, and one to record rare events. The memory trace buffers can be extracted from system dumps or accessed on a live system by service personnel. The trace records look like traditional AIX system trace records. The extracted memory trace buffers can be viewed with the trcrpt command, with formatting as defined in the /etc/trcfmt file.

LMT has been carefully implemented so that it has a negligible performance impact. The impact on the throughput of a kernel-intensive benchmark is just one percent, and it is much less for typical user workloads. LMT does consume a small amount of pinned kernel memory. The default amount of memory required for the trace buffers is calculated based on factors that influence software trace record retention. Additional details on LMT memory consumption are provided below.

LMT differs from traditional AIX system trace in several ways. First, LMT is more efficient. Second, LMT is enabled by default and has been explicitly tuned as an FFDC mechanism. Unlike traditional AIX system trace, you cannot selectively record only certain AIX trace hook IDs with LMT: you either record all LMT-enabled hooks, or you record none. This means that traditional AIX system trace remains the preferred Second Failure Data Capture (SFDC) tool, since it lets you specify the exact trace hooks of interest given knowledge gained from the initial failure. Traditional AIX system trace also provides options to automatically write the trace information to a disk-based file (such as /var/adm/ras/trcfile). LMT provides no such option to automatically write trace entries to disk when a memory trace buffer fills. When an LMT memory trace buffer fills, it 'wraps', meaning the oldest trace records are overwritten.

The value of LMT comes from being able to view some history of what the system was doing prior to reaching the point where a failure is detected. As previously mentioned, each CPU has a memory trace buffer for 'common' events, and a smaller memory trace buffer for 'rare' events. The intent is for the 'common' buffer to have a 1 to 2 second retention (in other words, to have enough space to record events occurring during the last 1 to 2 seconds without wrapping), and for the 'rare' buffer to have something like an hour's retention. This of course depends on the workload, on where developers place trace hook calls in the AIX kernel source, and on what parameters they trace. Only data with high serviceability value should be traced into LMT (for both retention and performance reasons), so AIX 5.3 ML3 is tuned such that overly expensive, frequent, or redundant trace hooks are not recorded via LMT. Note that all of the kernel trace hooks are still included in traditional AIX system trace (when it is enabled), so a given trace hook entry may be recorded in LMT, in system trace, or in both. By default, the LMT-aware trace macros in the source code write into the LMT common buffer, so there is currently little 'rare' buffer content in AIX 5.3 ML3. Expect the LMT trace buffer contents to continue to be refined in future AIX releases, as AIX developers continue to tune their trace hooks for serviceability needs and performance tradeoffs. Kernel extensions will also start tracing into the LMT buffers (LMT is currently used only by the base kernel).

LMT has proven to be a very useful tool during the development of AIX 5.3 ML3. As an FFDC tool, it is expected to mitigate some of the requests for problem recreation that currently vex customers and IBM service alike.


Disabling LMT

LMT can be disabled (and later re-enabled) by changing the mtrc_enabled tunable via the /usr/sbin/raso command. The raso command is documented in the AIX 5L Version 5.3 Commands Reference, Volume 4. To turn off (disable) LMT, enter the following:

     raso -r -o mtrc_enabled=0

To turn on (enable) LMT, type the following:

     raso -r -o mtrc_enabled=1

NOTE: In either case, the boot image must be rebuilt (bosboot must be run), and the change does not take effect until the next reboot.
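
A typical disable sequence might therefore look like the following. This is an illustrative sketch; the bosboot device argument (/dev/ipldevice here) depends on your configuration:

     raso -r -o mtrc_enabled=0
     bosboot -ad /dev/ipldevice
     shutdown -Fr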

When LMT is disabled, the trace memory buffers are not allocated, and most of the LMT-related instruction path-length is also avoided.
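
Because raso follows the conventions of the other AIX tuning commands (such as vmo and schedo), naming a tunable with -o and no value should display its current setting. For example, to check whether LMT is currently enabled (an assumed invocation; consult the raso documentation for exact behavior):

     raso -o mtrc_enabled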


LMT memory consumption

The default amount of memory required for the memory trace buffers is automatically calculated based on factors that influence software trace record retention, with the target being sufficiently large buffers to meet the retention goals previously described. There are several factors that may reduce the amount of memory automatically used. The behavior differs slightly between the 32-bit (unix_mp) and 64-bit (unix_64) kernels. For the 64-bit kernel, the default calculation is limited such that no more than 1/128th of system memory can be used by LMT, and no more than 256 MB by a single processor. The 32-bit kernel uses the same default memory buffer size calculations, but further restricts the total memory allocated for LMT (all processors combined) to 16 MB. The following table shows some examples of default LMT memory consumption:

|-----------------------------|------------|-----------|-------------------|-------------------|
|                             | Number of  | System    | Total LMT Memory: | Total LMT Memory: |
| Machine                     | CPUs       | Memory    | 64-bit Kernel     | 32-bit Kernel     |
|-----------------------------|------------|-----------|-------------------|-------------------|
| POWER3 (375 MHz CPU)        | 1          | 1 GB      | 8 MB              | 8 MB              |
|-----------------------------|------------|-----------|-------------------|-------------------|
| POWER3 (375 MHz CPU)        | 2          | 4 GB      | 16 MB             | 16 MB             |
|-----------------------------|------------|-----------|-------------------|-------------------|
| POWER5 (1656 MHz CPU,       | 8 logical  | 16 GB     | 120 MB            | 16 MB             |
| SPLPAR, 60% ent cap, SMT)   |            |           |                   |                   |
|-----------------------------|------------|-----------|-------------------|-------------------|
| POWER5 (1656 MHz CPU)       | 16         | 64 GB     | 512 MB            | 16 MB             |
|-----------------------------|------------|-----------|-------------------|-------------------|
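
As a quick check of the 1/128 cap against the table, consider the last row: a 64 GB system gives 64 GB / 128 = 512 MB, exactly the 64-bit kernel value shown. Similarly, the 16 GB SPLPAR row is capped at 16 GB / 128 = 128 MB, consistent with the 120 MB actually allocated falling just under that limit.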

To determine the total amount of memory (in bytes) being used by LMT, enter the following shell command:

     echo mtrc | kdb | grep mt_total_memory

The 64-bit kernel resizes the LMT trace buffers in response to dynamic reconfiguration events (for both POWER4 and POWER5 systems). The 32-bit kernel does not; it continues to use the buffer sizes calculated during system initialization. Note that for either kernel, in the rare case that there is insufficient pinned memory to allocate an LMT buffer when a CPU is being added, the CPU allocation fails. This can be identified by a CPU_ALLOC_ABORTED entry in the AIX error log, with detailed data showing an Abort Cause of 0000 0008 (LMT) and Abort Data of 0000 0000 0000 000C (ENOMEM).
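
To look for such entries in the error log, the standard errpt command can filter by error label; for example (a minimal illustration using documented errpt flags):

     errpt -a -J CPU_ALLOC_ABORTED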

For the 64-bit kernel, the /usr/sbin/raso command can also be used to increase or decrease the memory trace buffer sizes. This is done by changing the mtrc_commonbufsize and mtrc_rarebufsize tunables. These two tunables are dynamic parameters, which means they can be changed without requiring a reboot. For example, to change the per-CPU rare buffer size to sixteen 4K pages, for this boot as well as future boots, you would enter:

     raso -p -o mtrc_rarebufsize=16

For more information on the memory trace buffer size tunables, see the raso command documentation.
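
Similarly, naming these tunables with -o and no value should display their current settings (again an assumed invocation based on the common tuning-command conventions; consult the raso documentation for exact syntax):

     raso -o mtrc_commonbufsize -o mtrc_rarebufsize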

Internally, LMT tracing is temporarily suspended during any 64-bit kernel buffer resize operation.

For the 32-bit kernel, the options are limited to accepting the default (automatically calculated) buffer sizes, or disabling LMT (to completely avoid buffer allocation).


Using LMT

This section describes the various commands available to make use of the information captured by LMT. Recall that LMT is designed for use by IBM service personnel, so these commands (or their new LMT-related parameters) may not appear in the external documentation in InfoCenter. To help you recall specific syntax, each command can display a usage string if you enter command -?. The remainder of this article provides an overview of the commands you use to work with LMT.
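
For example, to display the usage string for the mtrcsave command described later in this section, you would enter:

     mtrcsave -?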

The LMT memory trace buffers are included in an AIX system dump. You manipulate them much as you would traditional AIX system trace buffers. The easiest method is to use the trcdead command to extract the LMT buffers from the dump. The new -M parameter on the trcdead command extracts the buffers into files in the LMT log directory, which by default is /var/adm/ras/mtrcdir. For example, to extract LMT buffers from a dump image called dumpfile, you would enter:

     trcdead -M dumpfile

Each buffer is extracted into a unique file, with a control file for each buffer type. This is similar to the per-CPU trace option of traditional AIX system trace. As an example, executing the previous command on a dump of a two-processor system would result in the creation of the following files:
     ls /var/adm/ras/mtrcdir
     mtrccommon      mtrccommon-1   mtrcrare-0
     mtrccommon-0    mtrcrare       mtrcrare-1

The new -M parameter of the trcrpt command can then be used to format the contents of the extracted files. Presently, trcrpt allows you to look at the common files together or the rare files together, but it will not display a fully merged view of both sets. All LMT trace record entries are time-stamped, so it is straightforward to merge files when desired; at present, this is left as an exercise for the user. Also remember that in the initial version of AIX 5.3 ML3, rare buffer entries are truly rare, and most often the interesting data will be in the common buffers. Continuing the previous example, to view the LMT files that were extracted from the dumpfile, you could enter:

     trcrpt -M common
and
     trcrpt -M rare

Other trcrpt parameters can be used in conjunction with the -M flag to qualify the displayed contents. As one example, you could use the following command to display only VMM trace event group hookids that occurred on CPU 1:

     trcrpt -D vmm -C 1 -M common

trcrpt is a powerful command, and you are encouraged to consult the external documentation for additional information regarding its general usage.
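
Because formatted reports can be lengthy, it is often convenient to write them to a file instead of the terminal using the standard trcrpt -o flag; for example (the output path here is illustrative):

     trcrpt -M common -o /tmp/lmt_common.rpt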

trcrpt is the easiest and most flexible way to view LMT trace records. However, it is also possible to use the kdb dump reader and KDB debugger to view LMT trace records. This is done via the new mtrace subcommand. Without any parameters, the subcommand displays some global information relating to LMT. The -c parameter shows LMT information for a given CPU, and can be combined with the 'common' or 'rare' keyword to display the common or rare buffer contents for that CPU. The only other flag supported by the mtrace subcommand is -d. This option takes additional subparameters that define a memory region via its address and length, and formats that memory region as a sequence of LMT trace record entries. One potential use of this option is to view the LMT entries described in the dmp_minimal cdt of a system dump.
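
As a brief illustration of the syntax just described (reconstructed from the parameter descriptions above, so treat it as a sketch; exact output varies by release), you might enter the following at the kdb prompt:

     (0)> mtrace
     (0)> mtrace -c 1 common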

NOTE: Any LMT buffer displayed from kdb/KDB contains only generic formatting, unlike the output provided by trcrpt. The kdb/KDB subcommand is a more primitive debug aid. It is documented in the external KDB documentation for those wishing additional details. As a final comment regarding kdb and LMT, the mtrace subcommand is not fully supported when the kdb command is used to examine a running system. In particular, buffer contents will not be displayed when the kdb command is used in this live kernel mode.

The final option for accessing LMT trace records is to extract them on a running system. The new mtrcsave command is used to extract the trace memory buffers into disk files in the LMT log directory. Recording of new LMT entries is temporarily suspended while the trace buffers are being extracted. The extracted files look identical to the files created when LMT buffers are extracted from a system dump by trcdead. And as with LMT files created by trcdead, the trcrpt command is used to view them.

Without any parameters, the mtrcsave command extracts both common and rare buffers, for every CPU, to the LMT log directory. The -M flag can be used to specify a specific buffer type, common or rare, and the -C flag can be used to specify a specific CPU or a list of CPUs. CPUs in a CPU list are separated by commas, or the list can be enclosed in double quotation marks and separated by commas or blanks (a quoted-list equivalent is shown after the example). The following example shows the syntax for extracting the common buffer only, for only the first two CPUs of a system; CPU numbering starts with zero. By default, the extracted files are placed in /var/adm/ras/mtrcdir:

     mtrcsave -M common -C 0,1
     ls /var/adm/ras/mtrcdir
     mtrccommon    mtrccommon-0   mtrccommon-1
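
As described above, the same CPUs could also be specified as a quoted, blank-separated list:

     mtrcsave -M common -C "0 1"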

The snap command can be used to collect any LMT trace files created by mtrcsave. This is done via the gettrc snap script, which supports collecting LMT trace files from either the default LMT log directory or from an explicitly named directory. The files are stored in the /tmp/ibmsupt/gettrc/<logdirname> subdirectory. Using snap to collect LMT trace files is only necessary when someone has explicitly created LMT trace files and wants to send them to service. If the machine has crashed, the LMT trace information is still embedded in the dump image, and all that is needed is for snap to collect the dump file. You can see the options supported by the gettrc snap script by executing:

     /usr/lib/ras/snapscripts/gettrc -h

As an example, to collect general system information, as well as any LMT trace files in the default LMT log directory, you would enter:

     snap -g "gettrc -m"

The preceding discussions of the trcdead, trcrpt, mtrcsave, and snap commands mention the LMT log directory. The trcdead and mtrcsave commands create files in the LMT log directory, the trcrpt command looks there for LMT trace files to format, and the gettrc snap script may look there for LMT trace files to collect. By default, the LMT log directory is /var/adm/ras/mtrcdir. This can be changed to a different directory via the trcctl command. For example, to set the LMT log directory to a directory associated with a dump being analyzed, you might enter:

     trcctl -M /mypath_to_dump/my_lmt_logdir

This sets the system-wide default LMT log directory to /mypath_to_dump/my_lmt_logdir, and subsequent invocations of trcdead, trcrpt, mtrcsave, and the gettrc snap script will access the my_lmt_logdir directory. Note that this single system-wide log directory can be an inconvenience on multi-user machines where different dumps are being analyzed simultaneously.
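
When that analysis is complete, the same command can be used to point the LMT log directory back at the default location:

     trcctl -M /var/adm/ras/mtrcdir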

Offering more flexible log directory support is an example of the kind of usability enhancements you can expect to see in the future with LMT. AIX development will also continue to refine the LMT trace record contents and their formatting. But the LMT support introduced with AIX 5.3 ML3 represents a significant advance in AIX first failure data capture capabilities, and provides service personnel with a powerful and valuable tool in diagnosing problems.

[{"Product":{"code":"SWG10","label":"AIX"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"Support information","Platform":[{"code":"PF002","label":"AIX"}],"Version":"5.3","Edition":"","Line of Business":{"code":"LOB08","label":"Cognitive Systems"}}]

Historical Number

isg1pTechnote1488

Document Information

Modified date:
17 June 2018

UID

isg3T1000677