IBM AIX RAS – firmware-assisted and live dump facilities explained

This article explains the advanced IBM® AIX® dump facilities designed to capture system dumps robustly. More specifically, it discusses the various aspects of the firmware-assisted and live dump facilities. It describes the live dump kernel services provided by the reliability, availability, and serviceability (RAS) infrastructure and the various options or tunables to consider when capturing a live dump of component or driver data. It illustrates the live dump facilities with an example kernel extension.

Premkumar Nagarajan (premkuna@in.ibm.com), staff system engineer, IBM

Premkumar Nagarajan is a staff system engineer on the IBM AIX RAS development team. He works on various features that enable reliability, availability, and serviceability (RAS) capabilities for AIX. Before joining IBM, he worked on various storage and network technologies and on the development of various core kernel components.



01 February 2013


Overview of firmware-assisted dump

Firmware-assisted dump offers improved reliability over the traditional dump type, by rebooting the partition and using a new kernel to dump data from the previous kernel crash.

Firmware-assisted dump requires:

  • An IBM POWER6 processor-based or later hardware platform.
  • A logical partition (LPAR) with a minimum of 1.5 GB memory.
  • A dump logical volume in the root volume group (rootvg).
  • Paging space, which cannot be defined as the dump logical volume.
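
The requirements above can be restated as a simple check. The following is only a sketch: struct lpar_info, its fields, and fadump_prereqs_met() are hypothetical names invented for illustration and are not part of any AIX interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a partition; this is not an AIX structure. */
struct lpar_info {
    int      power_generation;   /* 6 for POWER6, 7 for POWER7, ... */
    uint64_t memory_bytes;       /* memory assigned to the LPAR */
    bool     dump_lv_in_rootvg;  /* a dump logical volume exists in rootvg */
    bool     dump_lv_is_paging;  /* the dump logical volume is a paging space */
};

/* 1.5 GB expressed in bytes. */
#define FADUMP_MIN_MEMORY_BYTES (1536ULL * 1024 * 1024)

/* Returns true when every requirement listed above is satisfied. */
bool fadump_prereqs_met(const struct lpar_info *lp)
{
    assert(lp != NULL);
    return lp->power_generation >= 6 &&
           lp->memory_bytes >= FADUMP_MIN_MEMORY_BYTES &&
           lp->dump_lv_in_rootvg &&
           !lp->dump_lv_is_paging;
}
```

A POWER7 partition with 2 GB of memory and a dedicated dump logical volume in rootvg would pass this check, while the same partition with 1 GB of memory would not.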

When a partition configured for firmware-assisted dump is started, a portion of memory known as the scratch area is allocated to be used by the firmware-assisted dump functionality. For this reason, a partition that is configured to use the traditional system dump requires a restart to allocate the scratch area memory that is required for a firmware-assisted dump to be initiated. The firmware helps preserve the pages to be dumped until a non-faulting OS comes up. The non-faulting OS then completes the processing by copying the preserved memory to the dump file.

Error codes for firmware-assisted dump

The boot loader starts writing the data in the dump logical blocks, and space is freed as soon as the data in the dump logical blocks is written to the dump logical volume. When a certain percentage of the main memory has been freed, AIX is allowed to boot. From then on, AIX takes over control and writes the rest of the data in the dump logical blocks to the dump logical volume.

During this process, the boot loader and AIX will notify the progress of firmware-assisted dump to the console.

Following is the light-emitting diode (LED) code used for firmware-assisted dump:


0c0 – Indicates that the firmware-assisted dump is successful

sysdumpdev -l – Is used to check the actual error code from the OS.

Live dump

A live dump capability is provided to allow failure data to be dumped without taking down the entire system. The two most frequent uses of live dumps might include the following scenarios:

  • From the command line, a system administrator issues the livedumpstart command to dump data related to the failure.
  • From recovery, a subsystem needs to dump out data pertaining to the failure before recovering. This ability is only available to the kernel and kernel extensions.

Serialized dump and unserialized dump

A serialized live dump refers to a dump that causes the system to be frozen, or suspended, while data is being dumped. While the system is frozen, the data is copied into kernel-pinned memory. It is written to the file system only after the system is unfrozen. An unserialized dump refers to a dump taken without freezing the system.

The system is frozen by stopping every processor except for the dumping processor. The dump data is then captured with the dumping processor at INTMAX, the most-favored interrupt priority. It should be noted that, while the system is frozen, page faults are not allowed.

Synchronous and asynchronous live dump

In a synchronous live dump, the caller waits for the data collection for the dump to complete, whereas in an asynchronous live dump, the caller schedules the dump to be taken but does not wait for completion.

Dump location

The data captured during a live dump pass is queued to the live dump process. When the system is unfrozen, this process then writes the data to the file system. By default, live dumps are placed in the /var/adm/ras/livedump directory. The dump file name has the form: [prefix.]component.yyyymmddhhmm.xx.DZ.
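
To make the file name form concrete, here is a minimal sketch that assembles such a name. The function livedump_file_name() is hypothetical, and treating the xx field as a two-digit sequence number is an assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "[prefix.]component.yyyymmddhhmm.xx.DZ" into out.
 * seq fills the "xx" field; its meaning is assumed here.
 * Returns the number of characters written, or -1 if out is too small.
 */
int livedump_file_name(char *out, size_t outsz,
                       const char *prefix, const char *component,
                       int year, int month, int day, int hour, int minute,
                       int seq)
{
    int n;

    assert(out != NULL && component != NULL);
    if (prefix != NULL && strlen(prefix) > 0) {
        /* Optional prefix is followed by a dot. */
        n = snprintf(out, outsz, "%s.%s.%04d%02d%02d%02d%02d.%02d.DZ",
                     prefix, component, year, month, day, hour, minute, seq);
    } else {
        n = snprintf(out, outsz, "%s.%04d%02d%02d%02d%02d.%02d.DZ",
                     component, year, month, day, hour, minute, seq);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

Under these assumptions, a component named sample dumped at 09:30 on 1 February 2013 with no prefix would yield sample.201302010930.00.DZ.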

Live dump heap memory

The pinned kernel memory used for live dumps is in a separate live dump heap. By default, this heap is at most 64 MB. The heap may not be larger than 1/16 of the size of real memory.
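
The two limits can be combined as in the sketch below. The names are hypothetical, and the assumption that the 64 MB default is itself clamped to 1/16 of real memory on small systems is an interpretation of the text above, not a documented rule.

```c
#include <assert.h>
#include <stdint.h>

/* Default ceiling for the live dump heap (64 MB); hypothetical macro name. */
#define LDMP_DEFAULT_HEAP_MAX (64ULL * 1024 * 1024)

/* Hard limit: the heap may not exceed 1/16 of real memory. */
uint64_t ldmp_heap_hard_limit(uint64_t real_mem_bytes)
{
    return real_mem_bytes / 16;
}

/*
 * Effective maximum heap size. A requested value of 0 means "use the
 * default". The result is clamped to the hard limit (assumed behavior).
 */
uint64_t ldmp_heap_effective_max(uint64_t requested, uint64_t real_mem_bytes)
{
    uint64_t want, limit;

    assert(real_mem_bytes > 0);
    want = requested ? requested : LDMP_DEFAULT_HEAP_MAX;
    limit = ldmp_heap_hard_limit(real_mem_bytes);
    return want < limit ? want : limit;
}
```

For example, with 4 GB of real memory the hard limit is 256 MB, so the default of 64 MB applies unchanged, while a request for 1 GB would be reduced to 256 MB.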

Live dump pass

A serialized live dump may occur in one pass or multiple passes. A dump pass consists of the data that could be buffered in pinned storage while the system was frozen. A dump taken in multiple passes involves multiple system freezes, and thus, the data in a multipass dump may not be consistent. A live dump can be initiated from software by the kernel or a kernel extension. Any component to be included in the dump must have previously registered with the kernel, using ras_register(), as dump aware. It must also have indicated that it handles live dumps by using the RASCD_SET_LDMP_ON ras_control() service.

Component memory level and maximum buffer size

Each component is limited in how much data it can dump. If a component exceeds its limit, its data is truncated: only the data entries prior to the one that caused the limit to be exceeded are dumped.

The following list specifies the maximum data allowed for each live dump detail level:

  • < CD_LVL_NORMAL - 2 MB
  • >= CD_LVL_NORMAL and < CD_LVL_DETAIL - 4 MB
  • >= CD_LVL_DETAIL and < CD_LEVEL_9 - 8 MB
  • CD_LEVEL_9 - unlimited (real memory/16)
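
The limits and the truncation rule can be sketched as follows. The SK_ constants are stand-ins with assumed numeric values (the real level constants come from the AIX headers), and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in detail levels; the numeric values are assumptions. */
enum { SK_LVL_NORMAL = 3, SK_LVL_DETAIL = 7, SK_LVL_9 = 9 };

#define SK_MB (1024ULL * 1024)

/* Maximum bytes a component may dump at the given detail level. */
uint64_t ldmp_component_limit(int level, uint64_t real_mem_bytes)
{
    assert(level >= 0 && level <= SK_LVL_9);
    if (level < SK_LVL_NORMAL)
        return 2 * SK_MB;
    if (level < SK_LVL_DETAIL)
        return 4 * SK_MB;
    if (level < SK_LVL_9)
        return 8 * SK_MB;
    return real_mem_bytes / 16;   /* bounded only by real memory/16 */
}

/*
 * Truncation rule: count how many leading entries fit. The entry that
 * would push the total over the limit, and all later ones, are dropped.
 */
size_t ldmp_entries_within_limit(const uint64_t *lens, size_t n, uint64_t limit)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (lens[i] > limit - total)   /* total never exceeds limit */
            break;
        total += lens[i];
    }
    return i;
}
```

With three 1 MB entries and a 2 MB limit, only the first two entries would be dumped under this rule.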

To perform a live dump from software:

  • Use ldmp_setupparms() to initialize an ldmp_parms_t item.

This sets up the data structure, filling in all default values including the eye catcher and version fields.

  • Specify components, using dmp_compspec(), and pseudo-components.

This is how the content of the dump is specified.

  • Create the dump using the livedump() kernel service.

This takes the dump.

This is shown at the end of the example.

Pseudo component

A dump pseudo-component refers to a service routine used to dump data that is not associated with a component. Such pseudo components (such as kernel context, thread, process, and so on) are provided strictly for use within a dump.

Staging buffer

A component might request space in a staging buffer for use during a system or live dump. For the system dump, a component may allocate a private (RASCD_SET_SDMP_STAGING) or a shared staging buffer (RASCD_SET_SDMP_SHARED_STAGING). A private staging buffer is necessary if the buffer is to be used for actual data to be dumped (for example, a device's microcode or log). A shared staging buffer might be used if the area is only used for dump metadata such as the component's dump table.

Live dump sequence through callback

A component participating in a live dump must have a callback routine to handle the following ras_control() commands. Upon receipt of the callback command, the callback issues the corresponding "_SET" command to perform the action. Refer to the example extension, paying particular attention to the sample_livedump_callback() function.

RASCD_LDMP_PREPARE (used to prepare to take a live dump)

The callback receives this call when it has been asked to participate in a live dump. The callback may use dmp_compspec() to specify other components to include in the dump if necessary. It may also specify pseudo components such as dmp_eaddr(). It must return an estimate of the amount of data to be dumped. This should be a maximum amount. It should include the space taken up by the dump table. It should not include the memory dumped by other components or pseudo components. If, for example, the prepare function uses dmp_ct() to dump component trace data, the dmp_ct() pseudo component will provide that estimate.

RASCD_LDMP_START (used to dump data)

This is the command received by a callback when it is to provide its data for the dump. The callback puts its dump table address in the ldmpst_table field of the ldmp_start_t data item received as the argument. The callback receives subsequent RASCD_LDMP_AGAIN calls to provide more data. This stops when the callback returns a NULL dump table pointer.
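
The START/AGAIN exchange can be pictured with a small user-space simulation. Everything here is hypothetical: the sk_ types only stand in for the real ldmp_start_t argument and callback signature, and sk_collect() imitates, rather than reproduces, what the kernel does.

```c
#include <assert.h>
#include <stddef.h>

enum sk_cmd { SK_LDMP_START, SK_LDMP_AGAIN };

/* Stand-in for the argument whose table field the callback fills in. */
struct sk_start { const void *table; };

typedef int (*sk_callback_t)(enum sk_cmd cmd, struct sk_start *arg, void *priv);

/*
 * Imitates the dump side: one START call, then AGAIN calls until the
 * callback leaves the table pointer NULL or returns a negative value.
 * Returns the number of dump tables collected.
 */
int sk_collect(sk_callback_t cb, void *priv)
{
    struct sk_start st;
    enum sk_cmd cmd = SK_LDMP_START;
    int tables = 0;

    assert(cb != NULL);
    for (;;) {
        st.table = NULL;
        if (cb(cmd, &st, priv) < 0 || st.table == NULL)
            break;
        tables++;
        cmd = SK_LDMP_AGAIN;
    }
    return tables;
}

/* Example component that hands out up to three chunks, one per call. */
struct sk_comp { const char *chunks[3]; int next; };

int sk_sample_callback(enum sk_cmd cmd, struct sk_start *arg, void *priv)
{
    struct sk_comp *c = priv;

    if (cmd == SK_LDMP_START)
        c->next = 0;                       /* a new dump starts from chunk 0 */
    arg->table = (c->next < 3) ? c->chunks[c->next++] : NULL;
    return 0;
}
```

A component with three non-NULL chunks is called four times in this model: once with START, and three times with AGAIN, the last of which returns NULL and ends the sequence.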

RASCD_LDMP_AGAIN - This command asks the callback to provide more data. The return code is treated the same as for RASCD_LDMP_START, except that if a value less than zero is returned, no further data is dumped for the component, but data already dumped by previous RASCD_LDMP_START and RASCD_LDMP_AGAIN calls will appear in the dump.

RASCD_LDMP_FINISHED - This command indicates that the live dump is complete. No data is dumped for the component in response to this command.

RASCD_DMP_PASS_THROUGH - This command just passes arbitrary text data to the callback.

Note that RASCD_DMP_PASS_THROUGH applies to the entire dump domain; that is, there is only one pass-through for the domain containing both live and system dump. You can pass data to a component's RASCD_DMP_PASS_THROUGH handler by using dumpctrl.

For example, the following command passes "pass through text" to the RASCD_DMP_PASS_THROUGH handler for the component with the alias foo:

dumpctrl -l foo "pass through text"

RASCD_LDMP_ESTIMATE - This command provides an estimate of how much data would be dumped.

There are some constraints placed on live dumps:

  • A component is limited in what it can dump by the detail level.
  • As the live dump can happen while the system is frozen, only a limited set of system services may be used by the component callbacks during the dump, for example, lightweight memory trace and component trace.

A component may specify any data to be dumped. However, in a serialized dump, only memory-resident data is dumped.

Consideration for live dump data requirements

Multiple passes

Multiple passes are provided for a component that needs to dump more data than can be captured in a single freeze. The data can be dumped in multiple passes through the staging buffer, but it might change between one unfreeze and the next freeze. Only a single pass is allowed for a component if the dump is a serialized dump taken from an interrupt environment. Multiple passes can be implemented through the RASCD_LDMP_AGAIN callback in the component.
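
As a rough sketch of the arithmetic involved, the functions below split a component's data into passes. They assume that each pass moves at most one staging buffer's worth of data, which is a simplification; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of passes needed when each pass carries at most staging_bytes. */
size_t ldmp_passes_needed(size_t data_bytes, size_t staging_bytes)
{
    assert(staging_bytes > 0);
    return data_bytes / staging_bytes + (data_bytes % staging_bytes != 0);
}

/* Bytes carried by the given pass (0-based); 0 once all data is dumped. */
size_t ldmp_pass_length(size_t data_bytes, size_t staging_bytes, size_t pass)
{
    size_t offset, remaining;

    if (pass >= ldmp_passes_needed(data_bytes, staging_bytes))
        return 0;                         /* nothing left for this pass */
    offset = pass * staging_bytes;        /* offset < data_bytes here */
    remaining = data_bytes - offset;
    return remaining < staging_bytes ? remaining : staging_bytes;
}
```

In a RASCD_LDMP_AGAIN handler, a result of 0 from the second function would correspond to returning a NULL dump table pointer to end the sequence.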

Freeze time

If, while performing a live dump, the system is frozen for more than 100 milliseconds (0.1 seconds), an informational error is logged. It is important to keep dump callback execution paths as short as possible, especially when providing data for the dump. If the system is detected to have been frozen for 5000 milliseconds (5 seconds), the dump is truncated at that point, and the system is unfrozen.
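
The two thresholds can be summarized in a small classification function. This is only an illustration of the rule described above; the enum, macros, and function name are hypothetical.

```c
/* What happens for a given freeze duration (hypothetical names). */
enum freeze_action {
    FREEZE_OK,          /* within budget, nothing logged */
    FREEZE_LOG_INFO,    /* informational error is logged */
    FREEZE_TRUNCATE     /* dump is truncated and the system is unfrozen */
};

#define FREEZE_LOG_MS      100UL    /* more than this: log */
#define FREEZE_TRUNCATE_MS 5000UL   /* reaching this: truncate */

enum freeze_action freeze_check(unsigned long frozen_ms)
{
    if (frozen_ms >= FREEZE_TRUNCATE_MS)
        return FREEZE_TRUNCATE;
    if (frozen_ms > FREEZE_LOG_MS)
        return FREEZE_LOG_INFO;
    return FREEZE_OK;
}
```

A callback path that keeps the system frozen for 250 milliseconds would therefore cause an informational error to be logged, but the dump would still complete.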

Heap allocation errors

A component might try to dump more data than the limit allowed for its detail level. In that case, it is the component's responsibility to increase its private staging buffer and use multiple passes to dump the additional data (this is not possible for a driver running in an interrupt environment).

Example

This section shows a sample kernel extension that takes a live and system dump. The important function is sample_livedump_callback(), which takes a dump using the ras_control() commands sent by the system. Note that only the handling of the dump commands is shown. Normally, this callback would handle component trace and error checking commands as well.

Following the sample extension is a brief sequence of statements used to take a live dump of the sample_livedump component from software.

#include <sys/types.h>
#include <sys/syspest.h>
#include <sys/uio.h>
#include <sys/processor.h>
#include <sys/systemcfg.h>
#include <sys/malloc.h>
#include <sys/ras.h>
#include <sys/livedump.h>
#include <sys/eyec.h>
#include <sys/raschk.h>
#include <sys/param.h>
#include <sys/dump.h>

/* RAS control block for the component */
ras_block_t rascb = NULL;

/* Data to include in the live dump */

typedef struct sample_data {
    char *dev;
    int flag;
} sample_data_t;

sample_data_t *data;

/* Component callback */

kerrno_t sample_livedump_callback(ras_block_t cb, ras_cmd_t cmd, void *arg, void *priv);
void sample_initiate_livedump();

/*
 * Entry point called when this kernel extension is loaded.
 *
 * Input:
 * cmd - 1=config, 2=unconfig
 * uiop - points to the uio structure.
 */
int
sampleext(int cmd, struct uio *uiop)
{
    kerrno_t rv = 0;
    int len;
    char *comp = "/dev/sample";

    /* cmd should be 1 or 2 */
    if (cmd == 2) {
        /* Unloading: unregister first so the callback can no longer run */
        if (rascb) ras_unregister(rascb);
        if (data) {
            if (data->dev) xmfree(data->dev, kernel_heap);
            xmfree(data, kernel_heap);
            data = NULL;
        }
        return(0);
    }

    if (cmd != 1) return(EINVAL);

    /* Allocate the data that will appear in the dump */
    data = xmalloc(sizeof(sample_data_t), 1, kernel_heap);
    if (!data) {
        return(ENOMEM);
    }
    len = strlen(comp)+1;
    data->dev = xmalloc(len, 1, kernel_heap);
    if (!data->dev) {
        xmfree(data, kernel_heap);
        data = NULL;
        return(ENOMEM);
    }
    strcpy(data->dev, comp);
    data->flag = 0;

    /* Register the component as dump aware */
    rv = ras_register(&rascb, "sample_livedump", (ras_block_t)0,
                      RAS_TYPE_FILESYSTEM, "sample component",
                      RASF_DUMP_AWARE, sample_livedump_callback, NULL);
    if (rv) return(KERROR2ERRNO(rv));

    /* Turn on component live dump */
    rv = ras_control(rascb, RASCD_SET_LDMP_ON, 0, 0);
    if (rv) return(KERROR2ERRNO(rv));

    /* Dump staging buffer space must be set up to store the dump table */
    rv = ras_control(rascb, RASCD_SET_SDMP_STAGING,
                     (void*)(sizeof(struct cdt_nn_head) + sizeof(struct cdt_entry)), 0);
    if (rv) return(KERROR2ERRNO(rv));

    /* Make the settings persistent */
    rv = ras_customize(rascb);
    if (rv) return(KERROR2ERRNO(rv));

    /* Take a live dump right away to demonstrate the software path */
    sample_initiate_livedump();

    return(0);
}

/*
 * Sample callback that is called for live dump.
 *
 * The data to dump consists of a header and data.
 *
 * Input:
 * cb - Contains the component's ras_block_t.
 * cmd - ras_control command
 * arg - command argument
 * priv - private data, unused.
 */
kerrno_t
sample_livedump_callback(ras_block_t cb, ras_cmd_t cmd, void *arg, void *priv)
{
    kerrno_t rv = 0;

    switch(cmd) {
    case RASCD_LDMP_ON: {
        /* Turn live dump on. */
        rv = ras_control(cb, RASCD_SET_LDMP_ON, 0, 0);
        break;
    }
    case RASCD_LDMP_OFF: {
        /* Turn live dump off. */
        rv = ras_control(cb, RASCD_SET_LDMP_OFF, 0, 0);
        break;
    }
    case RASCD_LDMP_LVL: {
        /* Set live dump data level */
        rv = ras_control(cb, RASCD_SET_LDMP_LVL, arg, 0);
        break;
    }
    case RASCD_LDMP_ESTIMATE: /* fall through */
    case RASCD_LDMP_PREPARE: {
        /*
         * The prepare call is used to request staging buffer space
         * and provide an estimate of the amount of data to be dumped.
         */
        ldmp_prepare_t *p = (ldmp_prepare_t*)arg;

        /* Staging buffer used for the dump table */
        p->ldpr_sbufsz = sizeof(struct cdt_nn_head) + sizeof(struct cdt_entry);
        /* Estimate: dump table plus the single data structure */
        p->ldpr_datasize = p->ldpr_sbufsz + sizeof(sample_data_t);
        break;
    }
    case RASCD_LDMP_START: {
        /*
         * This is received to provide the dump table.
         * The table here is a minimal one with a single entry.
         */
        ldmp_start_t *p = (ldmp_start_t*)arg;
        struct cdt_nn_head *hp;
        struct cdt_entry *ep;

        /* Build the table header in the staging buffer */
        hp = (struct cdt_nn_head*)p->ldmpst_buffer;
        bzero(hp, sizeof(struct cdt_nn_head));
        hp->cdtn_magic = DMP_MAGIC_N;
        hp->cdtn_len = sizeof(struct cdt_nn_head) + sizeof(struct cdt_entry);

        /* The single entry follows the header */
        ep = (struct cdt_entry*)(hp+1);
        strcpy(ep->d_name, "dev1");
        ep->d_len = sizeof(sample_data_t);
        /* Point at the structure itself, not at the pointer variable */
        ep->d_ptr = (void *)data;
        ep->d_segval = DUMP_GEN_SEGVAL;
        p->ldmpst_table = hp;
        break;
    }
    case RASCD_LDMP_AGAIN:
        /* No additional data is provided by this sample */
        break;
    case RASCD_LDMP_FINISHED:
        /* Nothing to clean up */
        break;

    case RASCD_DMP_PASS_THROUGH: {
        /* Pass through: arg carries the text supplied by the caller */
        printf("%s\n", (char *)arg);
        break;
    }
    default: {
        printf("bad ras_control command.\n");
        rv = EINVAL_RAS_CONTROL_BADCMD;
    }
    }

    return(rv);
}

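/*
 * Take a live dump of this component from software.
 */
void
sample_initiate_livedump()
{
    ldmp_parms_t sample_params;
    kerrno_t rc;

    /* Fill in default values, including eye catcher and version */
    if (ldmp_setupparms(&sample_params) == 0) {
        sample_params.ldp_title = "sample";
        sample_params.ldp_errcode = 3;
        sample_params.ldp_symptom = "sam";
        sample_params.ldp_func = "func";

        /* Add this component, identified by its control block */
        if (dmp_compspec(DCF_FAILING|DCF_BYCB, rascb, &sample_params, NULL, NULL)) {
            printf("dmp_compspec failed\n");
            return;
        }

        /* Take the dump */
        rc = livedump(&sample_params);
        if (rc != 0) {
            printf("livedump failed, errno %d\n", KERROR2ERRNO(rc));
        }
    } else {
        printf("ldmp_setupparms failed\n");
    }
}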

To include the sample_livedump component in a live dump initiated from the command line, run the following command:

livedumpstart -C sample_livedump symptom="sample dump"

