How to read detailed data of error-log entries

This article explains how to read the detailed data of error-log entries from C programs. It introduces the functions, structures, and constructs involved, and ends with an example program.

Prateek Goel (pragoel1@in.ibm.com), Staff Software Engineer, IBM

Prateek has been working with the IBM AIX® reliability, availability, and serviceability (RAS) features development team. He has written articles on ProbeVue in IBM developerWorks® and IBM Systems Magazine forums.



11 February 2014

Introduction

Error logging is a facility through which an operating system module or a user application can record the errors it detects. Each entry identifies the failing component, the probable reason for the failure, and any additional information that helps explain the failure or the unexpected behavior. However, you cannot rely on the error log alone, because it is only a first-failure data capture mechanism. For example, an error-log entry reporting that a connection to a disk has failed indicates why an application's writes might be failing. An entry reporting low paging space is very common; it tells the user to increase the paging space, because otherwise the system might behave unexpectedly.

Because error logging is a serviceability mechanism, you should place error-log calls wisely and include the information required to clearly indicate what went wrong. At the same time, take care that there is no sudden flood of entries, which makes it harder to find the relevant ones.

Logging an entry

Users can log an error entry using the following two mechanisms:

  • Using a function: From user applications, you can use the errlog function and from the kernel extensions, you can use the errsave function to log an error entry.

    Syntax:

    int errlog ( ErrorStructure,  Length)
    void *ErrorStructure;
    unsigned int Length;
  • Using a command: Using the errlogger command, you can log an entry.

    Syntax:

    errlogger Message

Reading the logged entries

The framework provides an error report tool, errpt, which offers various ways to view and filter a report; see the AIX documentation for the errpt command for details. As we have seen, users can write anything into an error-log entry, including structures and data buffers, to help them in debugging. However, the errpt tool can only dump this information in standard representations such as hexadecimal or American Standard Code for Information Interchange (ASCII). The following sections show how to write C code that fetches the error-log entries, rebuilds the dumped structures and buffers, and presents the data in a more meaningful way for efficient and effective debugging.

Basics to reading error-log entries

An error-log entry consists of attribute-value pairs. The attributes include the error-log identifier, label, probable cause, detailed data, and so on. The detailed data attribute lets users dump whatever data is needed to service the failed component. However, if a user dumps structures containing vital information, the errpt tool cannot interpret them; the onus is on the owner of the error-log entry to convert the raw data into an easily understandable format by mapping it to the corresponding structures. For this purpose, the error-logging framework provides a set of application programming interfaces (APIs) and constructs.

The error-logging framework writes the entries in binary format, sorted by time. A defined structure and construct is required to read these entries and derive meaning from them. You can use the following approach to search and read the entries in the error-log file, and then map the detailed data section to user-defined structures.

Finding the location of the error-logging file

Use the following command to get the location of the error-logging file.

# /usr/lib/errdemon -l
Error Log Attributes
--------------------------------------------
Log File                /var/adm/ras/errlog
Log Size                1048576 bytes
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000

The file that is listed against the Log File tag is the complete path to the error-logging file.
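
If a program should not hard-code the log location, the Log File line of this output can be parsed. The following is a minimal sketch and not part of the error-logging API: parse_log_file_line is a hypothetical helper name, and the code assumes that the line has the layout shown above (the Log File tag, blanks, and then a path without embedded blanks).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extracts the path from a "Log File <path>" line
 * as printed by "/usr/lib/errdemon -l".
 * Returns 0 and stores the path in out on success.
 * Returns -1 if the line is not a Log File line or the path does not fit.
 */
int parse_log_file_line(const char *line, char *out, size_t out_size)
{
    static const char tag[] = "Log File";
    const size_t tag_len = sizeof(tag) - 1;
    const char *p;
    size_t len;

    if (line == NULL || out == NULL || out_size == 0)
        return -1;
    if (strncmp(line, tag, tag_len) != 0)
        return -1;

    /* The tag must be followed by at least one blank. */
    p = line + tag_len;
    if (*p != ' ' && *p != '\t')
        return -1;
    while (*p == ' ' || *p == '\t')
        p++;

    /* The path ends at the next blank or at the end of the line. */
    len = strcspn(p, " \t\r\n");
    if (len == 0 || len >= out_size)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

Each line read from the command output can be passed to the helper until it returns 0; the resulting path is what the errlog_open function expects as its path argument.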

Functions to read error-log entries

  • errlog_open: This API opens the error-log file so that the entries logged so far can be read.

    Syntax:

    int errlog_open(path, mode, magic, handle)

    Table 1: Parameter details of the errlog_open function

    Argument  Type               Remarks
    path      char *             Contains the absolute path of the error-log file.
    mode      int                Is the same as the modes used in the open subroutine.
    magic     unsigned int       Determines the version of the errlog_entry_t structure to use. The value of this parameter must be set to LE_MAGIC. The sys/errlog.h file contains the definition of LE_MAGIC for the current version of IBM® AIX®.
    handle    errlog_handle_t *  Acts as a return value, and contains the handle of the opened error-log file if successful.

    Table 2: Return value details of the errlog_open function

    Return value    Meaning
    0               Successful.
    LE_ERR_INVARG   A parameter passed was invalid.
    LE_ERR_NOFILE   Error-log file does not exist.
    LE_ERR_NOMEM    Could not allocate the required memory.
    LE_ERR_IO       An I/O error occurred.
    LE_ERR_INVFILE  The file is not a valid error-log file.
  • errlog_close: This function is used to close the error-log file whose error-log handle is passed as an argument. This handle must be the same as the one returned by the errlog_open API.
    int errlog_close(handle)
    errlog_handle_t handle;

    A return value of “0” indicates success in closing the error-log file. In case of an error, LE_ERR_INVARG is returned indicating that the argument passed is invalid.

  • errlog_find_first: This subroutine finds the first entry that matches the given filtering criteria.
    int errlog_find_first(handle, filter, result)

    Table 3: Parameter details of the errlog_find_first function

    Argument  Type              Remarks
    handle    errlog_handle_t   This is the handle returned by the errlog_open subroutine.
    filter    errlog_match_t *  This defines the filter to be used to search the entries.
    result    errlog_entry_t *  When an entry matching the filter is found, the memory area pointed to by this parameter is filled with that error-log entry.

    Table 4: Return value details of the errlog_find_first function

    Return value   Meaning
    0              Successful.
    LE_ERR_INVARG  A parameter passed was invalid.
    LE_ERR_DONE    Reached the end of the error-log file while searching. In other words, no match was found after the previous invocation of this API. If this was the first invocation, there are no entries matching the criteria.
    LE_ERR_NOMEM   Could not allocate the required memory.
    LE_ERR_IO      An I/O error occurred.
  • errlog_find_next: This subroutine finds the next entry that matches the filter passed to errlog_find_first. The handle and result parameters have the same meaning as in the errlog_find_first API.
    int errlog_find_next(handle, result)

    Return value meaning remains the same as that of the errlog_find_first API.

  • errlog_find_sequence: This subroutine finds the error-log entry that has the given sequence number.
    int errlog_find_sequence(handle, sequence, result)

    Table 5: Parameter details of the errlog_find_sequence function

    Argument  Type              Remarks
    handle    errlog_handle_t   It is the handle returned by the errlog_open subroutine.
    sequence  int               This parameter specifies the sequence number of the error-log entry.
    result    errlog_entry_t *  When an entry with the given sequence number is found, the memory area pointed to by this parameter is filled with that error-log entry.

    Return value meaning remains the same as that of the errlog_find_first API.

  • errlog_set_direction: This subroutine sets the direction in which the find subroutines search the error log.
    int errlog_set_direction(handle, direction)

    Table 6: Parameter details of the errlog_set_direction function

    Argument   Type             Remarks
    handle     errlog_handle_t  It is the handle returned by the errlog_open subroutine.
    direction  int              This parameter specifies the direction in which you can search for the entries. Possible values include:
                                LE_FORWARD: To search in forward direction
                                LE_REVERSE: To search in reverse direction

    A return value of “0” indicates that the search direction was set successfully. In case of an error, LE_ERR_INVARG is returned, indicating that an argument passed is invalid.
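
The return codes listed in the tables above can be turned into readable messages with a small helper. The following is only a sketch: errlog_rc_text is a hypothetical name and not part of the error-logging API. On AIX, the LE_ERR_* constants come from sys/errlog.h; the placeholder values in the other branch exist only so that the sketch compiles on other systems, and they are not the real AIX values.

```c
#ifdef _AIX
#include <sys/errlog.h>   /* real LE_ERR_* definitions on AIX */
#else
/* Placeholder values for non-AIX builds; NOT the real AIX values. */
enum {
    LE_ERR_INVARG = 1,
    LE_ERR_NOFILE,
    LE_ERR_INVFILE,
    LE_ERR_NOMEM,
    LE_ERR_IO,
    LE_ERR_DONE
};
#endif

/*
 * Hypothetical helper: maps a return code of the errlog_* subroutines
 * to the meaning given in the return value tables.
 */
const char *errlog_rc_text(int rc)
{
    if (rc == 0)
        return "Successful";
    if (rc == LE_ERR_INVARG)
        return "A parameter passed was invalid";
    if (rc == LE_ERR_NOFILE)
        return "Error-log file does not exist";
    if (rc == LE_ERR_INVFILE)
        return "The file is not a valid error-log file";
    if (rc == LE_ERR_NOMEM)
        return "Could not allocate the required memory";
    if (rc == LE_ERR_IO)
        return "An I/O error occurred";
    if (rc == LE_ERR_DONE)
        return "Reached the end of the error-log file";
    return "Unknown return code";
}
```

A caller could then print errlog_rc_text(rc) next to the numeric value, which makes a failure of errlog_open or of the find subroutines easier to interpret.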

Structures used

This section looks at the structures used to read the error-log entries and to build the search/filter criteria.

  • Structure for errlog_entry as defined in the /usr/include/sys/errlog.h file

    When an entry matching the filter criteria is found, it is returned in the following structure. Using the members of this structure, users can access all the details of the error-log entry.

    typedef struct errlog_entry {
        unsigned int        el_magic;
        unsigned int        el_sequence;
        char                el_label[LE_LABEL_MAX];
        unsigned int        el_timestamp;
    /* few of them skipped */
        char                el_machineid[LE_MACHINE_ID_MAX];
        char                el_nodeid[LE_NODE_ID_MAX];
        char                el_class[LE_CLASS_MAX];
        char                el_type[LE_TYPE_MAX];
        char                el_resource[LE_RESOURCE_MAX];
        char                el_rclass[LE_RCLASS_MAX];
        char                el_rtype[LE_RTYPE_MAX];
    /* few of them skipped */
        unsigned short      el_detail_length;
        char                el_detail_data[LE_DETAIL_MAX];  /* this is important */
    /* few of them skipped */
    } errlog_entry_t;

    Most of the fields above el_detail_data can be used to search for an entry in the error log. el_detail_data stores the data that was passed along with the struct err_rec0 structure to the errlog or errsave API, and el_detail_length holds the length of that data. If we map this data to the structure that was used to write it, we can recover the original values.

  • Structure used to specify the searching criteria

    typedef struct errlog_match {
        unsigned int                em_op;
        union {
            struct errlog_match     *emu_left;
            unsigned int            emu_field;
        } emu1;
        union {
            struct errlog_match     *emu_right;
            unsigned int            emu_intvalue;
            unsigned char           *emu_strvalue;
        } emu2;
    } errlog_match_t;

    The above structure is used to specify the filtering/searching criteria in the errlog_find_first API.

    In the emu2 union, the emu_intvalue and emu_strvalue fields specify an integral value or a string value, respectively.

    The field named in emu_field selects which value is picked from the error-log entry. The operator specified in em_op is then applied to that value and to emu_intvalue or emu_strvalue, depending on the type of the selected field.

    The sys/errlog.h file contains some predefined definitions that make these inner fields easier to access; refer to it for more details.
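
    As an illustration of which union member goes with which kind of field, the following sketch fills a single comparison node for a string field and for an integral field. match_string and match_int are hypothetical helper names, not part of the error-logging API. On AIX, the types and constants come from sys/errlog.h; elsewhere, the sketch uses a copy of the structure shown above and placeholder constant values that are not the real AIX values.

```c
#ifdef _AIX
#include <sys/errlog.h>   /* real errlog_match_t and LE_* definitions */
#else
/* Stand-ins for non-AIX builds. The structure mirrors the one shown
   above; the constant values are placeholders, NOT the real ones. */
typedef struct errlog_match {
    unsigned int em_op;
    union {
        struct errlog_match *emu_left;
        unsigned int         emu_field;
    } emu1;
    union {
        struct errlog_match *emu_right;
        unsigned int         emu_intvalue;
        unsigned char       *emu_strvalue;
    } emu2;
} errlog_match_t;
enum { LE_OP_EQUAL = 1, LE_OP_GT, LE_MATCH_SEQUENCE, LE_MATCH_RESOURCE };
#endif

/* Hypothetical helper: comparison of a string field, for example
   "resource name is equal to OPERATOR". The text is not copied, so it
   must stay valid for as long as the node is used. */
void match_string(errlog_match_t *m, unsigned int op,
                  unsigned int field, char *text)
{
    m->em_op = op;
    m->emu1.emu_field = field;
    m->emu2.emu_strvalue = (unsigned char *)text;
}

/* Hypothetical helper: comparison of an integral field, for example
   "sequence number is greater than 100". */
void match_int(errlog_match_t *m, unsigned int op,
               unsigned int field, unsigned int value)
{
    m->em_op = op;
    m->emu1.emu_field = field;
    m->emu2.emu_intvalue = value;
}
```

    A node filled this way can be passed directly as the filter argument of errlog_find_first, for example after calling match_string(&node, LE_OP_EQUAL, LE_MATCH_RESOURCE, name).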

Building the search criteria

The search criteria can be visualized as a binary tree of the following form.

L2:                      Operator [em_op]
                         /              \
                 [emu_left]          [emu_right]
L1:               operator1           operator2
                   [em_op]             [em_op]
                   /     \             /     \
Leaf:      emu_field  emu_intvalue  emu_field  emu_strvalue

The following table specifies the type of operators that can be used at each level depicted in the above binary tree.

Table 7: Various node levels and their significance
Node level        Remarks
Leaf level nodes  These nodes contain only the error-log field and value as operands to the L1 node operator.
L1 level nodes    These nodes specify the relational operators such as greater than, equal to, less than, and so on.
L2 level node     At this level, this node will have logical operators such as AND, OR, and so on.

In summary, only relational operators can be used with the error-log entries and values. The result of these relational operators can be combined to form complex search criteria using logical operators. Let’s look at the various operators that can be used.

  • Relational operators:

    The following table shows some of the relational operators that can be used to build the searching/filtering criteria. These are specified in the em_op field of errlog_match_t. These operators work on the leaf nodes only.

    Table 8: Relational operators and their meaning

    Operator      Meaning
    LE_OP_EQUAL   Check if the left leaf node (error-log entry field value) is equal to the right leaf node value.
    LE_OP_NE      Check if the left leaf node (error-log entry field value) is not equal to the right leaf node value.
    LE_OP_SUBSTR  Check if the left leaf node (error-log entry field value) contains the substring specified in the right leaf node value.
    LE_OP_LT      Check if the left leaf node (error-log entry field value) is less than the right leaf node value.
    LE_OP_LE      Check if the left leaf node (error-log entry field value) is less than or equal to the right leaf node value.
    LE_OP_GT      Check if the left leaf node (error-log entry field value) is greater than the right leaf node value.
    LE_OP_GE      Check if the left leaf node (error-log entry field value) is greater than or equal to the right leaf node value.
  • Logical operators:

    The following set of logical operators works on non-leaf nodes only.

    Table 9: Logical operators and their meaning

    Operator   Meaning
    LE_OP_AND  Applies the logical AND operator on the left and the right nodes.
    LE_OP_OR   Applies the logical OR operator on the left and the right nodes.
    LE_OP_XOR  Applies the logical XOR operator on the left and the right nodes.
    LE_OP_NOT  Applies the logical NOT operator only on the left node.

You can use the following tags for specifying the error-log entry field that can be used as the left operand to relational operators.

Table 10: Tags representing their corresponding error-log entry fields
emu_field value     Meaning
LE_MATCH_SEQUENCE   To use the error-log entry's Sequence field as the operand.
LE_MATCH_LABEL      To use the error-log entry's Label field as the operand.
LE_MATCH_TIMESTAMP  To use the error-log entry's Timestamp field as the operand.
LE_MATCH_MACHINEID  To use the error-log entry's MachineID field as the operand.
LE_MATCH_NODEID     To use the error-log entry's NodeID field as the operand.
LE_MATCH_CLASS      To use the error-log entry's Class field as the operand.
LE_MATCH_TYPE       To use the error-log entry's Type field as the operand.
LE_MATCH_RESOURCE   To use the error-log entry's Resource field as the operand.
LE_MATCH_RCLASS     To use the error-log entry's Rclass (resource class) field as the operand.
LE_MATCH_RTYPE      To use the error-log entry's Rtype (resource type) field as the operand.
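
To make the tree concrete, the following sketch builds a two-level filter: the resource name is equal to OPERATOR AND the sequence number is greater than a given value. The structure operator_after_filter and the function build_operator_after are hypothetical names chosen for this illustration. On AIX, the types and constants come from sys/errlog.h; elsewhere, the sketch uses a copy of the errlog_match structure and placeholder constant values that are not the real AIX values.

```c
#include <string.h>

#ifdef _AIX
#include <sys/errlog.h>   /* real errlog_match_t and LE_* definitions */
#else
/* Stand-ins for non-AIX builds. The structure mirrors errlog_match;
   the constant values are placeholders, NOT the real ones. */
typedef struct errlog_match {
    unsigned int em_op;
    union {
        struct errlog_match *emu_left;
        unsigned int         emu_field;
    } emu1;
    union {
        struct errlog_match *emu_right;
        unsigned int         emu_intvalue;
        unsigned char       *emu_strvalue;
    } emu2;
} errlog_match_t;
enum { LE_OP_EQUAL = 1, LE_OP_GT, LE_OP_AND,
       LE_MATCH_SEQUENCE, LE_MATCH_RESOURCE };
#endif

/* Hypothetical container that keeps all nodes of the tree together. */
struct operator_after_filter {
    errlog_match_t root;          /* L2: logical AND              */
    errlog_match_t by_resource;   /* L1: resource == "OPERATOR"   */
    errlog_match_t by_sequence;   /* L1: sequence > last_seen     */
    char           resource[16];  /* storage for the string value */
};

/* Fills the tree and returns a pointer to its root node. */
errlog_match_t *build_operator_after(struct operator_after_filter *f,
                                     unsigned int last_seen)
{
    /* Left L1 node: relational operator on a string field. */
    strcpy(f->resource, "OPERATOR");
    f->by_resource.em_op = LE_OP_EQUAL;
    f->by_resource.emu1.emu_field = LE_MATCH_RESOURCE;
    f->by_resource.emu2.emu_strvalue = (unsigned char *)f->resource;

    /* Right L1 node: relational operator on an integral field. */
    f->by_sequence.em_op = LE_OP_GT;
    f->by_sequence.emu1.emu_field = LE_MATCH_SEQUENCE;
    f->by_sequence.emu2.emu_intvalue = last_seen;

    /* L2 node: logical operator that combines both results. */
    f->root.em_op = LE_OP_AND;
    f->root.emu1.emu_left = &f->by_resource;
    f->root.emu2.emu_right = &f->by_sequence;
    return &f->root;
}
```

The returned pointer can be passed as the filter argument of errlog_find_first. Because the root only points at the other nodes, a cautious choice is to keep the whole structure alive until the search, including any errlog_find_next calls, is finished.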

Example

The following high-level approach can be used to read the error-log entries.

  • Open the error-log file.
  • Build a filter to search the entries you are interested in.
  • Search for an error-log entry using the filter built in the previous step. If a match is found, the search returns the error-log entry; otherwise, it returns a failure code.
  • After analyzing the required error-log entries, close the error-log file.

Entry logged:

errlogger "I am from IBM"

The following C program reads the detailed data, "I am from IBM":

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <sys/errlog.h>
int main(void)
{
        /* error log file handle */
        errlog_handle_t my_errlog_hndl;
        /* mode to open the error file */
        int mode = O_RDONLY;
        unsigned int magic = LE_MAGIC;
        /* path of error log file */
        char path[]="/var/adm/ras/errlog";
        int rc=0;
        /* error log entry matching/finding criteria */
        errlog_match_t match_resource_name;
        /* error log entry details of matched entry */
        errlog_entry_t matched_errlog_entry;

        /* This example looks for entries logged by OPERATOR type resource */
        char resource_name[]="OPERATOR";

        /* opening error log file */
        rc=errlog_open(path,mode,magic,&my_errlog_hndl);
        if ( rc )
        {
                printf(" Failed to open error log file error : %d\n",rc);
                exit(1);
        }
        /*
          building matching criteria
          criteria is :
          if, el_resource field in errlog_entry_t structure is equal to OPERATOR value
         */
        match_resource_name.em_op=LE_OP_EQUAL;
        match_resource_name.emu1.emu_field=LE_MATCH_RESOURCE;
        match_resource_name.emu2.emu_strvalue=(unsigned char *)resource_name;

        /* find the first entry */
        rc=errlog_find_first(my_errlog_hndl,&match_resource_name,&matched_errlog_entry);
        if ( rc == LE_ERR_DONE )
        {
                printf(" Did not find any entry matching the criteria.\n");
        }
        else if ( rc )
        {
                printf(" Failed to find error log entry : %d\n",rc);
        }
        else
        {
                /* keep looking for all entries , break when done or error occurs*/
                while( !rc )
                {
                        /* print the detailed data */
                        /* One can print other details of error log entry too */
                        /* Even the detail_data pointer could be typecasted to actual
                           structure and print the values as structure fields
                         */
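                        /* bound the output by el_detail_length, because the
                           detail data need not be null-terminated */
                        printf("error log entry detail data is : %.*s\n",
                                (int)matched_errlog_entry.el_detail_length,
                                matched_errlog_entry.el_detail_data);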

                        /* find next entries after first has been found */
                        rc=errlog_find_next(my_errlog_hndl,&matched_errlog_entry);
                }
                if ( rc == LE_ERR_DONE )
                {
                        printf(" No more entries found.\n");
                }
                else
                {
                        printf(" Failed to find error log entry : %d\n",rc);
                }
        }
        /* close the error log file */
        rc=errlog_close(my_errlog_hndl);
        if ( rc )
                printf(" Failed to close error log file error : %d\n",rc);
        return 0;
}

You can compile the program using the following command:

cc read_errlog_entries.c -lerrlog
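
The example prints the detail data as text. When the detail data holds a binary structure instead, it can be copied back into the structure that was used when the entry was logged. The following is a minimal sketch under stated assumptions: struct disk_err_detail and decode_detail are hypothetical and stand for whatever structure your own component writes, and the code assumes that the reader uses the same structure layout and byte order as the writer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical payload that a component might have logged as detail
   data. Replace it with the structure your own code writes. */
struct disk_err_detail {
    unsigned int device_id;
    unsigned int io_status;
    char         path[32];
};

/*
 * Copies the raw detail bytes into *out.
 * Returns 0 on success and -1 if the arguments are invalid or the
 * detail data is shorter than the structure.
 */
int decode_detail(const void *detail, size_t detail_len,
                  struct disk_err_detail *out)
{
    if (detail == NULL || out == NULL || detail_len < sizeof(*out))
        return -1;

    /* memcpy avoids alignment problems that a direct pointer cast of
       the character buffer could cause. */
    memcpy(out, detail, sizeof(*out));

    /* Never trust the logged string to be terminated. */
    out->path[sizeof(out->path) - 1] = '\0';
    return 0;
}
```

In the program above, such a helper would be called as decode_detail(matched_errlog_entry.el_detail_data, matched_errlog_entry.el_detail_length, &detail), after which the individual members can be printed by name.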
