An introduction to event monitoring using the AIX Event Infrastructure

The AIX® Event Infrastructure is an extensible framework for monitoring multiple types of system events. This article gives an overview of the monitoring interface, as well as pointers for writing event monitoring applications.

Cheryl L. Jennings (halllc@us.ibm.com), AIX filesystem developer, IBM China

Cheryl Jennings graduated from the University of Texas at Austin with a bachelor's in computer science. She began her career with IBM in the AIX L3 Support team and currently works in the AIX Filesystem Development team.



Trishali Nayar (ntrishal@in.ibm.com), Filesystem Developer, IBM China

Trishali Nayar works at the IBM India Storage Lab. She graduated from the University of Pune with a bachelor's degree in computer engineering. She was part of the development team that made the Cluster Aware AIX operating system. She has past experience in the area of distributed file systems development. She has also co-authored the IBM Redbook, Implementing NFSv4 in the Enterprise: Planning and Migration Strategies.



17 May 2011

Introduction

The AIX Event Infrastructure is an event monitoring framework for monitoring predefined and user-defined system events. Within the context of the AIX Event Infrastructure, an event is defined as a change in state or in value that can be detected within the kernel or a kernel extension at the time the change occurs.

Each type of event that may be monitored is associated with an event producer. Event producers are simply sections of code that can detect an event as it happens and notify the AIX Event Infrastructure of event occurrences. Some examples of available event producers are:

  • modFile: The modFile event producer monitors for modifications to the content of files.
  • utilFs: The utilFs event producer monitors the utilization of a file system.
  • waitTmPgInOut: The waitTmPgInOut event producer monitors the average wait time, in milliseconds, of threads waiting for page in or page out operations to complete over a one second period.

For a full listing of available event producers, see the Resources section.

Events are represented as files within a pseudo file system. Existing file system interfaces (read(), write(), select(), and so on) are used to specify how and when monitoring applications (also called consumers) should be notified and to wait on and read data about event occurrences. The path name to a monitor file within the AIX Event Infrastructure file system is used to determine which event a consumer wishes to monitor.

Figure 1. Example instance of a mounted AIX Event Infrastructure file system
Diagram of an instance of a mounted AIX Event Infrastructure file system

Inside the AIX Event Infrastructure file system, there are four basic file types:

  1. Monitor factories: Monitor factories are directories with the ".monFactory" extension. They are directory representations of event producers. These directories are automatically created when the AIX Event Infrastructure file system is mounted.
  2. Monitor files: Monitor files are designated by a ".mon" extension and represent events to be monitored. They only exist under a monitor factory which represents their associated event producer.
  3. List files: List files are special data files that end with the ".list" extension. Currently only one list file exists, evProds.list. Reading this file will return the list of all available event producers.
  4. Subdirectories: Subdirectories are used for ease of management and to represent the full pathname of the event.

Monitoring events with the AIX Event Infrastructure

The AIX Event Infrastructure is contained in the bos.ahafs fileset on AIX 6.1 TL 6 and AIX 7.1. To monitor events, first install the bos.ahafs fileset and mount an instance of the AIX Event Infrastructure file system:

	mkdir /aha
	mount -v ahafs /aha /aha

You may mount the file system on any desired mount point. Examples in this article assume it has been mounted on /aha.

Determining which monitor file to use

Once the file system is mounted, you must determine the pathname of the monitor file that corresponds to the event you wish to monitor. Each event producer has a different set of instructions for determining the pathname you should use. For additional information, see the documentation for the event producer you plan to use.

In the monitoring example that we will follow in the next section, we will be monitoring for modifications to the /etc/passwd file. To determine the pathname for the monitor file, we need to determine which event producer corresponds to this event. Since /etc/passwd is a regular file, the modFile event producer should be used to monitor for modifications to the contents of the file. According to the modFile documentation, "a monitor file with the same path as the file you wish to monitor should be created under the modFile.monFactory directory." So, assuming the AIX Event Infrastructure file system is mounted at /aha, the full pathname for the monitor file would be: /aha/fs/modFile.monFactory/etc/passwd.mon
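The pathname rule quoted above can be sketched as a small helper. This is only an illustration: the function name build_monitor_path and its parameters are hypothetical and are not part of the AIX Event Infrastructure or of the sample program. It assumes the monitored object is given as an absolute path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the pathname of a monitor file (hypothetical helper).
 *   mount_pt : mount point of the AIX Event Infrastructure file system, e.g. "/aha"
 *   factory  : monitor factory relative to the mount point,
 *              e.g. "fs/modFile.monFactory"
 *   target   : absolute path of the object to monitor, e.g. "/etc/passwd"
 * Returns 0 on success, -1 on bad arguments or if the result does not fit.
 */
int build_monitor_path(char *out, size_t out_size, const char *mount_pt,
                       const char *factory, const char *target)
{
    assert(out != NULL && mount_pt != NULL && factory != NULL && target != NULL);

    if (strlen(mount_pt) == 0 || strlen(factory) == 0 || target[0] != '/')
        return -1;

    /* <mount point>/<factory><absolute target path>.mon */
    int n = snprintf(out, out_size, "%s/%s%s.mon", mount_pt, factory, target);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Called with "/aha", "fs/modFile.monFactory" and "/etc/passwd", the helper produces the pathname derived in the paragraph above.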

It should be noted that monitor files can have the same pathname starting from the associated monitor factory and represent different events. For example, the file /aha/fs/modDir.monFactory/home/cherylfs.mon monitors for file creation and deletion within the /home/cherylfs directory. The file /aha/fs/utilFs.monFactory/home/cherylfs.mon monitors for the utilization of the filesystem /home/cherylfs.

Example event monitoring flow

The following is a high level view of a typical event monitoring flow. In this example, the /etc/passwd file is monitored by an event consumer (monitoring application). This flow illustrates the actions taken by both the event consumer and the AIX Event Infrastructure when an event is monitored.

Figure 2. Example event monitoring flow
Diagram of an event monitoring flow

At this point, the monitoring application is asleep until an event occurrence is detected. The AIX Event Infrastructure watches file operations to see if a monitored file is modified.

Figure 3. Example event monitoring flow continued
Diagram of an event monitoring flow continued

As seen in the previous example, the typical flow for a monitoring application is:

  1. Set up event monitoring using the write() system call.
  2. Wait on event occurrences with the select() (or a blocking read()) system call.
  3. Read event occurrence data through a read() system call.
  4. Parse event occurrence data and take appropriate action.
  5. If desired, wait for further event occurrences.

Each step is examined in this article. A sample program, mon_modFile_event.c, has been provided as an example you can download.


Setting up event monitoring

Once the AIX Event Infrastructure file system has been mounted and the appropriate monitor file identified, the monitor file must be created. This may be done with an open() call with the O_CREAT flag specified. See Resources for information on creating monitor files.
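As a rough sketch of this step, the helper below creates any missing subdirectories and then opens the monitor file with O_CREAT. The names open_monitor_file, make_parent_dirs and MON_PATH_MAX, as well as the permission modes, are assumptions chosen for illustration; the sample program may do this differently.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MON_PATH_MAX 1024   /* illustrative limit, not an AIX constant */

/* Create every missing parent directory of path (like "mkdir -p" on the
 * directory part). Returns 0 on success, -1 on failure with errno set. */
static int make_parent_dirs(const char *path)
{
    char tmp[MON_PATH_MAX];
    size_t len = strlen(path);

    if (len == 0 || len >= sizeof(tmp)) {
        errno = (len == 0) ? EINVAL : ENAMETOOLONG;
        return -1;
    }
    memcpy(tmp, path, len + 1);

    /* Cut the path at each '/' in turn and create that prefix. */
    for (char *p = tmp + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(tmp, 0755) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    return 0;
}

/* Create (if needed) and open a monitor file. Returns a file descriptor,
 * or -1 on failure with errno set. */
int open_monitor_file(const char *mon_path)
{
    assert(mon_path != NULL);

    if (make_parent_dirs(mon_path) != 0)
        return -1;
    return open(mon_path, O_CREAT | O_RDWR, 0600);
}
```

The returned descriptor is what the later steps write to, wait on, and read from.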

Once the monitor file has been created and opened, specifications on how and when to be notified must be written. A complete listing of the available options can be found in the Resources section (see "Writing to the monitor file").

While most of the monitoring specifications are straightforward, the AIX Event Infrastructure has two behaviors for the notify count (NOTIFY_CNT) specification:

Figure 4. NOTIFY_CNT differences
Diagram of NOTIFY_CNT differences

In the case of NOTIFY_CNT=-1, if another event occurrence is detected while the monitoring application is not currently blocked in a select() or read() call, that event occurrence data is logged in the buffer allocated for this consumer. Once the monitoring application attempts another select() or blocking read(), those calls will return immediately since there is unread event occurrence data waiting in the buffer.

In the sample program, mon_modFile_event.c, the file to be monitored through the modFile event producer is passed in as an argument from the user. If necessary, subdirectories are created in the AIX Event Infrastructure file system to create the necessary monitor file.

Once the monitor file is opened, the string "CHANGED=YES;WAIT_TYPE=WAIT_IN_SELECT;INFO_LVL=1" is written to the monitor file since:

  1. The modFile event producer has specified the AHAFS_THRESHOLD_STATE capability (CHANGED=YES).
  2. The mon_modFile_event program will wait in a select() call (WAIT_TYPE=WAIT_IN_SELECT).
  3. The modFile event producer does not pass a message, and there is no need for the stack trace in this program (INFO_LVL=1).

The mon_modFile_event program monitors continuously for events. Since the default value for NOTIFY_CNT is -1, this does not need to be specified.
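A minimal sketch of building and writing this specification string follows. The helper names format_state_spec and write_spec are hypothetical; only the keywords CHANGED, WAIT_TYPE and INFO_LVL come from the text above. NOTIFY_CNT is left out on purpose so that the default applies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format the specification for a state-change event that is waited on in
 * select(). info_lvl must be 1, 2 or 3. Returns 0 on success, -1 otherwise. */
int format_state_spec(char *out, size_t out_size, int info_lvl)
{
    assert(out != NULL);

    if (info_lvl < 1 || info_lvl > 3)
        return -1;

    int n = snprintf(out, out_size,
                     "CHANGED=YES;WAIT_TYPE=WAIT_IN_SELECT;INFO_LVL=%d",
                     info_lvl);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Write the specification to an opened monitor file in a single write().
 * Returns 0 if the whole string was written, -1 otherwise. */
int write_spec(int fd, const char *spec)
{
    assert(spec != NULL);

    size_t len = strlen(spec);
    ssize_t n = write(fd, spec, len);
    return (n == (ssize_t)len) ? 0 : -1;
}
```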


Waiting on event occurrences

Monitoring applications may monitor for more than one event, and multiple applications may monitor the same event with different monitoring specifications.

It is important to note that monitoring of events does not begin until the monitoring application issues a select() or a blocking read() call. There are several conditions which cause select() or a blocking read() to return. These conditions are listed in the AIX 6.1 information center (see Waiting on events).

The sample program blocks in a select() call once the monitoring information is written to the monitor file. If the select() call returns an error, a special flag is passed to the parsing function to indicate that the error format will be used in the output.
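The wait itself can be sketched with a plain select() call on the monitor file descriptor. The function name wait_for_event and the optional timeout parameter are assumptions for illustration; passing NULL blocks indefinitely, which matches the behavior described for the sample program.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait until fd becomes readable.
 *   timeout == NULL blocks until select() returns.
 * Returns 1 if fd is readable, 0 on timeout, -1 if select() failed.
 */
int wait_for_event(int fd, struct timeval *timeout)
{
    fd_set readfds;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    int rc = select(fd + 1, &readfds, NULL, NULL, timeout);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A caller that follows the sample program's approach would still read and parse the event data after a -1 return, expecting the error format.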

Unavailable event occurrences

For some event producers, there may be some types of event occurrences that cause monitored events to become invalid. Some examples are:

  • The unmounting of a monitored file system through the utilFs event producer.
  • The removing or renaming of a monitored file through the modFile event producer.
  • The death of a process monitored through the processMon or pidProcessMon event producer.

Once an unavailable event occurrence has occurred, users may not monitor that event until it becomes available again. Ideally, the monitoring application identifies unavailable event occurrences and takes corrective action so that the event becomes valid again. The documentation for each event producer lists which return codes indicate that an unavailable event occurrence has been detected. The return codes for event producers are defined in sys/ahafs_evProds.h.

For local unavailable event occurrences, the monitor file associated with the event is deleted. Monitoring applications may read event data from deleted monitor files while the file descriptor for the monitor file is still open but may not block for further event occurrences. Once the monitoring application has taken corrective action and the event is valid again, the following actions must be taken to resume monitoring:

  1. The file descriptor for the deleted monitor file must be closed.
  2. The monitor file must be re-opened with the O_CREAT flag.
  3. Monitoring specifications must be written to the file.

At this point, the monitoring application may wait for event occurrences again in select() or read().
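The three steps above can be sketched as one helper. The name resume_monitoring is hypothetical, and the sketch assumes that the corrective action has already been taken and that the monitor file's parent directories still exist.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Resume monitoring after an unavailable event occurrence.
 * Returns the new file descriptor, or -1 on failure.
 */
int resume_monitoring(int old_fd, const char *mon_path, const char *spec)
{
    assert(mon_path != NULL && spec != NULL);

    /* 1. Close the descriptor of the deleted monitor file. */
    if (old_fd >= 0)
        close(old_fd);

    /* 2. Re-open the monitor file with O_CREAT. */
    int fd = open(mon_path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* 3. Write the monitoring specifications again. */
    size_t len = strlen(spec);
    if (write(fd, spec, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```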

The sample program inspects the return code from the event producer (RC_FROM_EVPROD) to determine if the event occurrence was an unavailable event occurrence. It attempts corrective action for some of the possible unavailable event occurrence types and ceases monitoring for others.


Reading event occurrence data

Event data consists of <keyword>=<value> pairs and the data collected depends on the capabilities of the event producer and the INFO_LVL specified by the monitoring application. Capabilities for each event producer are listed in the event producer's documentation.

Event data may only be read once, and no more than one event's worth of data is returned in a single read() call. For example, suppose the data for two event occurrences has been copied into the buffer before the consumer reads from the monitor file, and each event occurrence has 256 bytes of data. If the consumer calls read() for 4096 bytes, only the 256 bytes of the first event are returned. A second read() call is needed to obtain the data for the second event.

In the sample program, event data is always read with a 4K buffer. This is the recommended read size since most event occurrence data will be less than 4K. Reading with a buffer this large means that no partial event occurrences will be returned due to insufficient space in the buffer passed to the read() call.
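A minimal sketch of such a read is shown here. The names read_event and EVENT_BUF_SIZE are hypothetical. The helper issues a single read() of 4K and terminates the data with a NUL byte so that it can be handed to string functions; that only one event occurrence comes back per call is a property of the monitor file, not of this code.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

#define EVENT_BUF_SIZE 4096   /* the 4K read size recommended above */

/*
 * Read the data of one event occurrence from a monitor file.
 * buf must have room for EVENT_BUF_SIZE + 1 bytes.
 * Returns the number of bytes read, or -1 if read() failed.
 */
ssize_t read_event(int fd, char *buf)
{
    assert(buf != NULL);

    ssize_t n = read(fd, buf, EVENT_BUF_SIZE);
    if (n < 0)
        return -1;

    buf[n] = '\0';   /* make the data usable as a C string */
    return n;
}
```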

Examples of event occurrence data

For an event producer which has specified AHAFS_THRESHOLD_STATE and AHAFS_STKTRACE_AVAILABLE and passes a message to the event consumers, the three levels of output look like this:

INFO_LVL=1:

	BEGIN_EVENT_INFO
	TIME_tvsec=1269863383
	TIME_tvnsec=455993143
	SEQUENCE_NUM=0
	PID=6947038
	UID=0
	UID_LOGIN=0
	GID=0
	PROG_NAME=cat
	RC_FROM_EVPROD=1000
	END_EVENT_INFO

INFO_LVL=2:

	BEGIN_EVENT_INFO
	TIME_tvsec=1269863383
	TIME_tvnsec=455993143
	SEQUENCE_NUM=0
	PID=6947038
	UID=0
	UID_LOGIN=0
	GID=0
	PROG_NAME=cat
	RC_FROM_EVPROD=1000
	BEGIN_EVPROD_INFO
	event producer message here
	END_EVPROD_INFO
	END_EVENT_INFO

INFO_LVL=3:

	BEGIN_EVENT_INFO
	TIME_tvsec=1269863383
	TIME_tvnsec=455993143
	SEQUENCE_NUM=0
	PID=6947038
	UID=0
	UID_LOGIN=0
	GID=0
	PROG_NAME=cat
	RC_FROM_EVPROD=1000
	BEGIN_EVPROD_INFO
	event producer message here
	END_EVPROD_INFO
	STACK_TRACE:
	ahafs_prod_callback+3C4
	ahafs_cbfn_wrapper+30
	ahafs_vn_write+204
	vnop_rdwr+7E4
	vno_rw+B4
	rwuio+12C
	rdwr+184
	kewrite+16C
	.svc_instr
	write+1A4
	_xwrite+6C
	_xflsbuf+B0
	__flsbuf+9C
	copyopt_ascii+2C0
	scat+388
	main+11C
	__start+68
	END_EVENT_INFO

For an event producer which has specified AHAFS_THRESHOLD_VALUE_HI and has not specified AHAFS_STKTRACE_AVAILABLE and passes a message to event consumers, the three levels of output look like this:

INFO_LVL=1:

	BEGIN_EVENT_INFO
	TIME_tvsec=1269866715
	TIME_tvnsec=16678418
	SEQUENCE_NUM=0
	CURRENT_VALUE=3
	RC_FROM_EVPROD=1000
	END_EVENT_INFO

INFO_LVL=2:

	BEGIN_EVENT_INFO
	TIME_tvsec=1269866715
	TIME_tvnsec=16678418
	SEQUENCE_NUM=0
	CURRENT_VALUE=3
	RC_FROM_EVPROD=1000
	BEGIN_EVPROD_INFO
	event producer message here
	END_EVPROD_INFO
	END_EVENT_INFO

INFO_LVL=3:

	BEGIN_EVENT_INFO
	TIME_tvsec=1269866715
	TIME_tvnsec=16678418
	SEQUENCE_NUM=0
	CURRENT_VALUE=3
	RC_FROM_EVPROD=1000
	BEGIN_EVPROD_INFO
	event producer message here
	END_EVPROD_INFO
	END_EVENT_INFO

If there is an error from the event producer, all event producers have the following format for all INFO_LVLs:

	BEGIN_EVENT_INFO
	TIME_tvsec=1269868036
	TIME_tvnsec=966708948
	SEQUENCE_NUM=0
	RC_FROM_EVPROD=19
	END_EVENT_INFO

If a consumer is monitoring a value event and the current value already exceeds the requested threshold, the following format is used to record this EALREADY event:

	BEGIN_EVENT_INFO
	TIME_tvsec=1281837726
	TIME_tvnsec=446010404
	SEQUENCE_NUM=0
	CURRENT_VALUE=70
	RC_FROM_EVPROD=56
	END_EVENT_INFO

Each event producer provides information on what is included in the output for event occurrences they produce. The return codes for event producers are defined in sys/ahafs_evProds.h.

The sample program checks for the error format in the event occurrence data if the select() call returned an error. Otherwise, it uses the format included in the modFile event producer's documentation. It uses sscanf() to read in the values for the corresponding keywords.
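As an illustration of this kind of parsing, the sketch below walks the event data line by line and uses sscanf() to pick out the keywords shown in the INFO_LVL=1 example for a state-change event. The struct event_info layout and the function name parse_event are hypothetical, and keywords the sketch does not know are simply skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values of the keywords this sketch understands (hypothetical layout). */
struct event_info {
    long          tv_sec;
    long          tv_nsec;
    unsigned long sequence_num;
    long          pid;
    long          uid;
    long          uid_login;
    long          gid;
    char          prog_name[64];
    int           rc_from_evprod;
};

/*
 * Parse one event occurrence. Returns the number of keywords recognized,
 * or -1 if the BEGIN_EVENT_INFO / END_EVENT_INFO markers are missing.
 */
int parse_event(const char *data, struct event_info *ev)
{
    int found = 0;

    assert(data != NULL && ev != NULL);
    memset(ev, 0, sizeof(*ev));

    if (strstr(data, "BEGIN_EVENT_INFO") == NULL ||
        strstr(data, "END_EVENT_INFO") == NULL)
        return -1;

    /* Try each keyword against the start of every line. */
    for (const char *line = data; line != NULL && *line != '\0'; ) {
        if (sscanf(line, "TIME_tvsec=%ld", &ev->tv_sec) == 1 ||
            sscanf(line, "TIME_tvnsec=%ld", &ev->tv_nsec) == 1 ||
            sscanf(line, "SEQUENCE_NUM=%lu", &ev->sequence_num) == 1 ||
            sscanf(line, "PID=%ld", &ev->pid) == 1 ||
            sscanf(line, "UID=%ld", &ev->uid) == 1 ||
            sscanf(line, "UID_LOGIN=%ld", &ev->uid_login) == 1 ||
            sscanf(line, "GID=%ld", &ev->gid) == 1 ||
            sscanf(line, "PROG_NAME=%63s", ev->prog_name) == 1 ||
            sscanf(line, "RC_FROM_EVPROD=%d", &ev->rc_from_evprod) == 1)
            found++;

        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return found;
}
```

The same function also accepts the shorter error format, since it reports only the keywords it actually finds.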

The SEQUENCE_NUM keyword

Sequence numbers are maintained per event, per consumer. The same event occurrence may have different sequence numbers for different consumers, depending on when the consumers began monitoring the event.

The sequence number is reset to 0 following any cessation in monitoring. The following image illustrates the behavior of the SEQUENCE_NUM keyword.

Figure 5. Two consumers monitoring the same event
Diagram of two consumers monitoring the same event

BUF_WRAP and EVENT_OVERFLOW keywords

Event data is kept in a circular buffer per consumer, per event monitored. If event data is being written faster than the consumer can read it, there is a possibility of a buffer wrap. If a buffer wrap occurs, event data will be overwritten such that there is no partial event data returned through a read() call. Figure 6 illustrates what the buffer looks like before and after a buffer wrap:

Figure 6. Buffer wrap condition
Diagram of a buffer wrap

In a buffer wrap condition, the first read returns only the keyword BUF_WRAP. The second read returns the event data for the next, whole event occurrence. The SEQUENCE_NUM field should be consulted to see how many event occurrences may have been overwritten from a buffer wrap. If monitoring applications experience buffer wrap conditions, they may increase the size of their monitoring buffer specified in BUF_SIZE. This cannot be done dynamically.

If the consumer is using a very small buffer, there is a possibility that the event data from one event occurrence may be larger than the buffer. In this case, the keyword EVENT_OVERFLOW will be written to the buffer, with as much event occurrence data as can fit inside the buffer. The first read in an overflow case returns only the keyword EVENT_OVERFLOW. The next read returns the event data that was able to fit in the buffer.

If a wrap condition occurred with this overflow condition, the first read returns the keyword BUF_WRAP, the second read returns the keyword EVENT_OVERFLOW and the third read returns the event data that was able to fit inside the buffer.
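Based on the read sequence described above, a consumer could classify the result of each read() before parsing it. This is a sketch only: the enum and the function classify_read are hypothetical, and it assumes the keyword appears at the very start of the data returned by that read.

```c
#include <assert.h>
#include <string.h>

enum read_kind {
    READ_EVENT_DATA,      /* ordinary event occurrence data */
    READ_BUF_WRAP,        /* older unread event data was overwritten */
    READ_EVENT_OVERFLOW   /* the next event data is truncated */
};

/* Classify the NUL-terminated data returned by one read(). */
enum read_kind classify_read(const char *data)
{
    assert(data != NULL);

    if (strncmp(data, "BUF_WRAP", strlen("BUF_WRAP")) == 0)
        return READ_BUF_WRAP;
    if (strncmp(data, "EVENT_OVERFLOW", strlen("EVENT_OVERFLOW")) == 0)
        return READ_EVENT_OVERFLOW;
    return READ_EVENT_DATA;
}
```

Matching only at the start of the data avoids a false positive if one of these words happens to appear inside an event producer message.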

The sample program checks to see if the event data passed in contains the BUF_WRAP keyword. A warning is printed to the user if a buffer wrap condition has occurred. The sample program does not check for EVENT_OVERFLOW since it specifies an INFO_LVL of 1 and uses the default buffer size of 4K. Currently, the maximum size of a modFile event with an INFO_LVL of 1 will be less than 4K.

Duplicate event data consolidation

To reduce the occurrence of buffer wraps, the AIX Event Infrastructure consolidates duplicate, unread events. When new event data is collected, it is compared to the most recent unread event data. If the data from each event occurrence is exactly the same (except the timestamp and sequence number), the timestamp and sequence number of the previously written event data are updated to reflect this new, duplicate event occurrence.

Duplicate event data consolidation can be distinguished from buffer wrap conditions in that the keyword BUF_WRAP will not be read in the consolidation case.

The sample program maintains its own internal sequence number. It compares the expected value to the actual value read from the event occurrence data to determine if duplicate events were consolidated. This internal sequence number is reset to 0 if an unavailable event was detected.
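The bookkeeping described above might look like the following sketch. The names seq_tracker, seq_reset and seq_observe are hypothetical. A non-zero result means that some event occurrences were not delivered individually; whether they were consolidated or overwritten depends on whether BUF_WRAP was read first.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence number the consumer expects to read next. */
struct seq_tracker {
    unsigned long expected;
};

/* Call when monitoring starts, and again after an unavailable event. */
void seq_reset(struct seq_tracker *t)
{
    assert(t != NULL);
    t->expected = 0;
}

/*
 * Record the SEQUENCE_NUM just read. Returns how many event occurrences
 * were skipped between the previous read and this one.
 */
unsigned long seq_observe(struct seq_tracker *t, unsigned long actual)
{
    assert(t != NULL);

    /* A smaller number than expected means the sequence restarted at 0. */
    if (actual < t->expected)
        t->expected = 0;

    unsigned long skipped = actual - t->expected;
    t->expected = actual + 1;
    return skipped;
}
```

For example, reading sequence numbers 0, 1 and then 4 reports two skipped occurrences on the third call.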


Download

	Description                          Name                  Size
	Program monitors for modifications   mon_modFile_event.c   10KB

Resources

Learn
