AIX® has an event notification mechanism, based on file system interfaces, called the AIX Event Infrastructure. Cluster Aware AIX uses this mechanism to monitor cluster events, which reduces failure detection time. Events that occur on one node of the cluster are propagated to all nodes in the cluster, so corrective action can be taken easily.


Trishali Nayar (ntrishal@in.ibm.com), Filesystem Developer, IBM

Trishali Nayar works at the IBM India Storage Lab. She graduated from the University of Pune with a bachelor's degree in computer engineering. She was part of the development team that made the Cluster Aware AIX operating system. She has past experience in the area of distributed file systems development. She has also co-authored the IBM Redbook, Implementing NFSv4 in the Enterprise: Planning and Migration Strategies.



Cheryl L. Jennings (halllc@us.ibm.com), AIX filesystem developer, IBM

Cheryl Jennings graduated from the University of Texas at Austin with a bachelor's degree in computer science. She began her career with IBM in the AIX L3 Support team and currently works in the AIX Filesystem Development team.



10 August 2011


Introduction

AIX 6.1 TL6 and AIX 7.1 are cluster aware, which means that a cluster of AIX nodes can now be created easily. This is useful for building highly available, resilient environments. PowerHA is one of the first products to use this new AIX capability.

In a cluster environment, there is a need to monitor many events to keep the cluster up and running. Examples of such critical events are:

  • If a member (which we will call a "node") goes down, the other nodes need to be informed immediately so that applications and clients can fail over without disruption.
  • If a network interface goes down, traffic needs to be routed over an alternate path.
  • If a disk goes down, data needs to be made available from elsewhere.
  • If a key process or daemon dies, it needs to be restarted.

AIX has an event notification mechanism, based on file system interfaces, called the AIX Event Infrastructure.

Cluster Aware AIX uses this infrastructure to monitor cluster events, which reduces failure detection time. In addition, events that occur on one node of the cluster are propagated to all nodes in the cluster, so corrective action can be taken easily.

Any application, product, or service that runs in a distributed environment on AIX can use these event notifications.


Cluster events

The monitoring code uses the file system calls open(), write(), select(), read(), and close(). For cluster notifications, the additional string "CLUSTER=YES" must be included in the write() call. The same program can monitor both local and remote events. Remote notifications are generated only by remote nodes that are also actively monitoring the same event.

Listing 1. Code snippet to monitor continuously
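open()            /* open (creating if needed) the event's .mon file */
write()           /* register for the event; add CLUSTER=YES for remote events */
loop
{
     select()     /* block until a local or remote event occurs */
     read()       /* read the event data */
}
close()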

This code monitors cluster events continuously. As soon as any event (local or remote) occurs, the select() call returns. The event data returned by read() tells you which event occurred and on which node. Event notifications are distributed over the network to all nodes of the cluster.
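As a rough sketch only, the loop in Listing 1 might look like this in C. The caller supplies the path of the .mon file under /aha (the modFile example later in this article uses /aha/fs/modFile.monFactory/etc/passwd.mon). Only CLUSTER=YES comes from this article; the other registration keys (CHANGED=YES and WAIT_TYPE=WAIT_IN_SELECT), the buffer size, and the function name aha_monitor are assumptions, so check them against the AIX Event Infrastructure documentation before relying on them.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Registration string written to the monitor file. CLUSTER=YES requests
 * remote notifications; CHANGED=YES and WAIT_TYPE=WAIT_IN_SELECT are
 * assumed keys for "notify on change" and "wait in select()". */
static const char AHA_SPEC[] = "CHANGED=YES;WAIT_TYPE=WAIT_IN_SELECT;CLUSTER=YES";

/* Monitor the event behind the .mon file at path and print each event.
 * max_events <= 0 loops forever, as in Listing 1.
 * Returns the number of events read, or -1 if setup fails. */
int aha_monitor(const char *path, int max_events)
{
    char buf[4096];
    int count = 0;

    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (write(fd, AHA_SPEC, strlen(AHA_SPEC)) < 0) {
        perror("write");
        close(fd);
        return -1;
    }

    while (max_events <= 0 || count < max_events) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* Block until a local or remote event makes the file readable. */
        if (select(fd + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: wait again */
            perror("select");
            break;
        }

        /* Fetch the KEY=value event text. */
        ssize_t n = read(fd, buf, sizeof buf - 1);
        if (n < 0) {
            perror("read");
            break;
        }
        if (n == 0)
            break;                 /* no data: stop rather than spin */
        buf[n] = '\0';
        fputs(buf, stdout);
        count++;
    }

    close(fd);
    return count;
}
```

A long-running consumer would call aha_monitor(path, 0) from a dedicated daemon or thread; a positive count is mainly convenient for testing.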

The following example shows the data returned when a node is added to the cluster. The local node (on which the command to add the node is run), along with all the other nodes in the cluster, is notified as follows:

Listing 2. Example of a cluster event
BEGIN_EVENT_INFO
TIME_tvsec=1271922590
TIME_tvnsec=886742634
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=NODE_ADD
NODE_NUMBER=1
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

The event data is returned as KEY=value pairs, one per line. This format makes it easy for consumer programs to search for a key name and extract its value.
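Because each line is a KEY=value pair, a consumer can extract a field with a simple line scan. The following C helper is a sketch, and the name aha_get_value is hypothetical. It matches keys only at the start of a line (so that looking up ID does not match inside NODE_ID) and takes the value up to the end of the line, because some values, such as DISK_UID in the clDiskList example, contain spaces.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of KEY from AHAFS event text into out.
 * Keys are matched only at the start of a line, and the value runs to
 * the end of that line (it may contain spaces).
 * Returns 1 if found, 0 if the key is absent or out is too small. */
int aha_get_value(const char *event, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *line = event;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
            const char *val = line + klen + 1;
            size_t vlen = strcspn(val, "\r\n");   /* stop at end of line */
            if (vlen + 1 > outlen)
                return 0;
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return 1;
        }
        line = strchr(line, '\n');                 /* advance to next line */
        if (line != NULL)
            line++;
    }
    return 0;
}
```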

The AIX Event Infrastructure can pass a message from an event producer to the consumer, and this message varies between event producers. For example, a network-related event producer might send the name of the affected network interface, and a disk-related event producer the name of the affected disk.

This producer-specific information is provided to the consumer between the delimiters BEGIN_EVPROD_INFO and END_EVPROD_INFO. For all cluster events, cluster-related information is also included between these delimiters, which helps the monitoring program identify the node on which the event occurred.
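A consumer that only needs the producer-specific fields can first isolate the text between the two delimiters. This is a minimal sketch with a hypothetical function name, assuming newline-terminated lines as in the listings in this article.

```c
#include <stddef.h>
#include <string.h>

/* Locate the text between the BEGIN_EVPROD_INFO and END_EVPROD_INFO
 * delimiter lines. On success, *start points at the first line after
 * BEGIN_EVPROD_INFO and *len is the length up to (not including)
 * END_EVPROD_INFO. Returns 1 on success, 0 if a delimiter is missing. */
int aha_evprod_section(const char *event, const char **start, size_t *len)
{
    static const char begin[] = "BEGIN_EVPROD_INFO\n";
    static const char end[] = "END_EVPROD_INFO";

    const char *b = strstr(event, begin);
    if (b == NULL)
        return 0;
    b += sizeof begin - 1;          /* skip the delimiter line itself */

    const char *e = strstr(b, end);
    if (e == NULL)
        return 0;

    *start = b;
    *len = (size_t)(e - b);
    return 1;
}
```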

The lscluster -m command displays details about the nodes of the existing cluster.

Listing 3. Output of the lscluster -m command
Node name: imaginary.ibm.com
        Cluster shorthand id for node: 3
        uuid for node: 6f5b24cc-cbab-11df-8c2c-001125085b7a
        State of node:  UP  NODE_LOCAL
        Smoothed rtt to node: 0
        Mean Deviation in network rtt to node: 0
        Number of zones this node is a member in: 0
        Number of clusters node is a member in: 1
        CLUSTER NAME       TYPE  SHID   UUID
        clust1             local        6f56fffa-cbab-11df-8c2c-001125085b7a

        Number of points_of_contact for node: 0
        Point-of-contact interface & contact state
         n/a

Three fields in this output (the cluster shorthand id for node, the uuid for node, and the cluster UUID) together uniquely identify a node in a cluster.

When a cluster event occurs, this information is returned in the NODE_NUMBER, NODE_ID, and CLUSTER_ID keys, respectively.
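To match an event against lscluster -m output, a consumer could convert a NODE_ID or CLUSTER_ID value into the dashed, lower-case UUID form that lscluster prints. This sketch assumes, without confirmation from this article, that these IDs are 128-bit UUIDs written as exactly 32 hex digits after "0x"; the function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert "0x76497CF2CF1111DF8D83BEB25D4C4703" into
 * "76497cf2-cf11-11df-8d83-beb25d4c4703" (8-4-4-4-12 hex groups).
 * out must hold at least 37 bytes. Returns 1 on success, 0 if the
 * input is not "0x" followed by exactly 32 hex digits. */
int aha_id_to_uuid(const char *id, char out[37])
{
    if (strncmp(id, "0x", 2) != 0 && strncmp(id, "0X", 2) != 0)
        return 0;
    id += 2;
    if (strlen(id) != 32)
        return 0;

    size_t o = 0;
    for (size_t i = 0; i < 32; i++) {
        unsigned char c = (unsigned char)id[i];
        if (!isxdigit(c))
            return 0;
        if (i == 8 || i == 12 || i == 16 || i == 20)
            out[o++] = '-';                 /* group separator */
        out[o++] = (char)tolower(c);
    }
    out[o] = '\0';
    return 1;
}
```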

The SEQUENCE_NUM field tells you the number of times this event has occurred.

Events received from a remote node do not include user or process information, or a stack trace, even if the event producer supports them.
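Because remote events omit the process details, a consumer should treat PID, UID, PROG_NAME, and STACK_TRACE as optional and use the node fields to find out where an event happened. The sketch below compares NODE_ID with the local node's ID. How the program learns its own ID (for example, from the NODE_LOCAL entry in lscluster -m output) is left to the caller, and all names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the value of KEY when "KEY=" starts a line, and
 * store its length (up to the end of the line) in *vlen; NULL if absent. */
static const char *find_value(const char *event, const char *key, size_t *vlen)
{
    size_t klen = strlen(key);
    for (const char *line = event; line != NULL && *line != '\0'; ) {
        if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
            const char *val = line + klen + 1;
            *vlen = strcspn(val, "\r\n");
            return val;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return NULL;
}

/* 1 if the event's NODE_ID equals local_id (it happened on this node),
 * 0 if it happened on another node, -1 if there is no NODE_ID field. */
int aha_event_is_local(const char *event, const char *local_id)
{
    size_t n;
    const char *id = find_value(event, "NODE_ID", &n);
    if (id == NULL)
        return -1;
    while (n > 0 && (id[n - 1] == ' ' || id[n - 1] == '\t'))
        n--;                                  /* ignore trailing blanks */
    return strlen(local_id) == n && strncmp(id, local_id, n) == 0;
}

/* Events received from a remote node never carry process details, so
 * PID (like UID, PROG_NAME, and STACK_TRACE) must be treated as optional. */
int aha_has_process_info(const char *event)
{
    size_t n;
    return find_value(event, "PID", &n) != NULL;
}
```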


Event producers available on Cluster Aware AIX

This section describes each of the event producers that are available only in a clustered environment (that is, when a system is part of a cluster).

nodeList

The nodeList event producer sends a notification when the list of nodes in the cluster changes. A change in cluster membership is a key event for all the other nodes in the cluster.

The EVENT_TYPE field can have the following values:

  • NODE_ADD: Triggered when a node is added to the cluster (for example, using the chcluster command).
  • NODE_DELETE: Triggered when a node is removed from the cluster (for example, using the chcluster command).

The NODE_NUMBER and NODE_ID fields identify the node in question.
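As an illustration, a consumer could keep its own view of cluster membership up to date from nodeList events. This hypothetical sketch records members in a 64-bit mask indexed by NODE_NUMBER, which assumes node numbers stay in the range 0 to 63. The EVENT_TYPE and NODE_NUMBER values are expected to have been extracted from the event text already.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical membership view: bit N set means node N is a member. */
typedef struct {
    uint64_t members;
} membership_t;

/* Apply one nodeList event. Returns 1 if membership changed, 0 if not,
 * -1 for an unknown EVENT_TYPE or an out-of-range node number. */
int membership_apply(membership_t *m, const char *event_type, int node_number)
{
    if (node_number < 0 || node_number > 63)
        return -1;
    uint64_t bit = UINT64_C(1) << node_number;
    uint64_t before = m->members;

    if (strcmp(event_type, "NODE_ADD") == 0)
        m->members |= bit;
    else if (strcmp(event_type, "NODE_DELETE") == 0)
        m->members &= ~bit;
    else
        return -1;

    return m->members != before;
}

/* Number of nodes currently recorded as members. */
int membership_count(const membership_t *m)
{
    int n = 0;
    for (uint64_t v = m->members; v != 0; v &= v - 1)   /* clear lowest set bit */
        n++;
    return n;
}
```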

Listing 4. Event output from a nodeList occurrence

BEGIN_EVENT_INFO
TIME_tvsec=1271922590
TIME_tvnsec=886742634
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=NODE_DELETE
NODE_NUMBER=1
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

nodeState

The nodeState event producer monitors for the state of a node in the cluster.

The EVENT_TYPE field can have the following values:

  • NODE_UP: Triggered when a node comes up (for example, after a reboot or activation from HMC).
  • NODE_DOWN: Triggered when a node goes down (for example, on shutdown, reboot, or crash).

The NODE_NUMBER and NODE_ID fields identify the node in question.
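A handler for nodeState events might act only on real state transitions, so that, for example, failover is started once per outage even if a NODE_DOWN notification is processed twice. The following sketch is hypothetical: the action names and the MAX_NODES limit are illustrative, and the table starts with every node marked down, so a real monitor would first seed it (for example, from lscluster -m).

```c
#include <string.h>

#define MAX_NODES 64   /* hypothetical upper bound on NODE_NUMBER */

enum node_action { ACT_NONE, ACT_FAILOVER, ACT_REJOIN, ACT_ERROR };

/* Last known state of each node: 1 = up, 0 = down. */
static unsigned char node_up[MAX_NODES];

/* Decide what to do for one nodeState event. A notification for a
 * state that is already recorded yields ACT_NONE. */
enum node_action on_node_state(const char *event_type, int node_number)
{
    if (node_number < 0 || node_number >= MAX_NODES)
        return ACT_ERROR;

    if (strcmp(event_type, "NODE_DOWN") == 0) {
        if (!node_up[node_number])
            return ACT_NONE;
        node_up[node_number] = 0;
        return ACT_FAILOVER;       /* move applications and clients away */
    }
    if (strcmp(event_type, "NODE_UP") == 0) {
        if (node_up[node_number])
            return ACT_NONE;
        node_up[node_number] = 1;
        return ACT_REJOIN;         /* node is available again */
    }
    return ACT_ERROR;
}
```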

Listing 5. Event output from a nodeState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271921536
TIME_tvnsec=68254861
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=NODE_UP
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

nodeAddress

The nodeAddress event producer monitors the network address of the node.

The EVENT_TYPE field can have the following values:

  • ADDRESS_ADD: Triggered when an address (alias) is added (for example, using the ifconfig or chdev command).
  • ADDRESS_DELETE: Triggered when an address (alias) is removed (for example, using the ifconfig or chdev command).

The INTERFACE_NAME of the concerned interface, along with the FAMILY, ADDRESS and NETMASK of the IP address, is also provided.
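The ADDRESS and NETMASK values are printed as hexadecimal numbers. Assuming that FAMILY=2 is AF_INET and that each value is the IPv4 address as a 32-bit number (so 0x0A0A0A0A is 10.10.10.10 and 0xFF000000 is 255.0.0.0), a consumer could convert them to readable form with standard socket functions, as sketched below. The helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Convert an ADDRESS or NETMASK value such as "0x0A0A0A0A" to
 * dotted-decimal text. Assumes FAMILY=2 (AF_INET) and that the value is
 * the IPv4 address as a 32-bit number. Returns 1 on success, 0 otherwise. */
int aha_ipv4_to_text(const char *hex, char *out, socklen_t outlen)
{
    char *end;
    unsigned long v = strtoul(hex, &end, 16);   /* accepts a 0x prefix */
    if (end == hex)
        return 0;
    end += strspn(end, " \t\r\n");              /* tolerate trailing blanks */
    if (*end != '\0' || v > 0xFFFFFFFFUL)
        return 0;

    struct in_addr a;
    a.s_addr = htonl((uint32_t)v);              /* in_addr is network order */
    return inet_ntop(AF_INET, &a, out, outlen) != NULL;
}

/* Prefix length of a contiguous netmask, e.g. 0xFF000000 -> 8. */
int aha_prefix_len(uint32_t mask)
{
    int n = 0;
    while (mask & 0x80000000u) {
        n++;
        mask <<= 1;
    }
    return n;
}
```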

Listing 6. Event output from a nodeAddress occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271922254
TIME_tvnsec=9053410
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADDRESS_ADD
INTERFACE_NAME=et0
FAMILY=2
ADDRESS=0x0A0A0A0A
NETMASK=0xFF000000
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

networkAdapterState

The networkAdapterState event producer monitors the network interface of a node in the cluster.

The EVENT_TYPE field can have the following values:

  • ADAPTER_UP: Triggered when an interface is brought up (for example, using ifconfig, smit tcpip, and chdev commands).
  • ADAPTER_DOWN: Triggered when an interface is brought down (for example, using ifconfig, smit tcpip, and chdev commands).
  • ADAPTER_ADD: Triggered when an interface is added (for example, using the mkdev command).
  • ADAPTER_DEL: Triggered when an interface is deleted (for example, using the rmdev command).

The INTERFACE_NAME field gives the name of the concerned interface.

Listing 7. Event output from a networkAdapterState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271920539
TIME_tvnsec=399378269
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_UP
INTERFACE_NAME=en0
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

nodeContact

The nodeContact event producer monitors the last contact status of the node in a cluster.

The EVENT_TYPE field can have the values CONNECT_UP and CONNECT_DOWN. This event is triggered when a node is rebooted, shut down, or crashes, or when commands such as mkdev and rmdev are run.

The INTERFACE_NAME field gives the name of the concerned interface.

Listing 8. Event output from a nodeContact occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271921874
TIME_tvnsec=666770128
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=CONNECT_DOWN
INTERFACE_NAME=en1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

clDiskList

The clDiskList event producer sends a notification when the list of disks in the cluster changes. AIX commands that add or remove disks in the cluster, such as chcluster, trigger this event.

The EVENT_TYPE key in the event output tells you whether a disk was added to or deleted from the cluster.

The EVENT_TYPE field can have the following values:

  • DISK_ADD: Triggered when a disk is added to the cluster (for example, using the chcluster command).
  • DISK_DELETE: Triggered when a disk is removed from the cluster (for example, using the chcluster command).

The DISK_NAME and the DISK_UID values help identify the concerned disk.

Listing 9. Event output from a clDiskList occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271927983
TIME_tvnsec=696543410
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=DISK_ADD
DISK_NAME=cldisk1
DISK_UID=3E213600A0B800016726C000000FF4B8677C80F1724-100 FAStT03IBMfcp
NODE_NUMBER=2
NODE_ID=0xF079E8C801C11DFB918BEB25635B404
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

clDiskState

The clDiskState event producer monitors cluster disks.

Each node in the cluster must have access to common storage devices, either through a Storage Area Network (SAN) or through SAS subsystems. These storage devices are either clustered shared disks or the repository disk.

The EVENT_TYPE field can have the values DISK_UP and DISK_DOWN. This event is triggered when a disk comes up or goes down respectively.

The DISK_NAME helps identify the concerned cluster disk. A global device view exists across all the nodes in the cluster, which provides a single global device name for a disk from any node in the cluster (for example, cldisk1 refers to the same physical disk from any node in the cluster).

Listing 10. Event output from a clDiskState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271935734
TIME_tvnsec=265210314
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=DISK_DOWN
DISK_NAME=cldisk1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

repDiskState

The repDiskState event producer monitors the repository disk. The cluster repository disk serves as the central repository for all cluster configuration data.

The EVENT_TYPE field has the value REP_UP or REP_DOWN when the repository disk comes up or goes down, respectively.

The DISK_NAME returns the repository disk name.

Listing 11. Event output from a repDiskState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271933757
TIME_tvnsec=134003703
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=REP_UP
DISK_NAME=caa_private0
NODE_NUMBER=3
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

diskState

The diskState event producer monitors local disk changes. This event is notified only for disks that are supported by the storage framework. It is not transmitted to other nodes in the cluster; only the node on which the event actually occurred is notified.

The EVENT_TYPE field can have the values LOCAL_UP and LOCAL_DOWN when a local disk comes up or goes down respectively.

The DISK_NAME helps identify the concerned local disk.

Listing 12. Event output from a diskState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271935029
TIME_tvnsec=958362343
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=LOCAL_UP
DISK_NAME=hdisk1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703 
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO
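The clDiskState, repDiskState, and diskState producers use different EVENT_TYPE values for the same up/down idea. A consumer that handles all three could normalize them as in this hypothetical sketch. Keep in mind that LOCAL_UP and LOCAL_DOWN are reported only on the node where they occur, while the other values can also arrive from remote nodes.

```c
#include <string.h>

enum disk_scope { SCOPE_CLUSTER, SCOPE_REPOSITORY, SCOPE_LOCAL };

struct disk_event {
    enum disk_scope scope;
    int up;                 /* 1 = came up, 0 = went down */
};

/* Map the EVENT_TYPE values of clDiskState, repDiskState, and diskState
 * onto one structure. Returns 1 on success, 0 for any other value. */
int classify_disk_event(const char *event_type, struct disk_event *out)
{
    static const struct {
        const char *name;
        enum disk_scope scope;
        int up;
    } table[] = {
        { "DISK_UP",    SCOPE_CLUSTER,    1 },
        { "DISK_DOWN",  SCOPE_CLUSTER,    0 },
        { "REP_UP",     SCOPE_REPOSITORY, 1 },
        { "REP_DOWN",   SCOPE_REPOSITORY, 0 },
        { "LOCAL_UP",   SCOPE_LOCAL,      1 },
        { "LOCAL_DOWN", SCOPE_LOCAL,      0 },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(event_type, table[i].name) == 0) {
            out->scope = table[i].scope;
            out->up = table[i].up;
            return 1;
        }
    }
    return 0;
}
```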

vgState

The vgState event producer monitors the status of the volume group on a disk.

Whenever an up or down event occurs for a local disk (registered with diskState) or a cluster disk (shared or repository), a corresponding VG_UP or VG_DOWN event is triggered for the volume group residing on that disk. Using this event producer, an application can verify the status of a volume group on the disk with the LVM subsystem. Commands such as varyonvg and varyoffvg generate this event.

The event also passes the disk name and volume group name in the DISK_NAME and VG_NAME fields.

Listing 13. Event output from a vgState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271915408
TIME_tvnsec=699408296
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=VG_UP
DISK_NAME=hdisk5
VG_NAME=myvg1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703 
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

General event producers

When a system is part of a cluster, other event producers (like modFile, modDir, utilFs, waitTmCPU, waitersFreePg, waitTmPgInOut, vmo, schedo, and processMon) also support remote notifications.

To send and receive remote notifications for these events, only the additional string CLUSTER=YES needs to be passed in the write() system call. The event must also be monitored on the other nodes for them to receive remote notifications.
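The sketch below builds the string written to a monitor file for any of these producers, appending CLUSTER=YES when remote notifications are wanted. Only CLUSTER=YES comes from this article; the base strings (CHANGED=YES for change-type producers such as modFile, or a threshold such as THRESH_HI=90 for producers such as waitTmCPU) and WAIT_TYPE=WAIT_IN_SELECT are assumptions about the registration syntax, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Build "<base>;WAIT_TYPE=WAIT_IN_SELECT[;CLUSTER=YES]" into buf.
 * Returns 1 on success, 0 if base already contains a CLUSTER= key or
 * buf is too small for the whole string. */
int aha_build_spec(char *buf, size_t len, const char *base, int cluster)
{
    if (strstr(base, "CLUSTER=") != NULL)
        return 0;   /* avoid writing the CLUSTER key twice */

    int n = snprintf(buf, len, "%s;WAIT_TYPE=WAIT_IN_SELECT%s",
                     base, cluster ? ";CLUSTER=YES" : "");
    return n >= 0 && (size_t)n < len;   /* reject truncated output */
}
```

The same base string can then be used on every node, with only the cluster flag deciding whether remote notifications are requested.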

The following example illustrates what is returned when the file /etc/passwd is modified on one node of the cluster. The local node where the event occurred will be notified as follows:

Listing 14. Example of a "local" cluster event
Event corresponding to /aha/fs/modFile.monFactory/etc/passwd.mon has occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1285753875
TIME_tvnsec=613266683
SEQUENCE_NUM=0
PID=8192002
UID=0
UID_LOGIN=0
GID=0
PROG_NAME=vi
RC_FROM_EVPROD=1000
BEGIN_EVPROD_INFO
NODE_NUMBER=2
NODE_ID=0x31E6CEAECA3711DF89F6BEB25D4C4703
CLUSTER_ID=0xAD3903B4CB9B11DF90E9BEB25635B404
END_EVPROD_INFO
STACK_TRACE
ahafs_evprods+70C
aha_process_vnop+160
vnop_rdwr+7DC
vno_rw+B4
rwuio+100
rdwr+188
kewrite+104
.svc_instr
write+1A4
putfile+154
wop+154
commands+1F8C
vmain+154
vop+4AC
commands+1E44
main+7E4
__start+68
END_EVENT_INFO

All nodes of the cluster, which are monitoring for this event, are also immediately notified.

Listing 15. Example of a "remote" cluster event
Event corresponding to /aha/fs/modFile.monFactory/etc/passwd.mon has occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1285753875
TIME_tvnsec=538355111
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
NODE_NUMBER=2
NODE_ID=0x31E6CEAECA3711DF89F6BEB25D4C4703
CLUSTER_ID=0xAD3903B4CB9B11DF90E9BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

Conclusion

Cluster Aware AIX events can be monitored using the AIX Event Infrastructure, which is integrated in AIX 6.1 TL6 and AIX 7.1. Users and administrators who need such event notifications can use this framework, and monitoring these notifications can also help improve the health of your systems.


Zone=AIX and UNIX
ArticleID=751812
ArticleTitle=Monitoring events in an AIX Cluster
publish-date=08102011