Monitoring events in an AIX Cluster

Trishali Nayar (ntrishal@in.ibm.com), Filesystem developer, IBM
Trishali Nayar works at the IBM India Storage Lab. She graduated from the University of Pune with a bachelor's degree in computer engineering. She was part of the development team that made the Cluster Aware AIX operating system. She has past experience in the area of distributed file systems development. She has also co-authored the IBM Redbook, Implementing NFSv4 in the Enterprise: Planning and Migration Strategies.
Cheryl L. Jennings (halllc@us.ibm.com), AIX filesystem developer, IBM
Cheryl Jennings graduated from the University of Texas at Austin with a bachelor's in computer science. She began her career with IBM in the AIX L3 Support team and currently works in the AIX Filesystem Development team.

Summary:  AIX® has an event notification mechanism using file system interfaces, which is called the AIX Event Infrastructure. Cluster Aware AIX uses this to monitor cluster events, so that the failure detection time is reduced. The events happening on one node of the cluster are notified to all nodes in the cluster, and corrective action can be easily taken.

Date:  10 Aug 2011
Level:  Introductory

Introduction

The new releases of AIX, 6.1 TL6 and 7.1, are cluster aware, which means that a cluster of AIX nodes can now be created easily. This is very useful for building highly available and resilient environments. PowerHA is one of the first products to use this new feature of AIX.

In a cluster environment, there is a need to monitor many events to keep the cluster up and running. Examples of such critical events are:

  • If a member (which we will call a "node") goes down, the other nodes need to be informed immediately so that applications and clients can fail over without disruption.
  • If a network interface goes down, traffic needs to be routed over an alternate path.
  • If a disk goes down, data needs to be made available from elsewhere.
  • If a key process or daemon dies, it needs to be restarted.

AIX has an event notification mechanism using file system interfaces that is called the AIX Event Infrastructure.

Cluster Aware AIX uses this to monitor cluster events, so that the failure detection time is reduced. Also, the event occurrences on one node of the cluster are propagated to all nodes in the cluster and corrective action can be easily taken.

Any application, product, or service that runs in a distributed environment on AIX can use such event notifications.


Cluster events

The file system calls that need to be used by the monitoring code are open(), write(), select(), read() and close(). For cluster notifications, the additional string "CLUSTER=YES" has to be used in the write() call. The same program can be used to monitor for both local and remote events. Only the remote nodes, which are also actively monitoring the same event, will generate remote notifications.


Listing 1. Code snippet to monitor continuously
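open()         /* open the monitor file for the event */
write()        /* say what to monitor; add CLUSTER=YES for cluster notifications */
loop
{
     select()  /* block until an event (local or remote) occurs */
     read()    /* read the event data */
}
close()        /* stop monitoring */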

The previous code monitors cluster events continuously. As soon as any event (local or remote) occurs, the select call returns immediately. The event data returned from the read() has a lot of useful information. It tells you which particular event occurred on which node. Event notifications are distributed over the network to all nodes of the cluster.
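The sequence in Listing 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The function names, the buffer size, the CHANGED=YES option, and the semicolon separator are assumptions. Only CLUSTER=YES and the open/write/select/read/close sequence come from the text above. The caller supplies the monitor file path.

```python
import os
import select


def build_spec(cluster=True):
    """Return the string written to the monitor file.

    CHANGED=YES and the ';' separator are assumptions; CLUSTER=YES is
    the extra string the article names for cluster notifications.
    """
    parts = ["CHANGED=YES"]
    if cluster:
        parts.append("CLUSTER=YES")
    return ";".join(parts)


def monitor(path, spec, max_events, bufsize=4096):
    """Collect up to max_events raw event strings from one monitor file."""
    events = []
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)  # open()
    try:
        os.write(fd, spec.encode("ascii"))             # write()
        while len(events) < max_events:                # loop
            select.select([fd], [], [])                # select(): wait for an event
            data = os.read(fd, bufsize)                # read(): fetch the event data
            events.append(data.decode("ascii", errors="replace"))
    finally:
        os.close(fd)                                   # close()
    return events
```

On a cluster node, a consumer might call monitor() with a monitor file path and build_spec(), then hand each returned string to a parser. Bounding the loop with max_events keeps the sketch easy to stop; a long-running monitor would loop until shut down.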

The following example illustrates what data is returned, when a node is added to the cluster. The local node (on which the command to add the node is executed) along with all the nodes in the cluster will be notified as follows:


Listing 2. Example of a cluster event
BEGIN_EVENT_INFO
TIME_tvsec=1271922590
TIME_tvnsec=886742634
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=NODE_ADD
NODE_NUMBER=1
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

The data returned when an event occurs is a series of key=value pairs, one per line. This lets consumer programs search for the relevant key name and read its value easily.
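As an illustration of how a consumer might look up keys, the following Python sketch turns the event text from Listing 2 into a dictionary. The function name parse_event is hypothetical, and the sketch assumes one key=value pair per line, as in the listings.

```python
def parse_event(text):
    """Map each KEY=VALUE line of an event to a dictionary entry.

    Lines without '=' (such as the BEGIN_/END_ markers) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Event data copied from Listing 2.
SAMPLE = """\
BEGIN_EVENT_INFO
TIME_tvsec=1271922590
TIME_tvnsec=886742634
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=NODE_ADD
NODE_NUMBER=1
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO
"""

event = parse_event(SAMPLE)
print(event["EVENT_TYPE"], "on node", event["NODE_NUMBER"])
```

All values are kept as strings; a consumer converts only the fields it needs, for example int(event["NODE_NUMBER"]).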

The AIX Event Infrastructure has the capability to pass a message from an event producer to the consumer. This message varies for different event producers. For example, a network-related event producer can send the name of the affected network interface, and a disk-related event producer can send the name of the affected disk.

This additional variable information is provided to the consumer between the delimiters: BEGIN_EVPROD_INFO and END_EVPROD_INFO. For all cluster events, the cluster related information is also available between these delimiters. This helps the monitoring program to identify on which node the event occurred.
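A consumer that wants to keep the producer-specific message apart from the common fields could split the event on these delimiters. The following Python sketch shows one way to do that. The function name split_event is hypothetical, and the sample event is modeled on the listings in this article.

```python
def split_event(text):
    """Return (common_fields, producer_fields) for one event.

    Fields between BEGIN_EVPROD_INFO and END_EVPROD_INFO go into
    producer_fields; all other KEY=VALUE lines go into common_fields.
    """
    common, producer = {}, {}
    target = common
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN_EVPROD_INFO":
            target = producer
        elif line == "END_EVPROD_INFO":
            target = common
        elif "=" in line:
            key, _, value = line.partition("=")
            target[key.strip()] = value.strip()
    return common, producer


# Sample modeled on the network adapter event shown in this article.
SAMPLE = """\
BEGIN_EVENT_INFO
TIME_tvsec=1271920539
TIME_tvnsec=399378269
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_UP
INTERFACE_NAME=en0
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO
"""

common, producer = split_event(SAMPLE)
print(producer["EVENT_TYPE"], producer["INTERFACE_NAME"], "node", producer["NODE_NUMBER"])
```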

The lscluster -m command gives details about the existing cluster on a node.


Listing 3. Output of lscluster -m command
Node name: imaginary.ibm.com
        Cluster shorthand id for node: 3
        uuid for node: 6f5b24cc-cbab-11df-8c2c-001125085b7a
        State of node:  UP  NODE_LOCAL
        Smoothed rtt to node: 0
        Mean Deviation in network rtt to node: 0
        Number of zones this node is a member in: 0
        Number of clusters node is a member in: 1
        CLUSTER NAME       TYPE  SHID   UUID
        clust1             local        6f56fffa-cbab-11df-8c2c-001125085b7a

        Number of points_of_contact for node: 0
        Point-of-contact interface & contact state
         n/a

Three fields in this output (the cluster shorthand ID for the node, the UUID for the node, and the cluster UUID) uniquely identify a node in a cluster.

When a cluster event occurs, this information is returned in the NODE_NUMBER, NODE_ID, and CLUSTER_ID keys, respectively.
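To match an event against a node known from lscluster -m, a consumer needs to compare the two notations. The Python sketch below assumes that NODE_ID and CLUSTER_ID are the same UUIDs written as 0x followed by 32 hexadecimal digits without dashes. The listings suggest this but the article does not state it. The function names and the sample event fields are hypothetical; the UUIDs and shorthand ID are taken from Listing 3.

```python
import uuid


def uuid_to_event_id(dashed_uuid):
    """Convert a dashed UUID into the assumed '0x' + 32 hex digit form."""
    return "0x" + uuid.UUID(dashed_uuid).hex.upper()


def event_is_from(fields, shorthand_id, node_uuid, cluster_uuid):
    """Check whether an event's identity keys name the given node."""
    def norm(value):
        return value.strip().lower()

    return (
        int(fields["NODE_NUMBER"]) == shorthand_id
        and norm(fields["NODE_ID"]) == norm(uuid_to_event_id(node_uuid))
        and norm(fields["CLUSTER_ID"]) == norm(uuid_to_event_id(cluster_uuid))
    )


# Values taken from the lscluster -m output in Listing 3.
LOCAL = {
    "shorthand_id": 3,
    "node_uuid": "6f5b24cc-cbab-11df-8c2c-001125085b7a",
    "cluster_uuid": "6f56fffa-cbab-11df-8c2c-001125085b7a",
}

# Hypothetical event fields built from those values, not captured output.
fields = {
    "NODE_NUMBER": "3",
    "NODE_ID": "0x6F5B24CCCBAB11DF8C2C001125085B7A",
    "CLUSTER_ID": "0x6F56FFFACBAB11DF8C2C001125085B7A",
}

print(event_is_from(fields, **LOCAL))
```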

The SEQUENCE_NUM field gives the number of times this event has occurred.

Events received from a remote node do not include user or process information or stack trace, even if the event producer supports it.


Event producers available on Cluster Aware AIX

This section gives details on each of the event producers that are available only in a clustered environment (that is, when a system is part of a cluster).

nodeList

The nodeList event producer notifies when the list of nodes in the cluster changes. Cluster membership change is a key event that is useful for all other nodes in the cluster.

The EVENT_TYPE field can have the following values:

  • NODE_ADD: Triggered when a node is added to the cluster (for example, using the chcluster command).
  • NODE_DELETE: Triggered when a node is removed from the cluster (for example, using the chcluster command).

The NODE_NUMBER and NODE_ID fields help identify the node concerned.

Listing 4. Event output from a nodeList occurrence

BEGIN_EVENT_INFO
TIME_tvsec=1271922590
TIME_tvnsec=886742634
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=NODE_DELETE
NODE_NUMBER=1
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO
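A consumer could use these two event types to keep its own view of cluster membership. The following Python sketch assumes the event has already been parsed into a dictionary of strings. The function name and the sample events are hypothetical.

```python
def apply_node_list_event(members, fields):
    """Update the set of known node numbers from one nodeList event."""
    node = int(fields["NODE_NUMBER"])
    if fields["EVENT_TYPE"] == "NODE_ADD":
        members.add(node)
    elif fields["EVENT_TYPE"] == "NODE_DELETE":
        members.discard(node)  # no error if the node was never seen
    return members


# Hypothetical sequence of parsed events.
members = {2, 3}
for fields in (
    {"EVENT_TYPE": "NODE_ADD", "NODE_NUMBER": "1"},
    {"EVENT_TYPE": "NODE_DELETE", "NODE_NUMBER": "3"},
):
    apply_node_list_event(members, fields)

print(sorted(members))
```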

nodeState

The nodeState event producer monitors the state of a node in the cluster.

The EVENT_TYPE field can have the following values:

  • NODE_UP: Triggered when a node comes up (for example, after a reboot or activation from HMC).
  • NODE_DOWN: Triggered when a node goes down (for example, on shutdown, reboot, or crash).

The NODE_NUMBER and NODE_ID fields help identify the node concerned.


Listing 5. Event output from a nodeState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271921536
TIME_tvnsec=68254861
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=NODE_UP
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

nodeAddress

The nodeAddress event producer monitors the network address of the node.

The EVENT_TYPE field can have the following values:

  • ADDRESS_ADD: Triggered when an address (alias) is added (for example, using ifconfig and chdev commands).
  • ADDRESS_DELETE: Triggered when an address (alias) is removed (for example, using ifconfig and chdev commands).

The INTERFACE_NAME of the concerned interface, along with the FAMILY, ADDRESS and NETMASK of the IP address, is also provided.


Listing 6. Event output from a nodeAddress occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271922254
TIME_tvnsec=9053410
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADDRESS_ADD
INTERFACE_NAME=et0
FAMILY=2
ADDRESS=0x0A0A0A0A
NETMASK=0xFF000000
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO
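The ADDRESS and NETMASK values in Listing 6 are hexadecimal. The Python sketch below decodes them into dotted notation. It assumes that FAMILY=2 denotes IPv4 and that the hexadecimal digits are in network byte order as printed. Neither point is stated in the article, and the function name is hypothetical.

```python
import ipaddress

AF_INET = 2  # assumed meaning of FAMILY=2 (IPv4)


def decode_address(fields):
    """Return an IPv4Interface built from ADDRESS and NETMASK."""
    if int(fields["FAMILY"]) != AF_INET:
        raise ValueError("only FAMILY=2 (assumed IPv4) is handled here")
    addr = ipaddress.IPv4Address(int(fields["ADDRESS"], 16))
    mask = ipaddress.IPv4Address(int(fields["NETMASK"], 16))
    return ipaddress.IPv4Interface(f"{addr}/{mask}")


# Field values copied from Listing 6.
iface = decode_address({
    "INTERFACE_NAME": "et0",
    "FAMILY": "2",
    "ADDRESS": "0x0A0A0A0A",
    "NETMASK": "0xFF000000",
})
print(iface.ip, iface.netmask)
```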

networkAdapterState

The networkAdapterState event producer monitors the network interface of a node in the cluster.

The EVENT_TYPE field can have the following values:

  • ADAPTER_UP: Triggered when an interface is brought up (for example, using ifconfig, smit tcpip, and chdev commands).
  • ADAPTER_DOWN: Triggered when an interface is brought down (for example, using ifconfig, smit tcpip, and chdev commands).
  • ADAPTER_ADD: Triggered when an interface is added (for example, using the mkdev command).
  • ADAPTER_DEL: Triggered when an interface is deleted (for example, using the rmdev command).

The INTERFACE_NAME field gives the name of the concerned interface.


Listing 7. Event output from a networkAdapterState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271920539
TIME_tvnsec=399378269
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=ADAPTER_UP
INTERFACE_NAME=en0
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO
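Because these events arrive from every node, a consumer may want a table of interface states keyed by node and interface. The following Python sketch handles the four event types. The function name, the state labels, and the choice to record a newly added interface as DOWN until an ADAPTER_UP arrives are assumptions made for illustration.

```python
def apply_adapter_event(adapters, fields):
    """Track interface state as {(node_number, interface): 'UP' or 'DOWN'}."""
    key = (int(fields["NODE_NUMBER"]), fields["INTERFACE_NAME"])
    kind = fields["EVENT_TYPE"]
    if kind == "ADAPTER_ADD":
        # Assumption: treat a newly added interface as down until told otherwise.
        adapters.setdefault(key, "DOWN")
    elif kind == "ADAPTER_DEL":
        adapters.pop(key, None)
    elif kind == "ADAPTER_UP":
        adapters[key] = "UP"
    elif kind == "ADAPTER_DOWN":
        adapters[key] = "DOWN"
    return adapters


# Hypothetical sequence of parsed events.
adapters = {}
for fields in (
    {"EVENT_TYPE": "ADAPTER_ADD", "INTERFACE_NAME": "en0", "NODE_NUMBER": "2"},
    {"EVENT_TYPE": "ADAPTER_UP", "INTERFACE_NAME": "en0", "NODE_NUMBER": "2"},
    {"EVENT_TYPE": "ADAPTER_DOWN", "INTERFACE_NAME": "en1", "NODE_NUMBER": "3"},
):
    apply_adapter_event(adapters, fields)

print(adapters)
```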

nodeContact

The nodeContact event producer monitors the last contact status of the node in a cluster.

The EVENT_TYPE field can have the values CONNECT_UP and CONNECT_DOWN. This event is triggered when a node is rebooted, shut down, or crashes, or by commands such as mkdev and rmdev.

The INTERFACE_NAME field gives the name of the concerned interface.


Listing 8. Event output from a nodeContact occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271921874
TIME_tvnsec=666770128
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=CONNECT_DOWN
INTERFACE_NAME=en1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

clDiskList

The clDiskList event producer notifies when the list of disks in the cluster changes. AIX commands such as chcluster, which add or remove disks in the cluster, trigger this event.

In the event output, the EVENT_TYPE key takes one of two values, which indicates whether a disk was added to or removed from the cluster.

The EVENT_TYPE field can have the following values:

  • DISK_ADD: Triggered when a disk is added to the cluster (for example, using the chcluster command).
  • DISK_DELETE: Triggered when a disk is removed from the cluster (for example, using the chcluster command).

The DISK_NAME and the DISK_UID values help identify the concerned disk.


Listing 9. Event output from a clDiskList occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271927983
TIME_tvnsec=696543410
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=DISK_ADD
DISK_NAME=cldisk1
DISK_UID=3E213600A0B800016726C000000FF4B8677C80F1724-100 FAStT03IBMfcp
NODE_NUMBER=2
NODE_ID=0xF079E8C801C11DFB918BEB25635B404
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

clDiskState

The clDiskState event producer monitors cluster disks.

Each node in the cluster must have common storage devices available, either through the Storage Area Network (SAN) or through SAS subsystems. These storage devices are either the clustered shared disks or the repository disk.

The EVENT_TYPE field can have the values DISK_UP and DISK_DOWN. This event is triggered when a disk comes up or goes down respectively.

The DISK_NAME helps identify the concerned cluster disk. A global device view exists across all the nodes in the cluster, which provides a single global device name for a disk from any node in the cluster (for example, cldisk1 refers to the same physical disk from any node in the cluster).


Listing 10. Event output from a clDiskState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271935734
TIME_tvnsec=265210314
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=DISK_DOWN
DISK_NAME=cldisk1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

repDiskState

The repDiskState event producer monitors the repository disk. The cluster repository disk is used as the central repository for all the cluster configuration data.

The EVENT_TYPE field can have the values REP_UP and REP_DOWN when a repository disk comes up or goes down respectively.

The DISK_NAME returns the repository disk name.


Listing 11. Event output from a repDiskState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271933757
TIME_tvnsec=134003703
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=REP_UP
DISK_NAME=caa_private0
NODE_NUMBER=3
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

diskState

The diskState event producer monitors local disk changes. Notifications are sent only for disks that are supported by the storage framework. This event is not transmitted to other nodes in the cluster; only the node on which the event actually occurred is notified.

The EVENT_TYPE field can have the values LOCAL_UP and LOCAL_DOWN when a local disk comes up or goes down respectively.

The DISK_NAME helps identify the concerned local disk.


Listing 12. Event output from a diskState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271935029
TIME_tvnsec=958362343
SEQUENCE_NUM=1
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=LOCAL_UP
DISK_NAME=hdisk1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO

vgState

The vgState event producer verifies the status of the Volume Group on a disk.

Whenever an up or down event happens for a local disk (registered with diskState) or a cluster disk (shared or repository), a corresponding VG_UP or VG_DOWN event is triggered for the volume group residing on that disk. Using this event producer, an application can verify the status of a volume group on the disk with the LVM subsystem. Commands like varyonvg and varyoffvg generate this event.

It also passes the concerned disk name and volume group name in the DISK_NAME and VG_NAME fields.


Listing 13. Event output from a vgState occurrence
BEGIN_EVENT_INFO
TIME_tvsec=1271915408
TIME_tvnsec=699408296
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
EVENT_TYPE=VG_UP
DISK_NAME=hdisk5
VG_NAME=myvg1
NODE_NUMBER=2
NODE_ID=0x76497CF2CF1111DF8D83BEB25D4C4703
CLUSTER_ID=0x6EA7B08888D811DFB918BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO


General event producers

When a system is part of a cluster, other event producers (like modFile, modDir, utilFs, waitTmCPU, waitersFreePg, waitTmPgInOut, vmo, schedo, and processMon) also support remote notifications.

Only an additional string CLUSTER=YES needs to be passed in the write() system call to send and receive remote notifications for these events. The event also needs to be monitored on other nodes to receive remote notifications.
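For the modFile producer, the monitor file name appears to follow from the path of the watched file, as the /aha/fs/modFile.monFactory/etc/passwd.mon path in the listings of this section suggests. The Python sketch below builds such a path. The function name and its parameters are hypothetical, and the pattern is confirmed by this article only for modFile under /aha/fs.

```python
import posixpath

AHA_ROOT = "/aha"  # mount point as it appears in the article's listings


def mon_file_for(watched_path, producer="modFile", event_class="fs"):
    """Build the monitor file path for a watched file.

    Only the modFile/fs combination is shown in the article; other
    producer and class values are unverified assumptions.
    """
    if not watched_path.startswith("/"):
        raise ValueError("watched_path must be absolute")
    return posixpath.join(
        AHA_ROOT,
        event_class,
        producer + ".monFactory",
        watched_path.lstrip("/") + ".mon",
    )


print(mon_file_for("/etc/passwd"))
```

The resulting path is what a monitoring program would pass to open() before writing its monitoring string with CLUSTER=YES.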

The following example illustrates what is returned when the file /etc/passwd is modified on one node of the cluster. The local node where the event occurred will be notified as follows:


Listing 14. Example of a "local" cluster event
Event corresponding to /aha/fs/modFile.monFactory/etc/passwd.mon has occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1285753875
TIME_tvnsec=613266683
SEQUENCE_NUM=0
PID=8192002
UID=0
UID_LOGIN=0
GID=0
PROG_NAME=vi
RC_FROM_EVPROD=1000
BEGIN_EVPROD_INFO
NODE_NUMBER=2
NODE_ID=0x31E6CEAECA3711DF89F6BEB25D4C4703
CLUSTER_ID=0xAD3903B4CB9B11DF90E9BEB25635B404
END_EVPROD_INFO
STACK_TRACE
ahafs_evprods+70C
aha_process_vnop+160
vnop_rdwr+7DC
vno_rw+B4
rwuio+100
rdwr+188
kewrite+104
.svc_instr
write+1A4
putfile+154
wop+154
commands+1F8C
vmain+154
vop+4AC
commands+1E44
main+7E4
__start+68
END_EVENT_INFO

All other nodes of the cluster that are monitoring this event are also notified immediately.


Listing 15. Example of a "remote" cluster event
Event corresponding to /aha/fs/modFile.monFactory/etc/passwd.mon has occurred.
BEGIN_EVENT_INFO
TIME_tvsec=1285753875
TIME_tvnsec=538355111
SEQUENCE_NUM=0
RC_FROM_EVPROD=0
BEGIN_EVPROD_INFO
NODE_NUMBER=2
NODE_ID=0x31E6CEAECA3711DF89F6BEB25D4C4703
CLUSTER_ID=0xAD3903B4CB9B11DF90E9BEB25635B404
END_EVPROD_INFO
END_EVENT_INFO
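The two listings differ in that only the local notification carries process details and a stack trace, while both name the node where the change happened. The Python sketch below separates the stack trace from the key=value fields and compares NODE_NUMBER with the consumer's own node number. The function names are hypothetical, and the sample is a shortened copy of Listing 14.

```python
def parse_with_trace(text):
    """Return (fields, stack); stack holds the lines after STACK_TRACE."""
    fields, stack, in_trace = {}, [], False
    for line in text.splitlines():
        line = line.strip()
        if line == "STACK_TRACE":
            in_trace = True
        elif line == "END_EVENT_INFO":
            in_trace = False
        elif in_trace:
            stack.append(line)
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    return fields, stack


def occurred_on(fields, local_node_number):
    """True when the event originated on the node with this shorthand ID."""
    return int(fields["NODE_NUMBER"]) == local_node_number


# Shortened copy of Listing 14 (stack trace truncated to three frames).
LOCAL_EVENT = """\
BEGIN_EVENT_INFO
SEQUENCE_NUM=0
PID=8192002
PROG_NAME=vi
RC_FROM_EVPROD=1000
BEGIN_EVPROD_INFO
NODE_NUMBER=2
END_EVPROD_INFO
STACK_TRACE
ahafs_evprods+70C
aha_process_vnop+160
vnop_rdwr+7DC
END_EVENT_INFO
"""

fields, stack = parse_with_trace(LOCAL_EVENT)
print(fields.get("PROG_NAME"), len(stack), occurred_on(fields, 2))
```

A consumer on another node would see no PID or PROG_NAME key and an empty stack, and occurred_on() would return False for its own node number.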


Conclusion

Cluster Aware AIX events can be monitored using the AIX Event Infrastructure, which is integrated in AIX releases 6.1 TL6 and 7.1. Users and administrators who need such event notifications can use this framework and benefit from it. Monitoring event notifications can also help improve the health of your systems.

