One of the most important activities in the daily work of a DBA is monitoring the server to prevent problems. IBM' Informix Dynamic Server' (IDS). is one of the most scalable database servers available today and one which requires little administrative overhead compared with other database servers playing in this high-end league. However, as the number of database server installations increases and the daily business of your company depends on the smooth operation of these servers, monitoring and observing their critical functions is absolutely essential.
This article discusses the monitoring of critical areas of IDS. Besides the tools delivered with the database server (ISA and AGS ServerStudio), there are several vendor tools that you can use for this purpose. However many tools do not have database monitoring as their primary focus, and for those that are database monitoring tools, the cost may be an issue. In this article we'll talk about a tool that's available for free and can help you solve many of your monitoring problems.
BusyBee is a database monitoring tool you can download for free, written completely in the versatile and powerful Python programming language. It's sole purpose is the observation and monitoring of Informix Dynamic Server.
Let's start by looking at BusyBee's architecture, shown in Figure 1.
Figure 1. BusyBee architecture
BusyBee is based on a multiprocess architecture where each subprocess (domain inspector) is responsible for observing a dedicated set of functionality which I will refer to as a "domain" of your database server. These domain inspectors act as daemon processes in the background and are configured thru an XML-based configuration file. Different intervals and individual thresholds can be established, allowing a flexible and easy customization of BusyBee serving your dedicated environment.
The BbWatch master daemon parses the XML configuration file and forks a child process (domain inspector) for each enabled domain. Each domain inspector observes a dedicated area of the database server, examining any violations of the configured thresholds and generating appropriate alerts. Depending on the individual domain type, different tools or interfaces will be called to gather the required information:
- Calling the ONSTSAT utility
- Performing sysmaster queries
-
Parsing the
online.log
The interval of each domain inspector can be individually configured; thus the more critical domains (like 'DbmStatusCheck') can be observed more often than the less critical ones (like 'SessionLockCheck').
The XML configuration file allows the customization of the individual domain inspectors to meet the requirements of your dedicated environment. Figure 2 shows an excerpt of the XML configuration file describing the two domain inspectors 'DbspaceFullCheck' and 'CheckPointCheck'.
Figure 2. XML configuration file
Each IBM Informix Dynamic Server instance must have an individual XML configuration file named as follows:
<InstallDir>/BusyBee/etc/BbWatchIfxCfg.<instance>.xml |
If you wish to share the XML configuration file between different instances, you can create the appropriate symbolic links.
Next we'll talk about the individual XML tags and their respective meanings.
<Domain Name='xxx'>
This is the name of an individual domain. Domains are the single areas of the database manager that will be observed by an individual BbWatch daemon (domain inspector).
<Enabled>
This XML tag controls whether or not a domain is activated (enabled). If you want to deactivate a certain domain, you can change this setting to 0 (disabled). If a domain is disabled, no domain inspector will be started, thus this database manager resource will not be observed.
<Interval>
This XML tag controls the interval for this domain. The specific area of the database manager will be checked at the given interval.
<Severity>
This XML tag is only available for domains which are not based on any thresholds. For example the domain 'DbmStatusCheck' is not based on thresholds because the database server is either down or up. You can configure the appropriate severity for this event here. Possible values for the severity in the order of relevance are:
- Info
- Warning
- Minor
- Critical
- Fatal
The configured severity will also be delivered to the external script (see 'ExecuteScript' below).
<ExecuteScript>
This XML tag may contain the full path of an external script which will be automatically triggered by BbWatch if any of the configured thresholds has been exceeded. The external script will be called by BbWatch with the following arguments:
Instance The first argument is the name of database manger instance where the event occurred. Domain The second argument is the name of the domain inspector which triggered this event. Severity The third argument is the severity code (i.e. WARNING, MINOR, CRITICAL, FATAL). Message The fourth argument is the full message text.
Examples:
|
This mechanism allows you to write your own scripts in your preferred programming language so you can receive those events and divert them to other interfaces like SMS, pager, email or existing system management platforms.
<Warning><Minor><Critical><Fatal>
These XML tags allow the configuration of individual thresholds. BbWatch will trigger an alarm with the given severity as soon as any one of these thresholds is exceeded.
<Group>
Certain domains contain an additional XML tag called 'Group' which allows a more granular configuration for this domain.
For example the domain 'DbspaceFullCheck' allows the specification of dbspace groups to enable the configuration of different threshold values for individual groups of dbspaces. Another example using the 'Group' tag is the domain 'OnlineLogCheck'. Individual search patterns can be configured in different groups, each one with its own severity.
The domain inspectors are the workhorses of BusyBee. They observe an individual domain of the database server at the arranged interval and generate appropriate alarms if a configured threshold is exceeded. Figure 3 shows one of the most important domain inspectors called 'DbspaceFullCheck' which is responsible for checking how full individual dbspaces are.
Figure 3. Domain inspector
As soon as any one of the configured thresholds is reached, BbWatch will write an appropriate alarm to the logfile (
<InstallDir>/BusyBee/log/BbWatch.<instance>.log
):
- 60% (Warning)
- 70% (Minor)
- 80% (Critical)
- 90% (Fatal)
In addition to the generated event, BbWatch will also trigger the defined external script. You might for example setup a script which automatically adds a new diskspace to the affected dbspace in order to prevent any outages.
Here is a listing of all twenty domain inspectors currently available for the IBM Informix Dynamic Server. Beside the purpose of the individual domain inspector you will also see the reason why observing this resource is essential.
DbmStatusCheck
| Purpose: | Check the availibility of the database server using the 'onstat' utility. |
| Reason: | If your database server is down, you could not access your data. This might have a negative impact on your business. |
LongTxCheck
| Purpose: | Check to see if there are any ongoing long transactions. |
| Reason: | Long transactions should be avoided. They consume major resources of your database server. If a long transaction exclusive high water mark (LTXEHWM) is reached, all other write activities will be blocked. |
ConnectCheck
| Purpose: | Make sure that a connect thru 'dbaccess' is possible to any server names defined in your $ONCONFIG file (i.e. DBSERVERNAME, DBSERVERALIASES ). |
| Reason: | The previous domain inspector 'DbmStatusCheck' checks the availability of your database server. However the fact that your database server is 'online' does not always ensure that you could connect to your defined servernames. So this additional test is helpful in some special situations. |
DbspaceFullCheck
| Purpose: | Check the filling degree of your dbspaces according the thresholds defined in the XML config file. |
| Reason: | Managing disk space is one of the primary tasks of a DBA. If one of your dbspaces runs out of space, some applications could stop working. You can configure discrete thresholds for individual dbspaces, thus more critical dbspaces (like the 'rootdbs') could be set up with lower threshold values to allow an earlier detection of such situations. Static dbspaces (like 'physdbs' or 'logdbs') could be excluded from any observation. |
CheckpointCheck
| Purpose: |
Parse the
online.log
for any violations of the configured maximum checkpoint time allowed.
|
| Reason: | Long checkpoints should be avoided. During a checkpoint all write activities will be blocked. So the goal for every DBA is to keep the checkpoint time as small as possible. This domain inspector also allows the configuration of distinct timeframes, so you could configure a bigger threshold value for your batch processing time (for example from hour 20-7) and a smaller threshold value for your interactive user time (for example from hour 8-19). |
OnlineLogCheck
| Purpose: |
Parse the
online.log
and compare each row to the configured search patterns from the XML configuration file and perform an alert for any matches that have been found.
|
| Reason: |
The
online.log
is an important information resource for the DBA. It contains useful information about any problems that have occurred. With this domain inspector you are able to configure individual search patterns which will be automatically checked for you by the BbWatch daemon.
|
LogFullCheck
| Purpose: | Checks to see how full the logical logs are and generates appropriate alarms if one of the configured thresholds is exceeded. |
| Reason: |
The logical logs are one of the most important resources of the database server. Your database server will be blocked if they become full because they could not be backed up. You might also lose transactions in the case of a failure if your logical logs have not been backed up. Thus observing the behaviour of the logical logs is also one of the primary activities of a DBA.
Yes, we have the dynamic log allocation feature since IDS 9.30. But this is not always a good thing. Poorly written applications performing large transactions without any commit could trigger the allocation of several additional logs scattered throughout your dbspaces if you have set DYNAMIC_LOGS to 2 (which is also the default). All these dbspaces will then be considered 'critical' (because they contain a logical log), so if one of them is marked down for some reason, your database server will stop. |
MaxExtentCheck
| Purpose: | Checks the number of extents for a table or a table fragment. |
| Reason: | The number of extents per table is limited. The upper limit depends on the page size of your system (2K or 4K) and the number of special data types in your table (like datetime or interval). For a 2K system the limit will be approximately 200-220 extents. If this limit is reached, no more inserts into this table will be possible. You have reorganize the relevant table which might be time consuming if the table is large (which is obvious because it ran out of extents). |
MaxPageCheck
| Purpose: | Counts the number of pages allocated for a table or a table fragment. |
| Reason: | There is an upper limit of 16,000,000 pages per table/fragment/index in Informix Dynamic Server. You cannot insert any additional rows when the limit is reached. You must reorganize the table (for example splitting a single table into several fragments) before work can continue. 16,000,000 pages is not that much. On a system using a 2K page size it is around 32 GB. I've seen some customers here in Germany which ran into that limit on large tables which have not been fragmented. |
SessionLockCheck
| Purpose: | Inspects the number of locks held by individual database sessions. |
| Reason: |
Poorly written applications can consume a huge number of locks. This domain inspector checks the number of locks held by each database session and triggers appropriate alarms if one of the configured thresholds is exceeded.
Yes, we have had the dynamic lock table since IBM Informix Dynamic Server 9.21. However the disadvantage of this feature is that you might no longer detect those poorly-written applications consuming many locks and slowing down the performance of your database server. This domain inspector helps you identify those applications. |
VirtualSegmentCheck
| Purpose: | Counts the number of allocated virtual shared memory segments. |
| Reason: | Allocating additional shared memory segments is resource intensive and on some platforms (like HP/UX) degrades the overall performance of your database server. There are also some limits on the amount of shared memory that can be allocated, for example on 32-Bit AIX systems. |
DbspaceDownCheck
| Purpose: | Checks if any dbspaces have been marked down. |
| Reason: | You cannot access the data in 'down' dbspaces. You may need to restore the data from an archive. |
ChunkDownCheck
| Purpose: | Checks if any chunks are marked down. |
| Reason: | You cannot access the data in 'down' chunks. You may need to restore the data from an archive. |
SessionMemoryCheck
| Purpose: | Inspects the allocated memory of each database session currently connected. |
| Reason: | Depending on the number of prepared statements, recursive stored procedures and some other factors, the allocated memory per session might be high, especially if these sessions fail to free their prepared statements. With a large number of users, you might suddenly hit the known memory limitations on 32-bit systems. This domain inspector lets you identify those memory-intensive sessions. |
ForegroundWriteCheck
| Purpose: | Checks the number of foreground writes that have occurred. |
| Reason: | Foreground writes should be avoided. They slow down the performance of your database server. Depending on the nature of your application (batch or more interactive) you should prefer chunk writes or lru writes, so that foreground writes almost never happen. |
DeadLockCheck
| Purpose: | Counts the number of deadlocks that have occurred. |
| Reason: | Deadlocks should also be avoided. They normally result from an inappropriate application design. However I've seen also a lot of deadlocks because people forgot to create their tables in 'row' lock mode (the default is 'page'). |
ReadCacheCheck
| Purpose: | Checks the current read cache rate of your database server. |
| Reason: | The read cache rate is an indicator of the overall performance of your database server. The goal for OLTP systems is too keep the read cache rate over 95 percent. This domain inspector allows the specification of a 'WarmUpTime' that has to be elapsed before the read cache check will be measured. This 'WarmUpTime' refers to the time since the database server has been started or since the last 'onstat -z' (zero profile) has been executed. This ensures that the computed read cache rate is representative. |
WriteCacheCheck
| Purpose: | Checks the current write cache rate of your database server. |
| Reason: | Almost the same as the above described 'ReadCacheRate'. The goal for OLTP systems is to keep the write cache rate over 85 percent. However depending on the nature of the application (for example write-intensive batch applications) this might not always be possible. |
SequentialCacheCheck
| Purpose: | Checks the number of sequential scans that occurred on individual tables. |
| Reason: | Sequential scans on large tables should normally be avoided in OLTP systems. In data warehouse environments they are quite common. This domain inspector allows you to specify the minimum size that a table must have in order to be considered. This helps you exclude small tables in which sequential scans are not a problem. Also the explicit exclusion of tables through their names can be done, because you might also have large tables which are processed sequentially, for example by batch jobs. |
ReadyQueueCheck
| Purpose: | Counts the number of threads currently waiting on the ready queue. |
| Reason: |
Threads will be put on the ready queue because all available CPU virtual processors (VPs) are currently busy. They wait for a free virtual processor, hence the database session is blocked. A constantly large number of threads on the ready queue ('onstat -g rea'), leads to the conclusion that you have to add additional CPU VPs.
However if you have already configured the same number of CPU VPs as physical processors available in the machine, you might have to add more physical processors or investigate other performance tuning activities. This domain inspector allows you to specify the number of samples that should be taken before computing the average number of threads on the ready queue. This ensures that the computed value is more representative. |
Integration with existing system management tools
BusyBee offers the ability to call an external program which can be configured thru the XML tag 'ExecuteScript' if anyone of the thresholds has been exceeded. This is comparable to the Informix 'alarmprogram' or the DB2® UDB 'userexit' mechanism.
- Domain name (like 'DbspaceFullCheck' or 'CheckpointCheck')
- Severity (Warning, Minor, Critical, Fatal)
- Instance name (the name of the Informix instance)
- Message text (The full text of the BusyBee message)
Through this mechanism it should be fairly easy to connect BusyBee with existing system management tools like BigBrother, CA Unicenter, IBM Tivoli® or HP/OpenView. If you don't have such a system management framework in use, you could also setup a simple script which sends you an e-mail or an SMS alert if a threshold has been exceeded. There are two sample shell scripts delivered with BusyBee (
BbWatchMail.ksh
and
BbWatchSyslog.ksh
) which will show you how to deliver events to external interfaces like email or thru the UNIX® syslog mechanism.
These external scripts/programs need not to be written as a shell script. They could also be programmed in C, Java, Perl or Python or any other language which is capable of receiving command line arguments and delivering return codes.
BusyBee is available on all UNIX based platforms where Python can be ported. For popular platforms like AIX, HP/UX, Linux and Solaris there are binary distributions of Python. BusyBee supports the following database servers on these platforms:
- IBM Informix IDS 7.x
- IBM Informix IDS 9.x
- IBM DB2 Universal Database (UDB) 7.x (available in Q3/2003)
- IBM DB2 UDB 8.x (available in Q3/2003)
Currently I'm working on a port of BusyBee for the IBM DB2 UDB database server (on UNIX/ LINUX platforms). The next step will probably be a port for the open source database system MySQL .
The following URL provides you with additional information about BusyBee and the capability to download it:
http://www.herber-consulting.de/BusyBeeThere is also some other Informix stuff on my website which might be interesting for you. If you have any questions about BusyBee feel free to contact me via e-mail.

Eric Herber started his IT career at Garmhausen & Partner, then joined Informix Advanced Support and now has been working as a freelancer for several years. His main consulting areas are Informix, DB2, Perl and Python. He loves playing and watching soccer. You can reach Eric at eric@herber-consulting.de




