©2003 International Business Machines Corporation. All rights reserved.
Important: Read the disclaimer before reading this article.
Informix® Dynamic Server® (IDS) operation mode refers to the operating state of IDS. IDS has four basic operation modes: off-line mode, fast recovery mode, quiescent or single user mode, and on-line mode.
- Off-line mode indicates that IDS is not running. This is usually the result of a serious system or database failure, such as bad chunks, assertion failures, memory crashes, or major IDS processes that died abnormally.
- Fast recovery mode indicates IDS is undergoing physical and logical recovery.
- Quiescent mode indicates that IDS is running, but only users with DBA privileges can connect and do administrative and maintenance work.
- On-line mode indicates that IDS is running normally and available to all users to connect and perform any kind of database operation.
For 24 x 7 production systems, IDS should always be in on-line mode; any other mode indicates a serious system or database server problem. DBAs therefore need to monitor the IDS operation mode continuously to make sure IDS is running and functioning at all times.
The most common and easiest way to monitor IDS operation mode is to issue the command
onstat -
at the UNIX® command prompt. The output of this command looks like the following:
Informix Dynamic Server 2000 Version 9.21.UC4 -- On-Line -- Up 01:01:17 -- 1654784 Kbytes
The first field in the above output tells us which IDS version we are running; in this case, it is 9.21.UC4. The second field tells us which operation mode IDS is in. In this case, IDS is on-line and running normally. This field also indicates whether IDS is performing a checkpoint or has hit a long transaction. The third field indicates how long IDS has been up. In this case, IDS has been up for 1 hour, 1 minute, and 17 seconds. The last field of the output tells us how much system memory IDS is currently using. Here we see that IDS is using 1.65 GB of system memory.
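As a minimal sketch of how these four fields can be pulled apart in a script, the hypothetical helper below (parse_banner is not part of the original script) splits a banner line on the " -- " separators. It assumes the banner has the layout shown in the sample output, with the uptime written either as "Up HH:MM:SS" or as "Up N days HH:MM:SS"; other releases may format the line differently.

```shell
#!/bin/sh
# Hypothetical helper: split an "onstat -" banner line into its four fields.
# Assumes the layout of the sample output in this article.
parse_banner() {
    printf '%s\n' "$1" | awk -F ' -- ' '
        NF >= 4 {
            # Field 1 ends with the version string, e.g. "... Version 9.21.UC4"
            n = split($1, words, " ")
            version = words[n]
            mode = $2
            # Field 3 is "Up HH:MM:SS" or "Up N days HH:MM:SS"
            m = split($3, up, " ")
            days = (m == 4) ? up[2] : 0
            split(up[m], hms, ":")
            uptime = days * 86400 + hms[1] * 3600 + hms[2] * 60 + hms[3]
            # Field 4 is "<number> Kbytes"
            split($4, mem, " ")
            printf "version=%s mode=%s uptime_secs=%d mem_kb=%s\n", version, mode, uptime, mem[1]
        }'
}

parse_banner "Informix Dynamic Server 2000 Version 9.21.UC4 -- On-Line -- Up 01:01:17 -- 1654784 Kbytes"
# prints: version=9.21.UC4 mode=On-Line uptime_secs=3677 mem_kb=1654784
```

In practice the argument would come from the live command rather than a literal string. Splitting on the separators instead of counting whitespace-delimited words keeps the result stable when the uptime gains a "days" component.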
But the command has some limitations:
- It does not check for Informix assertion failures. IDS performs assertion checks regularly to detect database server malfunctions or problems caused by hardware or operating system errors. An assertion check is a consistency check that verifies whether physical and logical database objects such as disk pages, memory structures, tables, and indexes are valid and functioning. When an assertion check detects corrupted or malfunctioning database objects, IDS produces an assertion failure and records a description in the database server message log.
  Assertion failures can be deadly to production systems. To be safe, if IDS has an assertion failure, the DBA needs to bounce IDS and recover it to on-line mode.
- The onstat command also does not check for down or bad chunks. Chunks are the physical disk spaces where IDS stores its database data and indexes. Bad chunks are usually caused by disk errors, I/O errors, or mirroring errors. If these errors occur and the hardware is unable to recover, IDS may not be able to read or write to database tables and indexes. IDS could still be in on-line mode if those bad chunks are not part of critical system dbspaces, such as rootdbs or the dbspaces for physical and logical logs. This situation can be very risky to production systems.
How can we make sure that IDS is not only in online mode, but also is clean of assertion failures and bad chunks? To address this problem at my company, we came up with the following script to monitor for those failures.
The script consists of two major functions: one to check assertion failures and one to check down or bad chunks. The logic is conceptually simple: check to see if IDS is online; if it is, then further check to see if IDS has had any assertion failures and any down or bad chunks since the latest initialization.
The complexity of the program lies in determining whether an assertion failure occurred after the last IDS initialization. When a failure occurs, IDS also writes an assertion failure file (the af.* files, kept in /var/tmp on our system), so we can compare the timestamp of the latest assertion failure file with the time IDS has been on-line. The time calculation is implemented in Perl, embedded in a Korn shell script, since Perl has good time calculation functions. Let's now take a close look at the check_af function below.
    #!/usr/bin/ksh
    ######################################################################
    # (c) Copyright IBM Corp. 2003 All rights reserved.
    #
    # This sample program is owned by International Business Machines
    # Corporation or one of its subsidiaries ("IBM") and is copyrighted
    # and licensed, not sold.
    #
    # You may copy, modify, and distribute this sample program in any
    # form without payment to IBM, for any purpose including developing,
    # using, marketing or distributing programs that include or are
    # derivative works of the sample program. Licenses to IBM patents
    # are expressly excluded from this license.
    #
    # The sample program is provided to you on an "AS IS" basis, without
    # warranty of any kind. IBM HEREBY EXPRESSLY DISCLAIMS ALL
    # WARRANTIES EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO
    # THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
    # PARTICULAR PURPOSE. Some jurisdictions do not allow for the
    # exclusion or limitation of implied warranties, so the above
    # limitations or exclusions may not apply to you. IBM shall not be
    # liable for any damages you suffer as a result of using, modifying
    # or distributing the sample program or its derivatives.
    #
    # Each copy of any portion of this sample program or any derivative
    # work must include the above copyright notice and disclaimer of
    # warranty.
    ######################################################################
    ######################################################################
    # Script Name: check_infx_af
    # Description: This script checks if your informix engine suffers
    #              from assertion failures and down chunks
    ######################################################################
    # Print the modification time of file $1 in seconds since the epoch
    file_secs()
    {
        /usr/local/bin/perl -e '
            use Time::Local;
            @months=("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");
            $write_secs = (stat($ARGV[0]))[9];
            $filetime = scalar localtime($write_secs);
            @timefd = (split /\s+/, $filetime);
            $year = $timefd[4]-1900;
            for ($i=0; $i<12; $i++) {
                if ($months[$i] eq $timefd[1]) { $month = $i; last; }
            }
            $day = $timefd[2];
            @hms = (split /\:/, $timefd[3]);
            $seconds = timelocal($hms[2],$hms[1],$hms[0],$day,$month,$year);
            print $seconds, "\n";
        ' $1
    }

    check_af()
    {
        # Field positions assume the banner form "Up N days HH:MM:SS",
        # that is, IDS has been up for at least one day
        days=`onstat - | awk '{print $11}'`
        hms=`onstat - | awk '{print $13}'`
        h=`echo $hms | cut -f 1 -d ":"`
        m=`echo $hms | cut -f 2 -d ":"`
        s=`echo $hms | cut -f 3 -d ":"`

        infxtime=`expr $days \* 86400 + $h \* 3600 + $m \* 60 + $s`
        echo "informix up time" $infxtime

        # Newest assertion failure file; nothing to check if none exist
        affile=`ls -t /var/tmp/af.* 2>/dev/null | head -1`
        if [ -z "$affile" ]
        then
            return 0
        fi
        echo $affile

        afseconds=`file_secs $affile`
        echo "af file time" $afseconds

        # Touch a scratch file to obtain the current time the same way
        touch 1234
        curseconds=`file_secs 1234`
        rm -f 1234
        echo "current time" $curseconds

        let timediff=$curseconds-$afseconds
        echo "current - af " $timediff
        let ok=$timediff-$infxtime
        echo "diff - informix time" $ok

        if [ $ok -gt 0 ]
        then
            :    # the newest failure predates the last initialization
        else
            echo "Some new Assertion Failures\n"
            echo "You may need to bounce Informix."
            exit 1
        fi
    }
To determine whether any assertion failures have occurred since IDS was last initialized, we need three times: the IDS uptime, the time of the latest assertion failure (taken from the timestamp of the assertion failure file), and the current time. The variable infxtime holds the IDS uptime, converted into seconds for the calculation. The Perl code reads a file's modification time into the variable write_secs, formats it as a date string in filetime, and converts it back into seconds; we apply it to the assertion failure file and to a freshly touched file, which gives us the current time. A simple subtraction then tells us whether the assertion failure happened after the last initialization: if the age of the assertion failure file is not greater than the IDS uptime, the failure is new.
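The comparison itself can be restated as a small sketch. The helper name af_since_startup and the epoch values below are illustrative assumptions, not part of the original script; the arithmetic is the same test the script performs, written as "file time is at or after startup time".

```shell
#!/bin/sh
# Hypothetical helper: succeed when a file was written after IDS came up.
# Arguments: current epoch seconds, file mtime in epoch seconds,
#            IDS uptime in seconds.
af_since_startup() {
    now=$1 mtime=$2 uptime=$3
    start=$((now - uptime))      # epoch time at which IDS was initialized
    [ "$mtime" -ge "$start" ]    # true when the file is not older than startup
}

# Illustrative values: IDS has been up for 3677 seconds.
if af_since_startup 1070000000 1069999000 3677; then
    echo "new assertion failure"        # file written 1000 seconds ago
else
    echo "no new assertion failure"
fi
```

Here the file is 1000 seconds old and the uptime is 3677 seconds, so the failure is reported as new. A file with an mtime of 1069990000 (10000 seconds old) would predate the startup and be ignored.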
The other major function in the program, check_dc, is quite self-explanatory, as is the main function that ties the two checks together:
    function check_dc
    {
        onstat -d | grep PD | awk '{print $2, $7, $8}' > /var/tmp/downck
        if [ -s /var/tmp/downck ]
        then
            echo "Following Informix chunks are down:\n"
            cat /var/tmp/downck
            echo "You may need to restore your databases."
            exit 1
        fi
        rm /var/tmp/downck
    }

    function main
    {
        opst=`onstat - | awk 'NF {print $8}'`    # NF skips blank lines
        if [ "$opst" = "On-Line" ]
        then
            check_af
            check_dc
            echo "Informix is online."
        else
            echo "Informix is either offline or having serious problems."
        fi
    }

    main
To ensure our Informix engine is fully functioning in a 24 x 7 environment and to be proactive in managing any potential problems, we can use the UNIX crontab utility to schedule a cron job to run our script on a regular basis. We could then have an additional script to connect to our e-mail system and e-mail warnings to the DBA if there are any assertion failures or down chunks for IDS.
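One way such a notification wrapper might look is sketched below. Everything in it is an assumption for illustration: the function name run_and_notify, the MAILER and DBA_ADDR variables, the placeholder address, and the paths in the crontab comment. The default mailer assumes a mailx-style command that accepts "-s subject recipient" and reads the message body from standard input.

```shell
#!/bin/sh
# Hypothetical wrapper: run a check command and mail its output on failure.
# MAILER and DBA_ADDR are placeholders to be adapted to the local mail setup.
MAILER=${MAILER:-mailx}
DBA_ADDR=${DBA_ADDR:-dba@example.com}

run_and_notify() {
    output=$("$@" 2>&1)          # capture everything the check prints
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$output" |
            "$MAILER" -s "IDS check failed (exit $status)" "$DBA_ADDR"
    fi
    return "$status"
}

# Example crontab entry (every 10 minutes), assuming this wrapper is saved
# as /usr/local/bin/ids_notify and ends with the line: run_and_notify "$@"
# 0,10,20,30,40,50 * * * * /usr/local/bin/ids_notify /usr/local/bin/check_infx_af
```

The wrapper relies only on the exit status of the monitoring script, which already exits with 1 when it finds new assertion failures or down chunks, so the script itself needs no changes.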
In addition, Informix provides a very powerful and useful monitoring utility called ALARMPROGRAM. Although this utility does not monitor assertion failures, it automatically monitors other problems such as long transactions, instance backup failures, physical and logical recovery failures, logical log failures, data integrity problems, and so on. The utility is available for IDS version 7.3x and was further developed and matured in versions 9.3x and 9.4x. It assigns each IDS operation a number, referred to as the class id, and classifies all operation failures into five severity levels, one being the lowest and five the highest. If an error occurs, it uses a UNIX shell script to display appropriate error messages. We can customize the program and direct those error messages to a DBA's e-mail or pager, so that when IDS fails, the DBA is alerted immediately. For details on how to set up and customize this utility, refer to Appendix C of the Administrator's Reference.
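A customized alarm script might filter events by severity along the following lines. This is only a sketch under stated assumptions: it assumes the server passes the severity, class id, class message, and specific message as the first four arguments, which should be verified against the Administrator's Reference for your version. The handle_alarm name, the MIN_SEVERITY threshold, and the sample values are hypothetical.

```shell
#!/bin/sh
# Sketch of a custom alarm handler. Assumption: the arguments arrive in the
# order severity, class id, class message, specific message.
MIN_SEVERITY=${MIN_SEVERITY:-3}   # hypothetical threshold: notify at 3 and above

handle_alarm() {
    severity=$1 class_id=$2 class_msg=$3 specific_msg=$4
    if [ "$severity" -ge "$MIN_SEVERITY" ]; then
        # A real handler would send e-mail or page the DBA here
        echo "NOTIFY severity=$severity class=$class_id: $class_msg: $specific_msg"
    else
        echo "IGNORE severity=$severity class=$class_id"
    fi
}

# Illustrative call with made-up values
handle_alarm 4 99 "example class" "example message"
# prints: NOTIFY severity=4 class=99: example class: example message
```

Filtering on the severity level keeps routine, low-severity events from paging the DBA while still escalating the serious ones.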
Using the ALARMPROGRAM and a script such as the one I've described in this article, you can ensure that your DBA will be notified right away if any failures occur. You can get to work quickly on any problem that occurs. We've found this to be an effective method for assuring that our IDS database is operating properly.
This article contains sample code. IBM grants you ("Licensee") a non-exclusive, royalty free, license to use this sample code. However, the sample code is provided as-is and without any warranties, whether EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT. IBM AND ITS LICENSORS SHALL NOT BE LIABLE FOR ANY DAMAGES SUFFERED BY LICENSEE THAT RESULT FROM YOUR USE OF THE SOFTWARE. IN NO EVENT WILL IBM OR ITS LICENSORS BE LIABLE FOR ANY LOST REVENUE, PROFIT OR DATA, OR FOR DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF THE USE OF OR INABILITY TO USE SOFTWARE, EVEN IF IBM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents.
- Fan, Jianing. "Monitoring Informix Dynamic Server for Higher Performance," DB2 Developer Domain, March 2003. Available at http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0303fan/0303fan.html .
- Nathans, Sari. "Informix Administration and Housekeeping for SAP R/3," DB2 Developer Domain, November 2003. Available at http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0211nathans/0211nathans.html .
| Name | Size | Download method |
|---|---|---|
| check_infx_af.pl | 5.02 KB | FTP |
Jianing Fan is a software engineer at Motorola specializing in relational database management systems. He is an Informix Certified Professional, Oracle Certified Professional, and has over 10 years of database and system experience as a developer, system administrator, and DBA. Jianing can be reached at cjf035@email.mot.com




