Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

DB2/Informix and open source: Database defense against the dark political arts

Real-time monitoring of your database for career enhancement

Marty Lurie (lurie@us.ibm.com), Information Technology Specialist, IBM, Software Group
Photo: Marty Lurie
Marty Lurie started his computer career generating chads while attempting to write Fortran on an IBM 1130. His day job is in WebSphere Systems Engineering at IBM, but if pressed he will admit he mostly plays with computers. His favorite program is the one he wrote to connect his Nordic Track to his laptop (the laptop lost two pounds, and lowered its cholesterol by 20%). Marty is an IBM-certified Advanced WebSphere Administrator, Certified DB2 DBA, Certified Business Intelligence Solutions Professional, Informix-certified Professional, Linux+ Certified, and has trained his dog to play basketball. You can contact Marty at lurie@us.ibm.com.
Aron Y. Lurie (aron.lurie@gmail.com), 9th Grader/Webmaster, Newton South High School/Hebrew College
Aron Lurie was midway through 10th grade while writing the article. He has had a web development business for 5 years, and ever since 4th grade he has been teaching himself new languages. He is the Webmaster for his school newspaper, and has been Webmaster for his USY chapter and the Hebrew College. In his spare time, Aron is a member of the Newton South ski racing team. Unfortunately, he has not yet found a way to combine computers with skiing.

Summary:  A down system is one of the most stressful events for system administrators. This article shows you how to create a real-time monitor for your database so you can be alerted when problems arise, and also provide valuable information to others about the status of the database servers. The focus should be on fixing the problem, not the blame. This is also an asset for application and network administrators for rapid diagnosis of what has caused the system outage. Complete source code is provided for the monitor, including the PHP pages to publish the database status.

Date:  21 Dec 2006
Level:  Introductory
Also available in:   Chinese  Russian  Japanese

Activity:  9264 views
Comments:  

QuickStart for the impatient

  1. Download and un-tar the sample code, found in the Downloads section.
  2. Setup open database connectivity (ODBC) to point to the database you wish to monitor.
  3. Configure PHP to graph using the JpGraph package.
  4. Modify the "goqc" script to fit your environment.
  5. Use JMeter or Rational® Performance tester to provide a load for the database.
  6. Configure a remote agent to contrast local connection performance to network performance.

Political dark arts: The blame game

DBAs take great pride in providing reliable database services to their organizations. It comes as no surprise they are very stressed when the database is blamed for performance issues or application outages. This article shows you how to provide a real-time graph of database status (see Figure 1) to reduce false accusations, and identify problems for rapid resolution.


Figure 1. Real-time database status
Database Status graph

The X-axis is time, the left Y-axis is response time in seconds, and the right Y-axis is the number of user connections. Figure 1 clearly illustrates that despite a heavy load of almost 300 users, the database is holding up quite well with response times quicker than 1.5 seconds. Since this simulation is running on a single laptop with only one disk, the performance is outstanding. With this type of graph available in your organization, it will show that there no issue with the database. Conversely, if there is something broken, you'll know about it sooner and will be able to find the root cause that much faster.

Database quality control and statistical sampling

To produce the graph in Figure 1, you need data about the status of the server. Gathering the data and making it available through a Web page is a simple step in Statistical Process Control. For more information on Statistical Process Control, refer to the Resources section. This quality improvement process, championed by Dr. Demming, propelled "made in Japan" from inferior quality to the excellence that it represents today.

There are two pitfalls when gathering data:

To get an accurate sample of an event that takes, for example, 20 seconds, samples must be measured every 10 seconds according to the Nyquist theorem. If you want to measure a database outage to a one minute resolution, that means you have to sample every 30 seconds. In the example script provided, the sampling rate is every two seconds, which provides rapid feedback, but generates quite a bit of sample data in a short period of time.

What about the impact on the result from taking a sample? Since I have calculated the exact momentum of my car keys, the Heisenberg Uncertainty Principle indicates they could be anywhere in the universe. Heisenberg was actually talking about elementary particles' position and momentum, but you get the idea. When a database server slows down, the added workload from people logging in to test and troubleshoot can easily make matters worse. The instrumentation in this example offloads the status Web page display to a different server. This architecture keeps the "spectators" away from the database server so they can see the status without adding to an existing problem.

The Quality Control Sampling Agent

The Quality Control Sampling Agent has the simple task of measuring the status of the database and recording this data in a database table. Almost any data about the database server can be obtained from the very rich capabilities of DB2® table functions or the Informix® sysmaster database.

The deployment architecture of the agent is shown in Figure 2. There is both a local and remote agent option. The sample code implements a single agent. To deploy two agents the schema and code require modification.


Figure 2. Sampling agent deployment architecture
Deployment architecture

Why use a local agent and a remote agent that communicates over the network? Since you are trying to do fault isolation, to fix the problem and not the blame, any differences in the local and remote agent response times immediately identifies network slowdowns. Imagine how much the network team will appreciate getting a call from a DBA helping them identify a problem.

There are two implementations of the agent in the download. The Informix directory contains a native esql/c agent. The ODBC directory contains a DB2 implementation written in PHP for maximum portability.

The program executes the following steps:

  1. Start a timer.
  2. Connect to the database.
  3. Measure the time to connect to the database.
  4. Create a new row in the transtimes table with timestamp, unique key, and connect time.
  5. Gather any statistics about the database that are interesting, and simulate a simple transaction.
  6. Update the row created in step 4, and record the statistics gathered in step 5 and the time to complete step 5.

The graph in Figure 1 shows the number of connections, connect time, and insert-update time. The actual schema of the table (Listing 1) used in the Informix port of this agent includes how much virtual memory is allocated and used. This is an illustration of how Informix sysmaster or DB2 table functions can provide an easy interface to instrumenting the server. You don't have to sed, awk, grep, and perl your way through reams of onstat or snapshot data.

Step 5 above includes simulating a simple transaction representative of the typical workload on the system. The simple transaction makes the data sample representative of what a typical person using the system would experience. For an Online Transaction Processing System (OLTP) taking phone orders, a simulated interaction could be a customer lookup. If the database system you are monitoring is for decision support (DSS), and the queries are long and complicated, don't have the quality control (QC) agent run a large query. See the Heisenberg Uncertainty Principle discussion above. A large DSS query by the QC agent will just waste cycles and slow everyone else down.


Listing 1. Schema used for the Informix esqlc version of the QC agent
                
--  (c)2006 copyright Martin Lurie, sample code, not supported
create database oltpqc;
database oltpqc;
drop table transtimes;

-- the DB2 identity datatype is similar to Informix serial
-- in the php version of the code a simple integer was used and
-- php increments the transkey, this gives maximum portability
create table transtimes ( transkey serial primary key,
			timestamp	datetime year to second,
			connect_time	float,
			session_count	integer,
			vblkused		integer,
			vblkfree		integer ,
			query_insert_time	float
);
      

The DB2 schema used in conjunction with a PHP agent is show in Listing 2.


Listing 2. Schema used for the DB2 PHP version of the QC agent
                
--  (c)2006 copyright Martin Lurie, sample code, not supported
create database oltpqc;
connect to  oltpqc;
drop table transtimes;

create table transtimes ( transkey int primary key,
			timestamp	date,
			connect_time	float,
			session_count	integer,
			query_insert_time	float
);
      

You are not in any way limited to this schema for gathering data. Whatever you choose to instrument with the agent can be included in the data sample table. This project originated due to a "blame game" between the application developers and the database administrators. Instrumenting the number of users and the total memory consumed, showed a linear relationship for memory consumption. This ended the debate and the application was changed so as not to keep spawning new connections to the database when a slow response time was detected.

The table function to find the number of users connected to DB2 is shown in Listing 3. See the Resources section for how to use other DB2 table function capabilities, and in the case of Informix, how to use the sysmaster database.


Listing 3. Query from DB2 table function to obtain concurrent user count
                
--  (c)2006 copyright Martin Lurie, sample code, not supported

 select local_cons +rem_cons_in from table (snapshot_dbm (-1))as snapshot_dbm

      

Informix uses the sysmaster database instead of table functions to provide information about the server status. See Listing 4 for the Informix query to get a user count.


Listing 4. Query from Informix sysmaster to obtain concurrent user count
                
/* this sql is from the esqlc program, so the result set is
stored in a host variable.   */
        select count(*) 
           into :sessioncount
           from sysmaster:syssessions;
      

The agent code is straight forward. The "secret sauce" to get the ODBC database connection for PHP platform independent version going is the odbc.ini file. The documentation for getting a PHP ODBC connection is widely available, refer to the Resources section. Listing 5 shows the odbc.ini for DB2 and Listing 6 shows the Informix listing. If you want to invest the time, you can build PHP with direct drivers instead of ODBC. The ODBC performance is quite adequate for this example, and will be used both as a client, and to gather the data for graphing the response time results. connection


Listing 5. DB2 odbc.ini file for ODBC configuration on Linux
                
[ODBC Data Sources]
SAMPLE=IBM DB2 ODBC DRIVER

[SAMPLE]
Driver=/home/db2inst1/sqllib/lib32/libdb2.so
Description=DB2 Sample database

      


Listing 6. Informix odbc.ini file for ODBC configuration on Linux
                
[ODBC Data Sources]
SAMPLE=INFORMIX DB2 ODBC DRIVER

Driver=/home/db2inst1/sqllib/lib32/libdb2.so
Description=Informix smaple odbc.ini


[ODBC Data Sources]
Infdrv1=IBM INFORMIX ODBC DRIVER
Infdrv2=IBM INFORMIX ODBC DRIVER

;
; Define ODBC Database Driver's Below - Driver Configuration Section
;
[Infdrv1]
Driver=/home/informix/lib/cli/iclit09b.so
Description=IBM INFORMIX ODBC DRIVER
Database=stores_demo
LogonID=odbc
pwd=odbc
Servername=ids_server1

[Infdrv2]
Driver=/home/informix/lib/cli/iclit09b.so
Description=IBM INFORMIX ODBC DRIVER
Database=oltpqc
LogonID=informix
pwd=useYourOwnPassword
Servername=k_ids
Trace=0
TraceFile=/tmp/odbctrace.out
InstallDir=/home/informix
      

Early warning system: Generating alerts

When response time is slow, it is a good idea for the DBA to know prior to getting a call from the operations center. This is a matter of taking responsibility and pride in knowing when problems arise. You want to know early that there is a problem. This isn't about the blame game, this is about delivering high quality database services.

The oltpqc code expects a positional parameter on the command line for the alarm threshold. This value is in milli-seconds. Responses slower than this time will trigger a Unix/Linux script. The implementation of this alert is shown in Listing 7 for esqlc and Listing 8 for PHP.


Listing 7. ESQLC code to generate an alarm when response time exceeds the threshold
                
/* code to capture the positional paramter from the command line */

 if (argc == 2 ) {
        thresholdms = atoi ( argv[1] ); 

#ifdef PRTDIAG
        printf( "argv[1] %s  thresholdms %d \n", argv[1], thresholdms );
#endif
        printf( "(c)copyright 2005 - alarm %d milli-seconds \n", thresholdms );
        }
        else
        {
        printf( 
		"(c)copyright 2005 - USAGE: %s threshold_millisec\n", argv[0] );
        printf( "example:  %s 1500\n" , argv[0] );
        printf( "generates an alarm when response time is over 1.5 seconds\n" );
        return -1 ;
        }


/* ... lines deleted .... */

/* logic to run a shell script and pass in the response time data 

to the shell script */
	 asprintf ( &alarmstring , "./slowalarm.sh %f %f " , 
                elapsed_con, elapsed_wrk );


 	if ( ( elapsed_wrk+elapsed_con ) > (float)thresholdms/(float)1000 ) 
                system ( alarmstring); 
#ifdef PRTDIAG
        printf ( "times elapsed_wrk %f , elapsed_con %f , thres  %f \n", 
        elapsed_wrk,elapsed_con, ( (float)thresholdms/(float)1000 ) );

#endif
      


Listing 7. PHP code to generate an alarm when response time exceeds the threshold
                

//check to see if alarm should be sounded
if (($third_time - $first_time)*1000 > $thresholdms)
{
        echo "DATABASE SLOW ALERT - CONNECT AND QUERY TIMES" ;
        echo "GREATER THAN $thresholdms MILISECONDS\n\n";
        system ("alert.sh");
}
      

Publish the the sampling results, the limitations of spreadsheets

This project cries out for publishing the data on a Web page. The sample data is not worth much without reporting and distribution. Spreadsheet technology can only take you so far for real time graphing. Emailing a spreadsheet around with the status is quite tedious. If the spreadsheet is configured to do a dynamic refresh and re-connect to the database, then every refresh from every desktop monitoring the database costs database cycles, which is an expensive proposition. Figure 3 shows a spreadsheet analysis of users and response times, but this is here for completeness and is not recommended.


Figure 3. Use a spreasheet to graph the quality samples - note the Y-axis log scale
Spreadsheet analysis

Publish with a Web site

An excellent technology to publish the status information gathered by the quality control agent is a Web site. The graph shown in Figure 4 is based on the JpGraph package (For more information, refer to the Resources section.). This PHP code is a really great set of graphing tools. You'll need to install some prerequisites including gd and php-gd. Graphing Informix against DB2 data was as simple as changing the connect statement. This is the really elegant aspect of using PHP and an ODBC connection to the database to achieve portability.


Figure 4. Sample database status using JpGraph, a PHP graphing package
Database status graph

Notice the JpGraph package allows you to put a jpg image behind the graph. Figure 4 shows the Informix sponsored race car in the background.

The status graph implements an automatic refresh. A page with real-time status is rather lame if it doesn't refresh. This doesn't take anything fancy like AJAX but is simple HTML. Look at Listings 9 and 10, can you tell why one is "bad" and the other is "good"? Listing 9 assumes a cron job is running periodically that performs the following: php qcgraph.php > foo.jpg


Listing 9. The "good" HTML page for auto refresh
                

<META >-EQUIV=Refresh CONTENT="3; URL=index.html">
<h1>Database Real Time Status</h1>
Click the refresh button if the graph does not appear below.
<hr>
<img src="foo.jpg">
<hr>
<!-- this page assumes a cron job is running periodically that 
performs the following: php qcgraph.php > foo.jpg -->
      


Listing 10. The "bad" listing of an auto refresh page, it works, but don't do it this way
                

<META HTTP-EQUIV=Refresh CONTENT="10; URL=auto.html">
<h1>Database Real Time Status</h1>
<img src="qcgraph.php">
<hr>
      

You've probably figured out by now that the "bad" auto refresh code would create a significant workload on the server if lots of people want to see the status. It is the Heisenberg Uncertainty Principle working with Murphy's law. Measuring the status of the database creates more work on the database. When something "goes wrong" the problem is compounded when lots of inquisitive people fire up their browsers to look at the server.

The "good" implementation, Listing 9, minimizes the workload on the database. The status graph jpg image is shared by all the voyeurs, and the workload to support the status seekers can be isolated at a Web server. The Web server doesn't even have to be resident on the same machine as the database. A periodic refresh of the JpGraph .jpg output is required. This could be scheduled in cron, in the sample code provided the Linux watch command is used.

Server performance characterization: Load testing with Apache JMeter

Now put all the pieces together in a single script. To get an interesting graph, and validate that the graph is reporting what you think, you'll need to generate some load on the database. Apache's JMeter is a good option, and of course IBM's Rational Performance Tester offers some advanced capabilities. All the graphs in this article were generated with JMeter as the load generator, also know as the "test harness."

Create a test harness and load test your database. Even if you don't implement any of the performance monitoring, or graphing, do load testing. If the only thing you take from the article is doing load testing prior to putting an application in production, it will have been well worth writing it.

Load testing to the point of breaking the server(s) lets you know the performance envelope of you systems. The number of users, complexity of queries or transactions, and other aspects of your application should be characterized. Armed with performance envelope graphs, combined with tracking your server workload growth trends, you can win the battle for upgrading a server before there is a major crisis.

The script in Listing 11 fires up all the elements of the performance monitor and the load testing environment. Using a shell script for complex starting sequences is highly recommended to minimize typos and "thinkos."

It may come as a shock to many, but there are comments in the script that explain each step...


Listing 11. Putting it all together with automated load testing.
                
# (c)copyright 2006 Martin Lurie   Sample code, not supported
echo must run as root... so we can restart the httpd and informix 
# read foo just makes the script pause till you hit enter
# todo: automate a user id check instead of this prompt
read foo
#  everyone tails the server log on the screen, right ?
xterm -exec tail -f /opt/informix/online.log &
. /opt/informix/iunset
# pick up the ids v10 env
 . /opt/informix/ifx_env 
# stop the server in case it is active
 su informix -c "onmode -yuk"
# start the server
 su informix -c oninit
# now pick up the csdk 2.8x environment - better yet - get 
# the bug fixed version of the 2.9 csdk
. /opt/informix/iunset
 . /home/informix/ifx_env 
 /etc/init.d/httpd restart
# must install php graph libraries 
cd /var/www/html/
xterm -exec watch -d -n4 "php ./qcgraph.php > foo.jpg" &
mozilla http://karmiel/index.html&
 cd /home/lurie/edrive/src/esql/oltpqc
 xterm &
 xterm -exec watch -d -n 5 ./oltpqc 1500 &
cd /home/lurie/tmp/jakarta-jmeter-2.2/bin/
./gojm &
      

JMeter in action doing JDBC requests is shown in Figure 5. This graph shows an average response time of 573 milliseconds. Some might be happy with this response time. If you look at the standard deviation of 796 milliseconds, the users may not be as happy as the average would predict. The good news is the standard deviation, the red line, is trending downward. The variability in response time is decreasing, so users see less variation when they press the Enter key. Closer examination of the average, the dotted black line, and of the workload, shows this server is in a startup transient state. As the connection pool from JMeter is established, the server provides much more consistent workload response.


Figure 5. JMeter load generation, graph results page
Database status graph

Troubleshooting example

Figure 6 shows periodic long connect times. Most DBAs assume this is a long checkpoint, but surprisingly all the checkpoints in online.log are 0 seconds. So what is the problem?


Figure 6. Periodic long response time problem
Database status graph

The issue here turned out to be unrelated to the database. Whenever a system seems to "hang" for about eight to 10 seconds and then runs fine for a while, a DNS network problem is one of the most likely suspects. The periodic long connect time was eliminated by fixing a temporary DNS problem (in this case the fix was as simple as re-connecting the laptop computer to the LAN so it could access the DNS server).

Where to next?

You've seen a number of technologies to use for doing database monitoring. The techniques included code portability using a standard API (ODBC), and graphing results for Web publication using JpGraph. The performance characterization of your environment with a test harness is critical to managing your servers instead of being managed by them.

There is always a build vs buy option. If creating the real-time monitor described in this article is to daunting, check out the IBM Tivoli® Monitor. It has many predefined agents, and a console that allows drill-down on individual issues. Figure 7 is an example of Tivoli Monitoring.


Figure 7. Tivoli Monitoring, if you'd rather buy than build
Tivoli Monitor

There are a number of enhancements that would be useful if implementing the sample code. If the network or database is down, the oltpqc client continues to wait for a response. Limiting the wait time and sending an alert on an error return code would be a very good thing to add.

When you discover issues with the server or network, there are a wealth of tools available to diagnose the problems and find a root cause. At the operating system level the sar, vmstat, and top commands are a good place to start to verify basic system hygene. Look for excessive CPU utilization and memory paging or swapping -- these are performance killers. The ping, netstat, nmap, and ethereal utilities provide lots of capabilities for understanding network issues.

If the operating system is in good shape, then the Informix onstat and DB2 snapshot utilities can be used to dig into database specific diagnostic information. These utilities are well documented and lots of examples exist on the Internet -- a simple google query provides lots of resources.

I hope you've found this useful and that it helps you do better problem isolation. There is no point wasting time blaming the database when real-time database status reporting helps locate the real issue.



Download

DescriptionNameSizeDownload method
Source code for monitoroltpqc.zip10KB HTTP

Information about download methods

More downloads


Resources

Learn

Get products and technologies

  • Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2, Informix, Lotus®, Rational®, Tivoli, and WebSphere®.

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

About the authors

Photo: Marty Lurie

Marty Lurie started his computer career generating chads while attempting to write Fortran on an IBM 1130. His day job is in WebSphere Systems Engineering at IBM, but if pressed he will admit he mostly plays with computers. His favorite program is the one he wrote to connect his Nordic Track to his laptop (the laptop lost two pounds, and lowered its cholesterol by 20%). Marty is an IBM-certified Advanced WebSphere Administrator, Certified DB2 DBA, Certified Business Intelligence Solutions Professional, Informix-certified Professional, Linux+ Certified, and has trained his dog to play basketball. You can contact Marty at lurie@us.ibm.com.

Aron Y. Lurie

Aron Lurie was midway through 10th grade while writing the article. He has had a web development business for 5 years, and ever since 4th grade he has been teaching himself new languages. He is the Webmaster for his school newspaper, and has been Webmaster for his USY chapter and the Hebrew College. In his spare time, Aron is a member of the Newton South ski racing team. Unfortunately, he has not yet found a way to combine computers with skiing.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, Open source
ArticleID=184764
ArticleTitle=DB2/Informix and open source: Database defense against the dark political arts
publish-date=12212006
author1-email=lurie@us.ibm.com
author1-email-cc=
author2-email=aron.lurie@gmail.com
author2-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers