A technique for monitoring WebSphere Application Server workload management

Alexander Russell (alexander_russell@uk.ibm.com), IT Specialist, IBM Hursley Laboratories
Alexander Russell is an IT Specialist who started his career working for three years on MOM capacity planning and customer consultancy, integrating with development, build, test, and service teams for the complete product lifecycle. He now works for the Hursley Services & Technology team on the WebSphere Messaging and WebSphere Application Server product set for customers in the United States and Europe.
Alastair Watson (alastair_watson@uk.ibm.com), IT Architect, IBM Hursley Laboratories
Alastair Watson is an IT Architect and leads IBM Software Group's High Performance On Demand Solutions (HiPoDs) team in EMEA. His IBM career spans 34 years, including several years as a system programmer, several years in management, and several years in a number of technical architecture, project management, and strategy roles in software product development. Most recently, he has been a member of the Hursley Services & Technology department - a unique customer-focused services group within the Hursley Development Laboratory that helps early-adopter customers exploit new and advanced technologies in conjunction with IBM products.

Summary:  This article describes a technique for using standard Web server directives to log workload management activity in the WebSphere Application Server cluster, and a mechanism for summarizing the behavior of WebSphere Application Server workload management in real time.

Date:  31 Mar 2004
Level:  Intermediate

Introduction

Under normal operation, workload management activity is usually of little or no concern in an IBM® WebSphere® Application Server environment with limited scaling. Simple evidence of average processor utilization or clone-process CPU time is often sufficient to know that workload management is occurring.

However, in a failover situation where one or more application server nodes or clones become inaccessible (for example, through a network failure) or unavailable (for example, an unresponsive clone), it may not be obvious how the workload management plug-in reacts: at the time of failure, when a node or clone becomes unavailable; at the time the node or clone is recognized as available again; or during the recovery period afterwards, when workload management eventually selects the node or clone again. This behavior depends on several factors, including the version of WebSphere Application Server running, the fix level of the workload management plug-in, and the operating system controlling the environment.

It is useful, therefore, to be able to monitor the activities of workload management in a WebSphere Application Server environment. This article describes a technique that provides a level of workload management information that is otherwise generally impractical to obtain, especially for heavily loaded clusters. The authors developed this technique to help determine the cause of uneven processor utilization across application server nodes in a horizontally scaled, high volume Web site environment during a number of failover tests.

(This article originated from investigations using WebSphere Application Server Version 4 and WebSphere Portal V4.2.1 on the AIX platform. However, since the workload management plug-in is an evolving technology -- and since WebSphere Portal 4.x incorporates the WebSphere Application Server V5 version of the plug-in -- the information presented here will be applicable to current and forthcoming versions of WebSphere Application Server and IBM HTTP Server with little to no modification.)


Limitations of detailed logging

To get information about when a clone is marked "down", or unavailable, you can turn on detailed logging in the IBM HTTP Server workload management plug-in (in plugin-cfg.xml, set LogLevel="Trace"). Doing so writes a large quantity of data to the native.log file. However, despite the volume of data, nothing in it shows when the workload management plug-in detects that a node has recovered and its clones are available (indicating that network connectivity issues have been resolved), when it detects that a clone has recovered (for example, a deadlock has been cleared), or where requests are routed when the clone to which the session had affinity is unavailable.

A method that can be used in conjunction with the HTTP Server workload management plug-in is to apply user load to the system under test (typically using an artificial load generator) and use the HTTP headers returned by the HTTP server to determine the behavior of clone selection and session affinity.

In a high volume system, though, the amount of data written to the log file is considerable, and since not even combining both of these sources of information provides the necessary level of detail to determine workload management behavior, detailed logging is of limited use.


A monitoring technique

Our solution exploits the CustomLog and LogFormat Apache Web server directives to record additional information from the HTTP headers and then uses a script to filter the information, thus reducing data volume and preserving the essential information for studying workload management in the application server cluster.

The key to this solution is the use of session affinity with cookies. The JSESSIONID cookie is set in the HTTP response headers by the server and included in subsequent HTTP requests by the client. The cookie contains a unique session identifier followed by a colon and an eight-character string that denotes the application server clone to which the session has affinity:

Example 1. access_log entry showing cookie having been set by HTTP server

[01/Oct/2003:10:20:44 -0400] "GET /wps/portal HTTP/1.1" status=200 bytes=23559 
pid=86554 Cookie=- SetCookie=JSESSIONID=0000VYXW0EKQQ3WDAGP43OX1Z0I:uk742k92;

Example 2. access_log entry showing cookie being sent by HTTP client

[01/Oct/2003:10:20:54 -0400] "GET /wps/portal/.scr/Login/.r/1 HTTP/1.1"
status=200 bytes=13729 pid=113550 
Cookie=JSESSIONID=0000VYXW0EKQQ3WDAGP43OX1Z0I:uk742k92 SetCookie=- 
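
To make the structure of these entries concrete, the following minimal Python sketch pulls the clone identifier out of the Cookie and SetCookie fields of a log line. It is an illustration only and is not the authors' Perl filter: the function name clone_ids and the regular expression are assumptions, and the sketch assumes that JSESSIONID is the first cookie in the field, as it is in Examples 1 and 2.

```python
import re

# Assumption: the field label is "Cookie=", "SetCookie=" or "Set-Cookie=",
# and the value starts with JSESSIONID=<session id>:<clone id>.
JSESSIONID_RE = re.compile(
    r"(?<![\w-])(?P<field>Set-?Cookie|Cookie)="
    r"JSESSIONID=(?P<session>[^:;\s]+):(?P<clone>[^;\s]+)"
)

# Examples 1 and 2 from the article, each joined onto a single line.
EXAMPLE_1 = (
    '[01/Oct/2003:10:20:44 -0400] "GET /wps/portal HTTP/1.1" status=200 '
    "bytes=23559 pid=86554 Cookie=- "
    "SetCookie=JSESSIONID=0000VYXW0EKQQ3WDAGP43OX1Z0I:uk742k92;"
)
EXAMPLE_2 = (
    '[01/Oct/2003:10:20:54 -0400] "GET /wps/portal/.scr/Login/.r/1 HTTP/1.1" '
    "status=200 bytes=13729 pid=113550 "
    "Cookie=JSESSIONID=0000VYXW0EKQQ3WDAGP43OX1Z0I:uk742k92 SetCookie=-"
)


def clone_ids(log_line):
    """Return (incoming_clone, set_clone); each is None when the field is '-'."""
    incoming = set_clone = None
    for match in JSESSIONID_RE.finditer(log_line):
        if match.group("field") == "Cookie":
            incoming = match.group("clone")   # session affinity being maintained
        else:
            set_clone = match.group("clone")  # session affinity being established
    return incoming, set_clone


print(clone_ids(EXAMPLE_1))
print(clone_ids(EXAMPLE_2))
```

A hyphen in either field simply fails to match, so the corresponding element of the result stays None.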

The eight-character string can be correlated to the Name displayed by the WebSphere Application Server administrative console by finding the corresponding XML <Server> tag and matching CloneID attribute inside the workload management plug-in XML configuration file, plugin-cfg.xml:

Example 3. Correlating plugin-cfg.xml clone identifier with the administrative console

<Config>
    ...
    <Server CloneID="ulmn4c8" ConnectTimeout="10" Name="WebSphere Portal">
        <Transport Hostname="portal0" Port="9081" Protocol="http"/>
    </Server>
    ...
</Config>

Apache LogFormat directive

The LogFormat directive is used to define a new log format nickname that includes information from the HTTP headers for Cookie and Set-Cookie as shown in Examples 1 and 2 above. The nickname cookie_filter is used to associate this log format with a CustomLog directive to identify where the log entry should be written to. LogFormat directives are specified inside the Web server configuration file httpd.conf:

Example 4. Configuring the LogFormat directive to include cookie information

LogFormat "%t \"%r\" status=%>s bytes=%b pid=%P Cookie=%{Cookie}i Set-Cookie=%{Set-Cookie}o" cookie_filter

If there is no Cookie or Set-Cookie value for a specific request, then a hyphen is added by the Web server as shown in Examples 1 and 2 above.

Apache CustomLog directive

The CustomLog directive is used to specify which file or process the log entries for the log format cookie_filter should be written to; in this case the httpd_cookie_filter script, which is included in the Download section and discussed below:

Example 5. Configuring the CustomLog directive to pipe log entries to the filter script

CustomLog |/usr/HTTPServer/bin/httpd_cookie_filter.pl cookie_filter

When the piped command is written unquoted, as in Example 5, it must not include the perl command before the filter script name or contain other arguments, such as -w. Apache treats the argument immediately following the pipe argument as the LogFormat nickname and, therefore, only starts the command that follows the pipe character up to the first white space. Use the shebang (#!) line inside the filter script to specify the path of the Perl interpreter and any options for it, such as -w. The filter was written in Perl to take advantage of its regular expression pattern matching facilities, but you can use any language supported by the system. The CustomLog directive in Example 5, above, passes log entries to the httpd_cookie_filter Perl script in the format defined by the cookie_filter log format nickname.

The filter script writes a log entry to the usual access_log when the request is for anything other than a style sheet, GIF, or JPEG file (you can adapt this by customizing the code in the Perl script). However, a clone can fail at any time, including between the request for a page and the requests for that page's embedded resources. In that case, a new session is established to fulfill the embedded-resource requests, so those log entries are also written to the access_log to ensure that all entries with a Set-Cookie HTTP header are preserved.

The filter script also examines the time stamps in the log entries and, when it detects that the predetermined time period has elapsed, summarizes the number of times each clone was referenced by incoming cookies (to maintain session affinity) and the number of times each clone was referenced by a Set-Cookie header (to establish session affinity).
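
The periodic summary can be sketched in Python as follows. This is not the downloadable Perl script: it processes a list of lines in one pass rather than a live pipe, the names summarize and WINDOW_SECONDS are hypothetical, and it assumes log lines in the format of Examples 1 and 2 with English month abbreviations in the time stamp.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

TIMESTAMP_RE = re.compile(r"^\[([^\]]+)\]")
# Assumption: JSESSIONID is the first cookie in each field.
FIELD_RE = re.compile(
    r"(?<![\w-])(Set-?Cookie|Cookie)=JSESSIONID=[^:;\s]+:([^;\s]+)"
)
WINDOW_SECONDS = 10  # reporting period (hypothetical name)


def summarize(lines, window=WINDOW_SECONDS):
    """Count Cookie and Set-Cookie references per clone for each time window.

    Returns {window_end_epoch: {"cookie": Counter, "set": Counter}}.
    """
    summary = defaultdict(lambda: {"cookie": Counter(), "set": Counter()})
    for line in lines:
        stamp = TIMESTAMP_RE.match(line)
        if not stamp:
            continue  # not an access log entry
        when = datetime.strptime(stamp.group(1), "%d/%b/%Y:%H:%M:%S %z")
        # Round up to the end of the window that contains this entry.
        end = (int(when.timestamp()) // window + 1) * window
        for field, clone in FIELD_RE.findall(line):
            kind = "cookie" if field == "Cookie" else "set"
            summary[end][kind][clone] += 1
    return dict(summary)


lines = [
    '[01/Oct/2003:10:20:44 -0400] "GET /wps/portal HTTP/1.1" status=200 '
    "bytes=23559 pid=86554 Cookie=- "
    "SetCookie=JSESSIONID=0000VYXW0EKQQ3WDAGP43OX1Z0I:uk742k92;",
    '[01/Oct/2003:10:20:54 -0400] "GET /wps/portal/.scr/Login/.r/1 HTTP/1.1" '
    "status=200 bytes=13729 pid=113550 "
    "Cookie=JSESSIONID=0000VYXW0EKQQ3WDAGP43OX1Z0I:uk742k92 SetCookie=-",
]
result = summarize(lines)
for end in sorted(result):
    print(end, dict(result[end]["cookie"]), dict(result[end]["set"]))
```

A window appears in the result only once a Cookie or Set-Cookie value has been seen in it, so periods without any cookie traffic produce no row.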

The guidelines illustrated here were implemented on four IBM HTTP servers running against a WebSphere Application Server cluster of four servers, each with three WebSphere Portal server clones.


State independence and clone retry

The following information applies to IBM HTTP Server Version 1.x for AIX, but should be equally applicable to IBM HTTP Server Version 1.x on other UNIX-based Apache derivatives. (IBM HTTP Server Version 2 for AIX has a different implementation.)

To understand WebSphere Application Server workload management in a system using AIX and IBM HTTP Server Version 1.x, it is important to be clear about the process architecture: IBM HTTP Server is process-based, where each client connection is serviced by a separate HTTP server process. The maximum number of concurrent client connections that can be handled by a single server is set by the MaxClients directive (httpd.conf). In the system on which this solution was implemented, there were four HTTP servers, each with more than five hundred active server processes, totalling more than two thousand HTTP server processes in the Web server tier.

On IBM HTTP Server Version 1.x for AIX, each server process runs a separate instance of the workload management plug-in. All instances of the plug-in operate independently and do not share state information. For workload management, this has two major consequences:

  • Each instance of the plug-in makes its own discoveries about the availability of application server clones.
  • Each instance of the plug-in separately operates its own retry interval timer for each clone when marked "down".

The retry interval timer controls when that instance of the plug-in considers that the clone may be available again (i.e., should at least attempt a reconnect). Since there were so many state-independent instances of the plug-in in our Web server tier, the end-to-end behavior of workload management tended towards the aggregate of the decisions they made.

On IBM HTTP Server Version 2.x for AIX, each server process runs a number of connection threads which share the same state information, but this information is not shared across Web server processes. For workload management this has one major consequence:

  • Information about the availability of an application server clone is immediately available to the other connection threads in the same Web server process.


Clone selection policy

The output from the filtering script (Figure 1, below) shows how requests are distributed among the application server clones (i.e., our WebSphere Portal server clones), and assists in determining how the system is affected during failover. The clones in Example 6 are mapped inside the filter script from clone identifier in the workload management plug-in XML configuration file to clone name in the WebSphere Application Server administrative console. This is achieved in the filter script using a list (for ordering the output) and a simple hash table of name mappings:

Example 6. Mapping clone identifiers to clone names in the filter script

my @clone_list = (
  '231', '232', '233',
  'main', '242', '243',
  '261', '262', '263',
  '271', '272', '273'
);
my %clone_map = (
  'uk6rnk6i' => '231', # 'Clone231',
  'uk6rsgh9' => '232', # 'Clone232',
  'ulnspdme' => '233', # 'Clone233',
  'ulvs66aa' => 'main', # 'WebSphere Portal',
  'uk6s1sb2' => '242', # 'Clone242',
  'ulnsj3b9' => '243', # 'Clone243',
  'uk73pgrj' => '261', # 'Clone261',
  'uk742k92' => '262', # 'Clone262',
  'ulnsvqjv' => '263', # 'Clone263',
  'uk74ljrj' => '271', # 'Clone271',
  'uk7535oi' => '272', # 'Clone272',
  'ulntbaiv' => '273', # 'Clone273'
);

The clone list is used to preserve order of the clones for output in the filter_log. The clone map is used to translate the information provided from Cookies sent to the Web server by the Web client (and Set-Cookies sent to the Web client by the Web server) into the names of the clones displayed in the administrative console. It is imperative that these mappings are accurate before using the script in your application server environment.
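
The way the list and the hash work together can be sketched in Python, using the values from Example 6. The function ordered_counts and the treatment of unmapped identifiers are assumptions for illustration; the layout of the real filter_log is defined by the Perl script.

```python
# Values copied from Example 6.
CLONE_LIST = [
    "231", "232", "233",
    "main", "242", "243",
    "261", "262", "263",
    "271", "272", "273",
]
CLONE_MAP = {
    "uk6rnk6i": "231", "uk6rsgh9": "232", "ulnspdme": "233",
    "ulvs66aa": "main", "uk6s1sb2": "242", "ulnsj3b9": "243",
    "uk73pgrj": "261", "uk742k92": "262", "ulnsvqjv": "263",
    "uk74ljrj": "271", "uk7535oi": "272", "ulntbaiv": "273",
}


def ordered_counts(counts_by_id):
    """Translate {clone id: count} into counts in CLONE_LIST order.

    Returns (row, unmapped), where unmapped is the number of references to
    identifiers missing from CLONE_MAP. A nonzero value suggests that the
    mappings are out of date.
    """
    by_name = {name: 0 for name in CLONE_LIST}
    unmapped = 0
    for clone_id, count in counts_by_id.items():
        name = CLONE_MAP.get(clone_id)
        if name is None:
            unmapped += count
        else:
            by_name[name] += count
    return [by_name[name] for name in CLONE_LIST], unmapped


row, unmapped = ordered_counts({"uk742k92": 3, "uk6rnk6i": 1, "zzzzzzzz": 2})
print(" ".join(f"{n:>4}" for n in row), "| unmapped:", unmapped)
```

Counting unmapped identifiers separately, rather than discarding them, is one simple way to notice an inaccurate mapping.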


Interpreting output from the filtering script

The data in Figure 1 was recorded during a failover test performed on our three-tier configuration (IBM HTTP Server; WebSphere Application Server and WebSphere Portal; DB2®). Two of the four nodes (that is, 2 x 3 = 6 of the clones) have been excluded from the table because their behavior closely resembles that of the clones on node 23 (clones 231, 232 and 233). The numbers of HTTP requests with Cookie and Set-Cookie headers are reported on a per-clone basis every ten seconds: each row gives the totals of Cookie headers and Set-Cookie headers seen in the ten seconds up to the time stamp in the first column. (No entries are placed in the filter log until the first time a Cookie or Set-Cookie HTTP header is detected, and since no cookies are sent by the Web client before a Set-Cookie is sent from the Web server, the very first Set-Cookie HTTP header occurred in the ten seconds before 11:57:20.)


Figure 1. Cookie requests to maintain session affinity and establish new sessions

Most of the "normal behavior" time periods have also been excluded from the table, leaving ten-second time periods that show the workload management plug-in instances:

  • requesting Web resources from clones where session affinity has already been established
  • establishing session affinity with clones for new sessions
  • establishing session affinity with surviving clones for existing sessions whose original clone has become unavailable (i.e., clones 241, 242 and 243 in Figure 1, above).

In no circumstances is a session that has been re-established on a surviving clone handed back to the original clone when the failed clone becomes available again. Bearing these points in mind:

  • During the time interval from 11:57:10 to 11:57:20, 12 cookies were set (looking only at the six clones included in the table above). Those cookies were used for 207 Web resource requests (there is no caching proxy in this configuration, so all Web client requests reach the Web server).
  • Across the six clones, the workload is approximately even. It is not completely even because all workload management plug-in instances operate independently using round-robin load balancing, so a given clone can be selected twice in succession: once by one plug-in instance for the first Web page request, and then immediately again by a different plug-in instance for the next Web page request.
  • Shortly after 12:23:30, clones 241, 242 and 243 were made unavailable by manually terminating the application server processes (including the clone JVMs). (This method is NOT recommended in a production environment; it was used here to simulate a server failure.) During the time interval from 12:23:30 to 12:23:40, some requests arrived for clones on the failed node. However, the workload management plug-in will have established new sessions for these requests on one of the surviving clones (231, 232 and 233, or one of the six clones excluded from the table).
  • Between 12:42:50 and 12:43:00, the application servers that had been manually stopped were restarted, and some of the workload management plug-ins that were starting new sessions for the Web clients detected that clones 241, 242 and 243 were available again, and started establishing new sessions with these clones.
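
The point about independent round-robin selection can be shown with a small Python sketch. It is a deliberately simplified model, not the plug-in's algorithm: it assumes two plug-in instances that both start at the same clone and receive alternate requests, and all names are hypothetical.

```python
from collections import Counter
from itertools import cycle

clones = ["231", "232", "233"]

# Each plug-in instance keeps its own private round-robin position.
instance_a = cycle(clones)
instance_b = cycle(clones)

selections = []
for request_number in range(6):
    # Assumption: requests arrive alternately at the two instances.
    chooser = instance_a if request_number % 2 == 0 else instance_b
    selections.append(next(chooser))

print(selections)               # ['231', '231', '232', '232', '233', '233']
print(Counter(selections[:4]))  # a short window: 231 and 232 twice, 233 not at all
print(Counter(selections))      # the full run: two selections per clone
```

Over the full run the load evens out, but any short interval, such as a ten-second reporting period, can show a skew.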


Install and configure the monitoring filter

The monitoring filter script is included with this article for your own use. To install and configure the script:

  1. Download the httpd_cookie_filter.pl Perl script into the Web server bin directory. For example, on AIX using IBM HTTP Server, the filter script should be installed as /usr/HTTPServer/bin/httpd_cookie_filter.pl and execute-enabled:
    • chmod a+x /usr/HTTPServer/bin/httpd_cookie_filter.pl
  2. Edit the httpd.conf Web server configuration file to:
    • Include the LogFormat directive, as in Example 4.
    • Include the corresponding CustomLog directive, as in Example 5.
  3. Edit the httpd_cookie_filter.pl Perl script to use the clones defined in your environment. Do this by pairing each clone identifier with its clone name in the script's clone list and clone map. On AIX, for example, this information is found in the configuration file /usr/WebSphere/AppServer/config/plugin-cfg.xml (see Examples 3 and 6).
  4. Restart the Web server.
  4. Restart the Web server.
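
The pairing in step 3 could be generated rather than typed by hand. The following Python sketch reads Server elements in the form shown in Example 3 and prints entries in the style of the hash in Example 6. The function names are hypothetical, the sample XML is Example 3 with the elided parts left out, and the sketch assumes that every clone appears as a Server element with CloneID and Name attributes.

```python
import xml.etree.ElementTree as ET

# Example 3 from the article, without the elided ("...") sections.
SAMPLE_PLUGIN_CFG = """
<Config>
    <Server CloneID="ulmn4c8" ConnectTimeout="10" Name="WebSphere Portal">
        <Transport Hostname="portal0" Port="9081" Protocol="http"/>
    </Server>
</Config>
"""


def clone_name_map(plugin_cfg_xml):
    """Return {CloneID: Name} for every Server element, at any nesting depth."""
    root = ET.fromstring(plugin_cfg_xml)
    return {
        server.get("CloneID"): server.get("Name")
        for server in root.iter("Server")
        if server.get("CloneID")
    }


def perl_map_entries(mapping):
    """Format the mapping as lines suitable for a Perl hash literal."""
    return [f"  '{clone_id}' => '{name}'," for clone_id, name in mapping.items()]


mapping = clone_name_map(SAMPLE_PLUGIN_CFG)
for entry in perl_map_entries(mapping):
    print(entry)
```

The generated entries still need to be reviewed against the administrative console, for example to shorten names as Example 6 does, before they are pasted into the script.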

If you do not have the access rights to perform these operations or have trouble with any of these stages, contact your Web server administrator.


Conclusion

This article illustrated how the IBM HTTP Server configuration can be adapted to provide an assessment of workload management behavior and clone accessibility during failover and recovery. A filtering script was provided to demonstrate how the use of individual clones can be tracked over time; the same technique could also be used to monitor an Apache Web server in other scenarios by changing the log format in the Web server configuration file, and the regular expression in the filtering script. Our technique is one example of how the Apache Web server logs can be used to build or enhance an understanding of end-to-end Web site implementation and performance under various load conditions ranging from normal operation to high load and failover situations.


Acknowledgements

The authors wish to thank Paul Edlund (Poughkeepsie benchmarking center) for his technical assistance in reviewing this article.



Download

Name                          Size    Download method
httpd_cookie_filter.pl.zip    3 KB    FTP | HTTP



