Tips for using the IHS Access Log feature for WebSphere Commerce sites

Optimizing the use of IHS Access logging features for WebSphere Commerce sites

This article discusses the optimization of the IHS access logging features of the WebSphere Commerce sites. Optimizing the logging features will help to troubleshoot general problems, and analyse performance and security issues such as determining response time, response cookies, and limiting the number of requests.

Andres Voldman (voldman@ca.ibm.com), Senior Software Engineer, IBM

Andres Voldman's photoAndres Voldman is a Senior Software Engineer at the IBM Toronto Lab. He is part of the WebSphere Commerce Support team with focus on performance, stability, and quality of WebSphere Commerce implementations. Andres has also achieved the IBM developerWorks Contributing Author level, recognizing his valued contributions as an IBM author.


developerWorks Contributing author
        level

Li Liang (liliang@ca.ibm.com), Software Engineer, IBM

Li Liang's author photo Li Liang is a Software Engineer at the IBM Toronto Lab. She is part of the WebSphere Commerce Support team specializing in orders and install.



15 February 2013

Introduction

The IBM® HTTP Server (IHS) records all browser requests in the access.log file under the <IHSInstalldir>/logs directory. The information in the access log is critical for the resolution of session management issues, security problems, and performance problems. This article presents practical tips that take advantage of the powerful IHS logging capabilities and enable you to resolve issues faster.


Setting up custom access logging

IHS (and Apache®) provide powerful and flexible logging capabilities. By using the main IHS configuration file httpd.conf, you have full control of what logs are created, which condition requests are logged under, and which request and response attributes are recorded.

The standard way to define a new logger is to create a format and assign the format a nickname. The example shows the format named "common" which is defined by default in the httpd.conf under <IHSInstalldir>/conf directory:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

Table 1. Comments in the "common" format
Common formatDescription
%hRemote host (IP)
%lRemote logname (from identd, if supplied). This returns adash unless IdentityCheck is set On. (not typically used with WC)
%uRemote user (not typically used with WC)
%tTime the request was received (standard english format)
%rFirst line of request (includes method, url, and protocol)
%>sStatus. For requests that got internally redirected, this is thestatus of the *original* request --- %...>s for the last.
%bSize of response in bytes, excluding HTTP headers.

For a detailed list of the format strings that can be included with the LogFormat directive, see the official Apache product documentation listed in the Resources.

The CustomLog directive can be used to specify the location of the access log, and the format used. The CustomLog directive example uses the common format: CustomLog "/httplogs/access.log" common

You can define the custom log directive at multiple levels. For example, if defined inside a virtual host, only requests for that virtual host are logged. The directive can also use an environment variable to control whether to log a particular request. For more information about the CustomLog directive, see the official Apache product documentation listed in the Resources.

When you use common format by specifying CustomLog "/httplogs/access.log" common, IHS creates a log similar to Listing 1:

Listing 1. Common format by specifying CustomLog "/httplogs/access.log" common.
9.26.190.217 - - [20/Jul/2012:15:08:21 -0400] "GET
                /webapp/wcs/stores/servlet/en/madisons/electrical HTTP/1.1" 200 334735
9.26.190.217 - - [20/Jul/2012:15:36:20 -0400] "GET
                /wcsstore/ExtendedSitesCatalogAssetStore/images/Brake_Pad1-70.png
                HTTP/1.1" 200 3322

Troubleshooting Tips

Now that you have custom logging setup, here are several tips that will help you take the most advantage of IHS access logging. These tips are presented in no particular order.


Tip #1 - Logging response times

Logging the response time for each request is useful when debugging performance or timing issues.

By adding %D to the access log format, the server saves the response time in microseconds. For example:

LogFormat "%h %l %u %t \"%r\" %>s %b %D" common


One microsecond is a millionth of a second. This metric is typically too small to be practical; most people prefer to work with milliseconds instead. ( milliseconds = microseconds / 1000 )

When using the values logged with %D, keep in mind that these numbers include a network component. The %D format string counts the time until all the response bytes are passed to the TCP layer in the IHS system. If there is a network or configuration issue, the values logged in the access log will be much higher than the actual time used by the WebSphere Commerce JVMs to execute the request and return it to the plug-in component in IHS.

In general %D logging is a good reference of the JVMs performance, if you suspect the numbers are too high compared to the expected JVM performance, you should enable additional monitoring in the JVM and network layers.

For more information on %D access log format, refer to the IHS official product documentation listed in the Resources.


Tip #2 - Logging the JVM that handled the request

During the troubleshooting process, it may help to know which JVM handles a particular request. There are multiple ways of logging the JVM processesing a request. Two of the most popular ways are to handle the %{WAS}e environment variable, or to use the JESSIONID cookie.

Using the %{WAS}e environment variable

For requests that are handled with the WebSphere plug-in, the plug-in module creates an environment variable called WAS which contains the server that handled the request. The WAS environment variable is logged in the access logs by adding the following to the LogFormat statement: %{WAS}e.

LogFormat "%h %l %u %t \"%r\" %>s %b %{WAS}e" common


When the %{WAS}e string is added, the hostname and transport port number of the JVM used to serve the request is logged. For example, "app1.torolab.ibm.com:9091".

JSESSIONID is the default name of the cookie that WebSphere uses for session management. This cookie has two components, a unique ID followed by a clone ID. The clone ID uniquely identifies a JVM in a cluster. For example: JSESSIONID=0000r2yG7KvBKWaTMUv-5IsCMC7:n1s1br/. By logging the JSESSIONID cookie in the access log using %{JSESSIONID}C, the clone ID that identifies the JVM will also be available in the log. Then, using the plugin-cfg.xml configuration file, you can find the JVM associated to the clone ID.
Listing 2 shows that the CloneID n1s1 maps to the JVM running in host app1.torolab.ibm.com on port 9109.

Listing 2. Clone ID n1s1
<Server CloneID="n1s1" ConnectTimeout="0" ExtendedHandshake="false"
LoadBalanceWeight="2" MaxConnections="-1"
Name="WC_demo_node_node1_server2" ServerIOTimeout="0"
WaitForContinue="false">
<Transport Hostname="app1.torolab.ibm.com" Port="9109" Protocol="http"/>
</Server>

Since the default clone IDs can be difficult to memorize, you can use the HttpSessionCloneId custom property to assign the JVMs a new clone ID. Do not name the clone IDs too descriptively, because it can create a security concern. For more information on using the theHTTPSessionClondId, see Resources.

Figure 1 shows the setting of the HttpSessionCloneId session management custom property to set for a new Clone ID.

Figure 1. Setting the HttpSessionCloneId custom property
Figure 1 shows the setting of the HttpSessionCloneId session management custom property to set a new Clone ID.

The advantage of using the JSESSIONID cookie to determine the JVM is that the cookie tells the JVM that you are using right in the browser (by inspecting the cookies). The disadvantage is that the cookie will be missing on the first request for each session, and the clone ID will be incorrect if the request is failed over to another JVM.


Tip #3 – Correlating requests

For performance analysis, or during general troubleshooting, it is often required to identify all the requests that belong to a particular user or session. The techniques described below include using the client IP, the JSESSIONID, or the WebSphere Commerce session management cookies.

Using the client IP

The default format includes the remote host (%h) as the first element in the access log. Using the client IP is a common method of identifying requests from a single source.
When a load balancer is used, it is possible that the IP reported is that of the load balancer. Additional configuration could be required to log the real client IP instead.

Using the JSESSIONID cookie


As mentioned earlier, the JSESSIONID cookie can be used to uniquely identify a session. "Grepping" the log with a particular JSESSIONID value is a common technique to extract requests that belong to the same session.

Using WebSphere Commerce session management cookies

WebSphere Commerce has its own set of session management cookies, such as WC_AUTHENTICATION_<userID>, WC_USERACTIVITY_<userID> or WC_PERSISTENT, which can also be used for request correlation.


The advantage of using the WebSphere Commerce Session management cookies for correlation is that they contain the actual WebSphere Commerce user Id, which allows you to identify requests from a particular non-generic user.

Listing 3 shows how to extract the user ID from the name of the WC_USERACTIVITY cookie.

Listing 3. Extracting the User ID
SetEnvIf Cookie "WC_USERACTIVITY_(\d+|-\d+)=" wcUserId=$1
SetEnvIf wcUserId ^$ wcUserId=-1002
LogFormat "%h %l %u %t \"%r\" %>s %b %{wcUserId}e" user
CustomLog "/httplogs/user_log" user

Note: Listing 3 assumes the existence of a single WC_USERACTIVITY cookie. There are scenarios (such as multiple stores) where having multiple activity cookies can be valid.


Tip #4 - Limiting the number of requests that are logged

Web servers typically handle not only dynamic requests, which are processed by the Application Server, but also static content such as images, css, js files and others. Since all requests are logged by default, with busy sites, access logs can grow quickly, resulting in large logs that are difficult to manage.

IHS provides configuration options to filter the requests that are logged. Administrators can use these options to log only requests served from the WebSphere Commerce JVMs, or to log these requests on a separate file. This can considerably reduce the log sizes and simplify analysis.

Use the following techniques to limit the requests that are logged:

Using the WAS environment variable

For requests that are forwarded to the WebSphere Application Server JVMs, the plug-in creates an environment variable called "WAS" which contains the name of the JVM that handled the request. This variable can then be used as a condition for logging. If the variable is not defined for a request (meaning it was not handled by WebSphere Application Server), the log entry is not logged.

LogFormat "%h %l %u %t \"%r\" %>s %b" custom

CustomLog logs/access.log common env=WAS

Logging requests that have a particular root

The following example shows how to log requests that start with a common context root:

SetEnvIf Request_URI "^/shop/" storereq
CustomLog logs/access.log common env=storereq

Excluding files with specific extensions

Listing 4 shows how to exclude particular extensions, such as images:

Listing 4. Excluding extensions
SetEnvIf Request_URI \.gif nolog
SetEnvIf Request_URI \.jpg nolog
CustomLog logs/access.log combined env=!nolog

Defining logs at the virtual host level

By default, logs are defined at the root level, which means that they are applicable for all the virtual hosts. If you move the CustomLog entry inside the virtual host definition, the log is only applicable to requests for that virtual host. This technique can be used to create separate logs for SSL and non-SSL, or separate logs for the tools.

Listing 5 shows how to define the cmc.log log inside the 8002 virtual host:

Listing 5. Defining the cmc.log log inside the 8002 virtual host
<VirtualHost app01.torolab.ibm.com:8002>
SSLEnable
SSLClientAuth 0
ServerName mystore.torolab.ibm.com
Alias   /adminconsole "/app/profiles/demo/
installedApps/WC_demo_cell/WC_demo.ear/SiteAdministration.war/tools/adminconsole/
    wcsadmincon.html"
Alias   /wcsstore "/app/profiles/demo/installedApps/WC_demo_cell/WC_demo.ear/Stores.war"
Alias   /wcadmin "/app/profiles/demo/installedApps/WC_demo_cell/WC_demo.ear/
    SiteAdministration.war"
CustomLog logs/cmc.log common
</VirtualHost>

If you maintain the log definition at the root level, it is useful to add the port number to the log format by adding %p. %p indicates the canonical port used in the request, and you can use it to differentiate requests from the WebSphere Commerce tools versus store requests, or http versus https.


Tip #5 - Logging when a page was served by Dynacache

For performance analysis, it is important to know if a page is returned from cache. Starting with WebSphere Application Server V7.0.0.11, the JVM adds a response header called CACHED_RESPONSE to the log if the response was cached with DynaCache at the servlet or full page cache level. For more information about DynaCache, see the documentation page in the Resources.

This header can be added to the access logs like this example:

%{CACHED_RESPONSE}o.

If the page is retrieved from cache, the log adds "yes". If the header is missing, the log adds a dash (-) instead.


Tip #6 - Logging request and response cookies

Logging the request and response cookies is useful in troubleshooting session management issues. Logging the request and response cookies make the log considerably larger. But, this option is still more lightweight than enabling tracing in the application server layer. To log the request and response cookies, add the following parameters in the httpd.conf to add more information to the cookie in the access log:

Table 2. httpd.conf parameters
ParameterDescription
{Cookie}i Adds the cookie request header to the log.
{Set-Cookie}o Adds the cookies with the response to the log.

If you are only interested in specific request cookies, use the %{COOKIE}C format instead. This approach does not work with the WebSphere Commerce cookies that include the user ID in the name, such as WC_USERACTIVITY_<number>.


Tip #7 - Creating a parseable log format

If the log format is not defined correctly, parsing, or breaking the different elements down for analysis, will be difficult. In order to create a more parsable log format, you need to consider how you will parse it through every step of creating the log format. Although tools such as grep are commonly used to locate keywords in the logs, which may help you find the section you need to analyze, importing the log into a database table or to implementing scripts will allow for deeper analysis.

For example, if you plan to load the access log into a database table, you can configure the log format to surround values in quotes and separate them with commas, so that the log can be loaded directly into the database table using standard import tools that support the CSV format. If you don't use quotes and commas, it makes it difficult to load the access log into the database. Listing 6 demonstrates the use of quotes and commas:

Listing 6. Using commas and quotes in your logs
LogFormat "\"%h\",\"%{%Y-%m-%d-%H.%M.%S.000000}t\",%p,\"%H\",\"%m\",\"%U\",
                    \"%q\",%>s,\"%b\",\"
%{WAS}e\",%D,\"%{CACHED_RESPONSE}o\"" csv

CustomLog "/httplogs/access_csv" csv

Another option for making your access logs more parsable is to implement your own custom tool. There are different libraries available on the web that can help you with the parsing of the access log file. For example, for Perl applications, the Apache::LogRegex library can parse a log entry given a LogFormat. For more information about Apache::LogRegex see the Resources. Listing 7 demonstrates using the Apache::LogRegex library to retrieve the value of %D from a log entry:

Listing 7. Retrieving %D from a log entry
my $lr = Apache::LogRegex->new($log_format);
....
# For each line
my %parsed_data = $lr->parse($line);
print "Response Time = " . $parsed_data{"%D"};

Tip #8 – Implementing log rotation

Given the speed at which access logs grow, set the log rotation so it continuously writes multiple logs with a specified log size. IHS includes the rotatelogs utility that automates rotation to be based on a time interval or size.
This example shows a new log being created whenever the size of the current log reaches 00: MB:CustomLog "|bin/rotatelogs /httplogs/access.log 500M" common

For more information on how log rotation can be configured and used, see the Apache product documetation listed in the Resources. The Apache documentation also mentions the cronolog tool as a more flexible option. For more information, see the tool's home page, listed in the Resources. Besides log rotation, you can implement more (cron) script to compress, archive, and delete old logs.


Tip #9 - Logging compression statistics

Most access log formats include %b, which is the size of response in bytes. This can be useful to identify responses that are extraordinary large.

As most sites enable compression (mod_deflate), %b shows the compressed size of the response as opposed to the original one. For more details on the mod_deflate module, see Resources.

Listing 8 shows how you can add the compressed size, uncompressed size, and compression ratio to the access logs:

Listing 8. Adding the compressed size, uncompressed size, and compression ratio to the access logs
DeflateFilterNote Input instream>
DeflateFilterNote Output outstream>
DeflateFilterNote Ratio ratio>
LogFormat "%h %l %u %t \"%r\" %>s %b %{outstream}n/%{instream}n(%{ratio}n%%)"

Conclusion

With the tips in this article you can take advantage of the powerful access logging capabilities of the IBM HTTP server. These tips provide performance analysis, security, and troubleshooting possibilities that will assist you in the managing and extracting of information from your logs.

Resources

Learn

Get products and technologies

  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Commerce on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Commerce, WebSphere
ArticleID=858124
ArticleTitle=Tips for using the IHS Access Log feature for WebSphere Commerce sites
publish-date=02152013