Troubleshooting locking problems in WebSphere Commerce with DB2

This article explains techniques for identifying and solving database locking problems in a WebSphere Commerce environment running DB2®.

Daniel Owusu-Afari (danieloa@ca.ibm.com), Software Engineer, IBM

Photo: Daniel Owusu-AfariDaniel Owusu-Afari is a Software Engineer at the IBM Toronto Lab, Canada. He is part of the WebSphere Commerce Advanced Technical Services team, which specializes in performance and stability.



Sammy Chow (sckchow@ca.ibm.com), Software Engineer, IBM

Author photo: Sammy ChowSammy Chow is an Advisory Software Engineer at the IBM Toronto Lab, Canada. He is part of the WebSphere Commerce Support team and is responsible for providing all aspects of client support for the WebSphere Commerce family of products.



Andres Voldman (voldman@ca.ibm.com), Software Engineer, IBM

Author photoAndres Voldman is a Software Engineer at the IBM Toronto Lab, Canada. He is a senior member of the WebSphere Commerce Support team and a subject matter expert on performance and stability of WebSphere Commerce sites. Andres has also achieved the IBM developerWorks Contributing Author level, recognizing his valued contributions as an IBM author.


developerWorks Contributing author
        level

22 July 2008

Introduction

The database is the most critical component of a WebSphere Commerce site, and because of its dynamic nature, it requires continuous attention. The WebSphere Commerce database can experience lock contention issues as a result of poor maintenance, un-tuned SQLs, or multiple threads going after the same data. As most of the shoppers' interactions rely on the database, these database issues directly impact a user's experience on the site, resulting in error pages and slow responsiveness.

This article introduces techniques to identify and troubleshoot typical locking problems in a WebSphere Commerce running with a DB2 database.


Understanding the problem

Why is it important to prevent and fix locking problems?

Locking problems can severely impact your WebSphere Commerce site. The database is involved in nearly every transaction that occurs on your WebSphere Commerce store, therefore it is a critical part of your environment.

If your WebSphere Commerce site is live in production, locking problems may affect the shopper's experience. For example, it can affect the shopper's ability to place orders, which leads to loss of sales. Locking problems can also affect the ability to perform regular store operational tasks, such as updating store catalog information, creating promotions, or processing backend requests. These problems cause loss of revenue and project delays, or can create additional costs to your IT department.

Therefore, it is critical to monitor your WebSphere Commerce database to ensure it is performing optimally and to reduce the probability of such problems from occurring. Furthermore, if problems do occur, it is important to quickly identify the root cause to minimize potential impact to your business.

This article is intended for WebSphere Commerce administrators who maintain the health of a WebSphere Commerce system. It is for people interested in learning how to properly maintain a WebSphere Commerce DB2 database, how to identify DB2 locking problems, and how to handle locking problems.


Recognizing locking issues

The key to identifying a locking problem is recognizing the symptoms. Typical symptoms include:

  • Slow response times for shoppers.
  • System appears to be hung and cannot process any requests.
  • Sudden decrease in sales or total number of completed orders.
  • Problems running backend administrative tasks.
  • Problems running scheduler commands.

Keep in mind that your site can experience these symptoms for multiple causes. The following sections show how to identify whether the problems are the result of locking issues in the database.


Identifying locking issues in WebSphere Commerce

The WebSphere Commerce environment contains multiple components. The simplest environment consists of an HTTP server, an application server, and a database, plus the hardware and the network infrastructure that are involved. Because each of these parts work together to form the overall solution, the best way to identify a locking problem is to observe each component and watch for the common symptoms. The key is to know which parts are involved and what to look for. Let's examine each of these separately.

Symptoms on the HTTP server

The HTTP server may not tell you directly whether you have a locking problem, but it is a good place to see whether your system has a bottleneck. Typical symptoms include an increased number of Web server threads or errors indicating that the maximum number of threads has been reached.

If you are running an IBM HTTP server, check your access.log and error.log for any of these symptoms. If you do not have access to tools to monitor real-time traffic, you can use the IHS built-in monitoring (mod_status) command or the netstat command.

The netstat command finds the number of active connections and close waits. A large number of connections indicate that requests are not being served in a timely manner. Close waits (an HTTP state) indicate that users are abandoning the site before the request is served. This indicates that the connections are locked waiting for database resources.

Symptoms on the application server

Locking problems are logged by the WebSphere Commerce application to the SystemOut.log file with an error code of SQL0911 along with two reason codes. Reason code "2" indicates a deadlock occurred. Reason code "68" indicates the application timed out while waiting on a lock.

Exceptions in SystemOut.log

There are two exceptions: deadlock and time out.

Deadlock exception

The following shows a deadlock exception (SQL code 911, Reason code 2). The stack trace shows this is from a flush operation. This exception shows that a deadlock occurred when an attempt was made to persist information on to the database.

[9/14/06 15:28:03:837 EDT] 32671541 ExceptionUtil E CNTR0020E: Non-application exception 
occurred while processing method "flush" on bean 
"BeanId(WC_WCS01#Enablement-BaseComponentsData.jar#ServerJDBCHelper, null)". 
Exception data: javax.ejb.EJBException: nested exception is: 
com.ibm.ejs.container.CacheFlushFailure: ; nested exception is: 
com.ibm.websphere.cpi.CPIException: ; nested exception is: 
COM.ibm.db2.jdbc.DB2Exception: [IBM][CLI Driver][DB2/6000] SQL0911N  
The current transaction has been rolled back because of a deadlock or timeout.  
Reason code "2".  SQLSTATE=40001
at com.ibm.ejs.container.ContainerTx.flush(ContainerTx.java(Compiled Code))
at com.ibm.ejs.container.ContainerTx.flush(ContainerTx.java(Inlined Compiled Code))
at com.ibm.ejs.container.EJSContainer.flush(EJSContainer.java(Inlined Compiled Code))
at com.ibm.ejs.container.BeanO.flush(BeanO.java(Compiled Code))
at com.ibm.commerce.base.helpers.BaseJDBCHelper.flush(BaseJDBCHelper.java(Compiled Code))

Time out exception

The following extract shows a time out exception (SQL code 911, reason code 68) on a user bean while committing a transaction:

[[9/29/06 5:26:33:436 CDT] 144ec339 ExceptionUtil E CNTR0021E: 
Non-application exception occurred on bean 
"BeanId(WC_demo#Member-MemberManagementData.jar#User, MemberId=9025783 )": 
Exception data: com.ibm.websphere.cpi.CPIException: ; nested exception is: 
COM.ibm.db2.jdbc.DB2Exception: [IBM][CLI Driver][DB2/6000] SQL0911N  
The current transaction has been rolled back because of a deadlock or timeout.  
Reason code "68".  SQLSTATE=40001
at com.ibm.ejs.container.ContainerManagedBeanO.store
(ContainerManagedBeanO.java(Compiled Code))
at com.ibm.ejs.container.EntityBeanO.beforeCompletion
(EntityBeanO.java(Compiled Code))
at com.ibm.ejs.container.ContainerTx.beforeCompletion
(ContainerTx.java(Compiled Code))
at com.ibm.ejs.util.tran.SyncDriver.beforeCompletion
(SyncDriver.java(Compiled Code))
atcom.ibm.ws.Transaction.JTA.RegisteredSyncs.distributeBefore
(RegisteredSyncs.java(Compiled Code))
at com.ibm.ws.Transaction.JTA.TransactionImpl.internalPrepare
(TransactionImpl.java(Compiled Code))
at com.ibm.ws.Transaction.JTA.TransactionImpl.commit
(TransactionImpl.java(Compiled Code))
at com.ibm.ws.Transaction.JTA.TranManagerImpl.commit
(TranManagerImpl.java(Compiled Code))
at com.ibm.ws.Transaction.JTA.TranManagerSet.commit
(TranManagerSet.java(Compiled Code))
at com.ibm.ws.Transaction.JTA.UserTransactionImpl.commit
(UserTransactionImpl.java(Compiled Code))

Java core

Taking a Java™ core (also known as thread dumps) on the application can also reveal database related issues. The Java core shows threads waiting on the database. When the locking problem is serious, you usually find that every single WebContainer thread is waiting for the database. A common mistake when this scenario occurs is to increase the Web container pool. Increasing the number of threads allows for more users to run into the problem, which augments the locking.

Below is a sample output from a Java core for a single thread:

3XMTHREADINFO  "WebContainer : 31" 
               (TID:0x7078A008, sys_thread_t:0x46C1DAA8, state:R, native ID:0x85EA) prio=5
4XESTACKTRACE  at COM.ibm.db2.jdbc.app.DB2PreparedStatement.SQLExecute(Native Method)
4XESTACKTRACE  at COM.ibm.db2.jdbc.app.DB2PreparedStatement.execute2(Unknown Source)
4XESTACKTRACE  at COM.ibm.db2.jdbc.app.DB2PreparedStatement.executeQuery(Unknown Source)
4XESTACKTRACE  at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecuteQuery
               (WSJdbcPreparedStatement.java(Compiled Code))
4XESTACKTRACE  at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeQuery
               (WSJdbcPreparedStatement.java(Compiled Code))
4XESTACKTRACE  at com.ibm.commerce.base.helpers.BaseJDBCHelper.executeQuery
               (BaseJDBCHelper.java(Compiled Code))
4XESTACKTRACE  at com.mycompany.catalog.CatalogHelper.getSpecialAttributes
               (CatalogHelperBean.java(Compiled Code))

Symptoms on the database

If the database is experiencing locking issues, there will be an increased number of database agents and multiple connections, which are in the Lock-Wait state.

The CPU in the database machine is also impacted, but the manner depends on variables, such as the maximum number of agents supported by your database, or the lock time-out settings. For example, if the maximum number of connections is reached and the timeout is set to -1 (infinite timeout), the database enters a hang situation and almost no CPU is consumed. On the other hand, if you are using an aggressive timeout and your database allows for a large number of agents, you might experience high CPU as a result of all the new connections and the transactions' rollbacks.

DB2 offers multiple tools to assist you with troubleshooting if you are experiencing lock contention. For example, you can use the "list applications show detail" command or the db2pd tool to find the status of all the connections. A number of connections in Lock-wait indicate that your database is experiencing issues.

The following is a sample output of the list applications command:

CONNECT Auth Id  Application Name   Appl.   Application Id         Seq# Number of
---------------------------------------------------------------------------------
DB2INST1          db2bp.exe         403    *LOCAL.DB2.080618160057 0001 1          

Coordinating DB  Coordinator  Status Status Change Time  Node     
-------------------------------------------------------------
0           5356  Lock-wait   06/18/2008 12:01:02.504016 SEGOMO

DB Name    DB Path
------------------
MALL6DEV   C:\DB2\NODE0000\SQL00007\

The output of the db2pd command looks similar to this:

db2pd -db mall6dev -applications                                            
                                                                            
Database Partition 0 -- Database MALL6DEV -- Active -- Up 0 days 00:36:34 --
Date 06/18/2008 12:37:19                                                    
                                                                            
Applications:                                                               
Address    AppHandl [nod-index] NumAgents  CoorTid    Status                
                                                                            
0x01FEFA80 403      [000-00403] 1          5356       Lock-wait             
                                                                            
C-AnchID C-StmtUID  L-AnchID L-StmtUID  Appid                               
                                                                            
57       1          0        0          *LOCAL.DB2.080618160057

Understanding the different types of locking issues

Now that you have identified a locking problem, the next step is to understand the type of locking you are experiencing. It is important to know this because it helps you select a particular approach to fixing the problem.

Lock waits

Lock waits are a normal part of the database operation. However, when the average wait time is high, the concurrency of the database is affected. Long wait times result in functional problems, such as time outs and deadlocks. This impacts the shopper's experience and the availability of the site.

Lock time-outs

Lock time outs occur when a connection in lock-wait exceeds the LOCKTIMEOUT value specified in the database configuration. If the LOCKTIMEOUT parameter has not been set, it uses the default value of -1, which means an application waits indefinitely. On OLTP systems such as WebSphere Commerce, the recommended value is between 15 seconds to 1 minute (batch processes might require a longer time).

Deadlocks

A deadlock is created when one application is waiting for another application to release a lock on data and the second application is also waiting for a lock to be released by the first application. Mutual waiting for the other application to release a lock leads to a deadlock. The applications can wait forever until one application releases the lock on the held data.

DB2 implements a deadlock monitor to detect this type of situations. The frequency with which the monitor scans the connections is configured by setting the DLCHKTIME database configuration parameter.


Monitoring for locking issues

In many cases, it is important to collect more evidence about the problem to successfully determine the root cause and to troubleshoot. The best way is by capturing database snapshots and enabling monitors on your DB2 database. In this section, we introduce tools that you can use to monitor lock waits, deadlocks, and timeouts.

Database snapshot monitoring

Now that you know that multiple connections are in lock-wait state, there are different DB2 tools that give you more visibility into the problem. It is a good idea to begin your investigation by taking a database snapshots and reviewing the snapshot information to determine the kind of problematic locking situation you may be having.


Use the following command to take a snapshot for the database:db2 "get snapshot for database on mydb" > snapshot1.txt

Repeat the command over a time period at specific intervals to get an idea of how fast lock issues are occurring on the database.

The following is a sample output for a database snapshot. The counters, such as number of deadlocks or time outs, are incremented since the last time the monitors were reset. If the "Last reset timestamp" is empty, then the numbers are accumulated since the database was activated (first database connect timestamp).

              Database Snapshot

Database name                              = RMALL
Database path                              = /home/db2inst1/db2inst1/NODE0000/SQL00001/
Input database alias                       = RMALL
Database status                            = Active
Catalog database partition number          = 0
Catalog network node name                  =
Operating system running at database server= LINUX
Location of the database                   = Local
First database connect timestamp      = 02/24/2008 15:29:13.007776
Last reset timestamp                  = 02/24/2008 12:03:24.330468
Last backup timestamp                 =
Snapshot timestamp                    = 02/24/2008 16:03:50.898286

Number of automatic storage paths          = 0

High water mark for connections            = 2
Application connects                       = 1
Secondary connects total                   = 0
Applications connected currently           = 1
Appls. executing in db manager currently   = 0
Agents associated with applications        = 1
Maximum agents associated with applications= 2
Maximum coordinating agents                = 2

Locks held currently                       = 5
Lock waits                                 = 2
Time database waited on locks (ms)         = Not Collected
Lock list memory in use (Bytes)            = 2160
Deadlocks detected                    = 1
Lock escalations                      = 0
Exclusive lock escalations            = 0
Agents currently waiting on locks     = 0
Lock Timeouts                         = 0
Number of indoubt transactions             = 0

Total Private Sort heap allocated          = 0
Total Shared Sort heap allocated           = 0
Shared Sort heap high water mark           = 0
Total sorts                                = 0
Total sort time (ms)                       = Not Collected
Sort overflows                             = 0
Active sorts                               = 0

Based on the output highlighted above, you can decide to get more information for a particular locking issue.

Lock waits monitoring

If you know from the "list applications" command or the db2pd tool that there are multiple connections in lock-wait, you can use snapshots to find out which SQL the connections are executing and which locks they are waiting on.

Snapshots are written to files or tables. For example, you can use the SNAPSHOT_LOCKWAIT table function in a SELECT statement to list all the connections currently waiting for locks:

SELECT * FROM TABLE(SYSPROC.SNAPSHOT_LOCKWAIT('', -1)) AS LOCK_WAIT

Although this function is convenient, it requires that the lock switch is at the database manager level, which creates overhead. If instead you use file snapshots, you enable the lock monitor and disable it right after the snapshot has been collected.

Time out monitoring

If the database snapshot shows that connections are timing out, you can use db2pd or the db2diag log to find the related SQLs. To have the SQLs that time out printed in db2diag.log, you need to set the DIAGLEVEL set to 4. You can find the current DIAGLEVEL on your database by running the following command:

db2 "get dbm cfg" | grep -i diaglevel

You see output similar to the following:

Diagnostic error capture level (DIAGLEVEL) = 2

To update the DIAGLEVEL, run the following command:

db2 update dbm cfg using DIAGLEVEL 4

You can find the directory that contains the db2diag.log by running this command:

db2 get dbm cfg | grep -i diagpath

When a lock timeout occurs, the logs contain the following:

2008-02-28-11.01.51.511272-300 I6984451G729       LEVEL: Info
PID     : 27836                TID  : 3022087856  PROC : db2agent (RMALL) 0
INSTANCE: db2inst1             NODE : 000         DB   : RMALL
APPHDL  : 0-21                 APPID: *LOCAL.db2inst1.080228160021
FUNCTION: DB2 UDB, lock manager, sqlplnfd, probe:80
DATA #1 : String, 135 bytes
Request for lock "REC: (2, 636) RID 00000009" in mode "..X" timed out
Could be waiting for lock held by indoubt transaction
Statement:
DATA #2 : Hexdump, 46 bytes
0x3381C900 : 7570 6461 7465 2075 7365 7273 2073 6574    update users set
0x3381C910 : 2066 6965 6C64 313D 2766 2720 7768 6572     field1='f' wher
0x3381C920 : 6520 7573 6572 735F 6964 3D35 3032         e users_id=502

You can also use the db2diag utility to filter out the time-outs from the log. Running the following command prints the time-outs that occurred in the last 24 hours:

db2diag -gi data:="timed out" -H 1d -fmt "TIME-OUT DETECTED:\nTime : 
%{ts}\n@{data}\n\n"

Deadlock monitoring

If the database snapshot shows that you are getting deadlocks, you need to enable a deadlock monitor that captures details, such as the agents involved, the SQLs being executed, and the locks owned by each session. When the default options are used, the performance impact of deadlock monitors is minor and you can leave the monitors on for hours until the problem is recreated.

Defining and starting a deadlock event monitor

Follow these steps to enable a deadlock monitor:

  1. Create a temporary directory where the event monitor files will be created:
    mkdir /tmp/deadlock
  2. Connect to the database:
    db2 connect to myDB
  3. Create the event monitor using the following command:
    db2 "create event monitor wcdeadlck for deadlocks 
       with details write to file '/tmp/deadlock'"
  4. Enable the event monitor: db2 "set event monitor wcdeadlck STATE=1"
  5. Leave the event monitor running for a period of time so that it can collect enough information.
  6. After the problem has been recreated, disable the event monitor:
    db2 "set event monitor wcdeadlck STATE=0"
  7. Format the event monitor logs using the following command:
    db2evmon -db dbname -evm wcdeadlck > /tmp/wcdeadlck.txt

Sample output from the deadlock event monitor

The db2evmon tool formats the event monitor files in a text file, which contains the information about the deadlocks that occurred while the monitor was active.

Searching for "Deadlock ID" helps you find all the participants in the deadlock. In the "Deadlocked Connection" section in the sample below, you see the text for SQLs that was involved in the deadlock.

-----------------------------------------------------------------------
...
...
...
7) Deadlock Event ...
  Deadlock ID:   2
  Number of applications deadlocked: 2
  Deadlock detection time: 02/25/2008 09:46:40.668371
  Rolled back Appl participant no: 2
  Rolled back Appl Id: *LOCAL.db2inst1.080225144302
  Rolled back Appl seq number: : 0003

...
...

9) Deadlocked Connection ...
  Deadlock ID:   2
  Participant no.: 2
  Participant no. holding the lock: 1
  Appl Id: *LOCAL.db2inst1.080225144302
  Appl Seq number: 0003
  Appl Id of connection holding the lock: *LOCAL.db2inst1.080225144241
  Seq. no. of connection holding the lock: 0002
  Lock wait start time: 02/25/2008 09:46:34.040239
  Lock Name       : 0x02007C02090000000000000052
  Lock Attributes : 0x00000000
  Release Flags   : 0x40000000
  Lock Count      : 1
  Hold Count      : 0
  Current Mode    : none
  Deadlock detection time: 02/25/2008 09:46:40.668533
  Table of lock waited on      : USERS
  Schema of lock waited on     : DB2INST1
  Tablespace of lock waited on : USERSPACE1
  Type of lock: Row
  Mode of lock: X   - Exclusive
  Mode application requested on lock: X   - Exclusive
  Node lock occured on: 0
  Lock object name: 9
  Application Handle: 12
  Deadlocked Statement:
    Type     : Dynamic
    Operation: Execute Immediate
    Section  : 203
    Creator  : NULLID
    Package  : SQLC2E06
    Cursor   :
    Cursor was blocking: FALSE
    Text     : update users set field1='j' where users_id=502
 ...
 ...
 ...

  Table Schema                : DB2INST1
      Table Name                  : ORDERS
      Mode                        : X   - Exclusive

 ...
 ...
 ...
      

11) Deadlocked Connection ...
  Deadlock ID:   2
  Participant no.: 1
  Participant no. holding the lock: 2
  Appl Id: *LOCAL.db2inst1.080225144241
  Appl Seq number: 0002
  Appl Id of connection holding the lock: *LOCAL.db2inst1.080225144302
  Seq. no. of connection holding the lock: 0003
  Lock wait start time: 02/25/2008 09:46:39.300146
  Lock Name       : 0x02007B010A0000000000000052
  Lock Attributes : 0x00000000
  Release Flags   : 0x40000000
  Lock Count      : 1
  Hold Count      : 0
  Current Mode    : none
  Deadlock detection time: 02/25/2008 09:46:40.671349
  Table of lock waited on      : ORDERS
  Schema of lock waited on     : DB2INST1
  Tablespace of lock waited on : USERSPACE1
  Type of lock: Row
  Mode of lock: X   - Exclusive
  Mode application requested on lock: X   - Exclusive
  Node lock occured on: 0
  Lock object name: 10
  Application Handle: 11
  Deadlocked Statement:
    Type     : Dynamic
    Operation: Execute Immediate
    Section  : 203
    Creator  : NULLID
    Package  : SQLC2E06
    Cursor   :
    Cursor was blocking: FALSE
    Text     : update orders set status='J' where orders_id=10508

Resolving the problem

The best way to solve a problem is to map your specific problem to a common cause and solution. This section explores all the major causes of locking issues on a WebSphere Commerce environment and discuss briefly how to resolve each of them.

Most of the time, regardless of the scenario, you will find that it is necessary to map the SQLs to the WebSphere Commerce commands or operations that are producing them. Sometimes the query alone helps you deduce the operation. However, when the mapping is not straightforward, you need to rely on the WebSphere Commerce server's logs and different techniques to find the EJB originating the call. For more information, see Mapping an SQL statement to Java in WebSphere Commerce.

The most typical locking scenarios are:

  • Locking issues as a result of a lack of maintenance and tuning
  • Locking issues as a result of a backend process
  • Lock contention between two or more Web requests
  • Lock contention due to performance degradation in the application server layer

Locking issues due to lack of maintenance and tuning

By far, the most common cause for locking issues and database performance degradation is a poorly maintained database. It is important to keep the database statistics updated as much as possible. This helps the database optimizer pick the most efficient plan to access the data. When this is not the case, the optimizer may favor table scanning, which introduces performance and locking issues.

For more information about database maintenance, see the Technote on Maintaining a WebSphere Commerce DB2 database and the Webcast on Best practices to maintain a WebSphere Commerce Database.

Effects of table scanning in the acquisition of locks

When the database is not well maintained, you find queries that are resolved without using available indexes. These queries need to scan the totality or a large percentage of a table to find the data that meets the search criteria. As the database manager needs to lock every row before it is read, when there is table scanning, the number of locks multiplies and this increases the chances for contention between connections.

Inefficient SQLs

A single "bad" SQL can introduce enough problems to bring a site down. Possible reasons are:

  • The SQL was not tuned.
  • The SQL was not supported by indexes.
  • The optimizer is not resolving it correctly when the statistics are not available.

After you locate the SQLs involved in locking, it is important that you validate so that they are performing correctly. For example, if the number of rows read is high, it might mean that the SQL is table scanning and this might explain why it is involved in the locking.

DB2 provides multiple tools that help you validate an SQL statement. It is important to familiarize yourself with these tools to troubleshoot database problems. These tools include statement event monitors or snapshots for dynamic SQL. Starting with DB2 version 8, you have the option to either get the results in text files or in database tables. The advantage of using database tables is that it orders the SQLs by different criteria, such as execution time or rows read.

Lock escalations

Lock escalations can occur as a result of table scanning or an undersized locking configuration. If your system is experiencing lock escalations, you see messages in db2diag.log. Also, the database snapshot includes two counters "Lock escalations" and "Exclusive lock escalations", which identify whether you are running into this problem.

Lock escalation is an internal mechanism that reduces the number of locks held by an application. For instance, if an application is locking a lot of rows in a particular table, DB2 can escalate the row locks to a table lock. A shared lock escalation prevents all other connections from writing to the table, while an exclusive lock escalation completely prevents other connections from either reading or writing to it. Both types of escalations lead to serious locking scenarios.

The parameter that defines the amount of storage used to hold the list of locks for the database is called LOCKLIST. To prevent a single connection from exhausting the list, DB2 allows you to set a MAXLOCKS value, which represents the maximum percentage of the list that is held by a single application.

When the LOCKLIST is close to being full or an application reaches MAXLOCKS, the database manager forces the lock escalation, which increases the locking problems.

Note that you need to resolve all table scanning issues before you decide to increase the locklist.


Locking issues as a result of a backend process

Besides the Web traffic, there are multiple operational jobs that update the WebSphere Commerce database. Backend jobs can be involved in locking issues because, in contrast with Web requests, they are used to update data. For example, the catalog is used by all the connections. Also, backend jobs are used to process data for multiple users and orders at the same time and this increases the possibility of contention.

Looking at the SQLs involved in the deadlocks or time outs can give you clues as to whether the locking is due to contention between the store and a backend process. For example, if you find that multiple SELECT statements are locked waiting on an UPDATE on a catalog table, it is possible that this update is being executed by a backend process. Also, if the SQL queries data for multiple users or orders, then it most likely belongs to a backend process and not a Web request.

DB2 commands, such as "list applications show detail" or an application snapshot, display information for the connections, such as the hostname, process ID, authentication ID, and so on. This data is useful if you suspect the SQLs are not being executed from the WebSphere Commerce server.

While the database is experiencing locking, and especially if the problem seems to re-occur in regular intervals, check which backend processes might be running at the time. Following is a list of typical WebSphere Commerce tools that use the database:

  • Staging propagation
  • Massload (or other data loads)
  • Database cleanup utility
  • WebSphere Commerce scheduler, Web services and MQ requests

Staging propagation

The staging propagation utility updates key tables required for site operation. While stage propagation is running, users on the site experience general site degradation as the database lock contention will increase rapidly. We recommend that you run the utility at a time when there is minimum traffic on the site, or during a scheduled maintenance window. If the time-out on your database is too aggressive, the propagation might fail. For instructions on how to increase the timeout for staging propagation, see the destdb_locktimeout parameter for the utility.

Massload (or other data loads)

Massload or data loads to the WebSphere Commerce database are another source of lock contention. Again, we recommend that you run the utility at a time when there is minimum traffic on the site, or during a scheduled maintenance window. As with the staging tools, it is also possible to use a different time-out for the Massload utility. See this technote for details, How to change the lock timeout for the Massload utility.

Database cleanup utility (dbClean)

The dbclean utility deletes obsolete records from the WebSphere Commerce database. Besides the locking issues you encounter with the other tools, the execution of deletes is a special scenario.

When you delete rows from a table that has referential integrity, the database manager needs to scan all the child tables for data that references the data being deleted. If the scan on the child tables is not done using indexes, locking occurs. Whenever you are dealing with locks where a delete statement is involved, either from dbClean or other tools, it is important that you verify whether the locking is created from the referential integrity.

WebSphere Commerce scheduler, Web services and MQ requests

In contrast with the tools just presented, scheduler jobs, Web service, and MQ requests are executed from the WebSphere Commerce JVM. You can use the administrative console to find out which scheduled jobs are active or were active at the time. Custom logging and WebSphere Commerce tracing will give you insight as to when these requests are executed.


Lock contention between two or more Web requests

Another scenario is when a lock belonging to request originating from the Web store waits for a lock from another request that also originated from the Web store. In this scenario, it is important to identify the kind of data that the connections are waiting on. You need to know if both connections are supposed to access the same piece of data. For example, a query finding the details for an order locks with another query for the same order. Because you do not expect two different users to access the same order, you can deduce that the same user is executing concurrent transactions.

On the other hand, if multiple users read and update the same data (for example, inventory), some lock waits are normal. If the waits are unacceptable or if there are deadlocks, analyze the performance of the transactions to reduce the wait times.

Concurrent request from the same user

A common cause for deadlocks is when the same user has multiple open connections to the application server. A typical scenario for this is double-clicking. For example, a user may add an item to the cart by clicking the "Add to cart" button. While this request is being processed, the user clicks "Add to cart" again. When this happens, the application server processes two identical requests for the same user that changes the same data. This scenario is likely to cause a deadlock.

For more information, see the WebSphere Commerce Information Center topic, Double-click handling in WebSphere Commerce.

For this scenario, you prevent the user from initiating a new request while the server is still processing the first one. Although WebSphere Commerce includes double-click prevention logic on the server side, the best alternative is to handle this in the browser with JavaScript.

The following snippet shows how to prevent double-click for a button using JavaScript:

<html>
  <script language="javascript">
    var busy = false;
    function Handle_Submit(form) {
      if (!busy) {
        busy = true;
        form.submit();
     }
     return false;
   }
  </script>
  <form>
    <input type="button" value="button1" onClick="Handle_Submit(this.form)">
  </form>
</html>

The advantage of preventing double-clicking right in the browser is when server-side mechanisms are used, a single user still holdsseveral Web container threads. All the requests that are detected to be a double-click need to wait for the original request to be completed so that the response is transferred to the last request. Keep in mind that server-side mechanisms only work when session affinity is enabled. If the user's requests are directed to different servers, the JVM has no way of knowing that the request is duplicated.

Shared data that is updated from multiple connections

You might also find that the lock waits are on data that is accessed by multiple connections and at least one of them is trying to update.

If one connection has updated a row, it locks the data until the transaction finishes and only then, can the other connections update the same row. Lock waits on some shared resources are normal. A typical example is inventory, if one connection has updated a product, everyone else will have to wait until that transaction is committed.

If you are getting lock time outs on resources such as inventory, look at how long the transactions are taking in the WebSphere Commerce server. If the WebSphere Commerce server is experiencing performance problems, the transactions are not completed in a timely manner and this forces all the other connections to wait and have possible time-out.

If you are getting deadlocks, there are techniques to prevent them (although you will still see the lock waits).

For example, in the case of inventory, the WebSphere Commerce out-of-the-box code uses these techniques to prevent deadlocks:

  1. It first orders all the catentries by catentry_id to ensure they are processed in the same order:
    Enumeration  e = orderItemAB.findByOrderSortedByCatalogEntryId(orderId);
    SELECT * FROM ORDERITEMS T1 WHERE (T1.ORDERS_ID = ?) order by T1.CATENTRY_ID
  2. It uses "For Update with RS" to select the rows. In the RS isolation mode, when "or update" is used, DB2 acquires update (U) locks on all the rows. The U locks prevent other connections from updating the data, but not from reading:
    findByMultipleCatalogEntryAndFulfillmentCenterAndStoreForUpdate()

Lock contention due to performance degradation in the application server layer

If the WebSphere Commerce servers are experiencing performance problems, such as an increase in traffic, it is also possible that you see lock contention in the database.

The effect is typically as follows: When the WebSphere Commerce server performs slower than usual, transactions in the WebSphere Commerce server and in the database take longer to complete. Given that locks are released when the transaction is committed (this varies depending on the WebSphere Commerce version), some of the new transactions need to wait longer and this leads to time outs.

While active, transactions hold a Web container thread in the application server and a database connection from the connection pool. As the server is slow, a request that usually reuses a thread, forces the pool to create a new thread and connection. This continues until the maximum number of threads in the pool is reached.

When this happens, you find that your site, which usually only has a handful of connections active, is now using the total pool. The overhead caused by the additional connections impacts the database. The ripple effect in the lock waits that is created with the new transactions results in lock time outs and deadlocks.

For this reason, when a site is experiencing performance degradation along with lock contention, we recommend that you decrease the WebSphere Application Server Web container thread pool size and the database pool size until the root cause for the degradation or the locking is resolved.

To understand why longer application server transactions lead to more locking, you need to understand the isolation levels used by WebSphere Commerce.

Isolation levels used by WebSphere Commerce

Isolation levels play an important role by determining how locking is implemented on the databases. WebSphere Commerce currently uses different isolation levels depending on the version:

  • WebSphere Commerce 5.6.1: WebSphere Commerce 5.6.1 use Read Stability (RS). For this isolation level, all the read and write locks for rows that are part of the result set are held until the transaction is committed or rolled back. This means the locks are held until the command completes. With this isolation level, the longer the transaction, the higher the impact on concurrency.
  • WebSphere Commerce 6.0: WebSphere Commerce 6.0 uses Cursor Stability (CS) and optimistic locking. In version 6, with the help of optimistic locking, the isolation level was lowered to CS. With CS, read locks are only held on the current row being read while the statement is executing and are released afterwards. This greatly reduces the number of locks held and improves overall database performance. In CS, update locks are still held until the transaction is completed with a commit or rollback.

Conclusion

This article explained why it is important to prevent locking problems in your WebSphere Commerce environment running DB2 and also how to troubleshoot them if they do occur. It also provided a general guideline and techniques on how you can identify and resolve these issues.

Resources

Learn

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into WebSphere on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=WebSphere, Information Management
ArticleID=322871
ArticleTitle=Troubleshooting locking problems in WebSphere Commerce with DB2
publish-date=07222008