IBM Support

Most Common WebSphere Application Server Transaction Manager issues

Technical Blog Post


Transaction in WebSphere Application Server

A transaction groups pieces of work that should be atomic (all or nothing). If any piece of the transactional work fails, the entire transaction rolls back. The application server uses a transaction manager that is responsible for managing transactions across multiple resource managers (RMs). Resource managers include relational databases (JDBC™ data sources), messaging systems (JMS), and Enterprise Information Systems (EIS).
Resource managers support local and global transactions.

 

Transaction Logs
The WebSphere Application Server (WAS) transaction service writes information to a transaction log for every global transaction that involves two or more resources, or that is distributed across multiple servers. The transaction service maintains transaction logs to ensure the integrity of transactions.

  • The tranlog subdirectory contains all of the files that hold record details of transactions that are managed by WebSphere Application Server, in particular, the current transition state.
  • The partnerlog subdirectory contains files that hold information on resources that are involved in a transaction. The partnerlog subdirectory is important in a recovery scenario to allow WebSphere Application Server to re-create a resource for recovery after the server is recycled.

 

 

 

In general, we see many issues related to transaction timeouts, transaction recovery, permissions, and connectivity. This blog discusses the most common Transaction Manager issues in WebSphere Application Server and how to troubleshoot them.

 

1. How do I resolve WTRN0006W Transaction Timeout?
"WTRN0006W: Transaction “xxxxxxxxx…(XID)” has timed out after xxx seconds"

WTRN0006W occurs when an application cannot complete its transactional work within the timeout specified in the transaction service (Total transaction lifetime timeout). The default value is 120 seconds.
Symptoms and conditions commonly seen with WTRN0006W:

  • Hung thread messages in SystemOut logs, logged with message WSVR0605W / WSVR0606W
  • Long running database queries
  • Long running garbage collection activity
  • Resource contention on application server / back end resources

In most cases, WTRN0006W is only a symptom, not the actual cause. Look for out-of-memory (OOM) conditions and hung threads, confirm that the backend resources are responding as expected, and identify any long-running queries. For more information, refer to the blog entry "Demystifying WTRN0006W" and the Webcast replay "Transactions in WebSphere Application Server - Overview and Problem Determination."
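As a first triage step, it can help to count how often the timeout and hung-thread messages appear in a log. The following is a minimal sketch, not an IBM tool: the message IDs come from the text above, while the function name `summarize`, the regular expression, and the sample lines are assumptions made for illustration. Real SystemOut lines carry additional fields such as timestamps and thread IDs.

```python
import re
from collections import Counter

# Message IDs are taken from the text above. The pattern is an assumption
# modeled on the quoted WTRN0006W message, not an official log format.
TIMEOUT_RE = re.compile(r"WTRN0006W: Transaction (.+?) has timed out after (\d+) seconds")
HUNG_THREAD_IDS = ("WSVR0605W", "WSVR0606W")


def summarize(lines):
    """Return (timeout values in seconds, counts of hung-thread message IDs)."""
    timeouts = []
    hung = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(int(match.group(2)))
        for msg_id in HUNG_THREAD_IDS:
            if msg_id in line:
                hung[msg_id] += 1
    return timeouts, hung


# Hypothetical sample lines, shortened for illustration.
sample = [
    "WTRN0006W: Transaction 0000ABCD has timed out after 120 seconds.",
    "WSVR0605W: (hypothetical hung-thread message text)",
    "Some unrelated log line",
]
timeouts, hung = summarize(sample)
print("timeouts:", timeouts)
print("hung-thread messages:", dict(hung))
```

If hung-thread messages cluster shortly before the timeouts, the thread dumps and backend response times are usually a better place to look than the transaction service itself.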

 

 

2. How do I resolve WTRN0005W Transaction recovery message?
"WTRN0005W: The XAResource for a transaction participant could not be recreated and transaction recovery may not be able to complete properly. The resource was <xxxxxxxxxxxx>"

If the backend resource has been removed or changed, recovery is not possible for that resource. You have only two options: remove the transaction and partner logs completely, or find the failing resource and remove the entry for that particular resource. In most cases, removing the logs is not recommended; refer to the blog entry "Things to know before deleting temporary, cache and log files in WebSphere Application Server" for more information. To remove the partner from the partner log, do the following:

  1. Enable Transaction=all trace on the failing server to identify the failing resource.
  2. In the trace, search for “recovered=false” and note the recoveryID number that appears just before it:
    WTRN0151I: Preparing to call xa recover on XAResource: SIBus:TestBus:XXXhostNode01.server1-TestBus XARecoveryDat <  auditXaRecover Exit XARecoveryDat >  getXARminst
    Entry <null> index=1, recoveryID=2,
    recovered=false xxxxxxxxx>
  3. Add the "REMOVE_PARTNER_LOG_ENTRY" custom property under servers -> Failing server name -> Transactions -> Custom properties, and enter the recoveryID values found in the trace (recoveryID=xx, recovered=false).
    Refer to APAR "PM62977: ALLOW FOR CLEANUP OF XA PARTNER LOGS"
  4. Start the application server in recovery mode (in a non-HA environment).
    For example, from <WebSphere_home>/<profile_name>/bin, run startServer.bat (or startServer.sh) server1 -recovery
    After recovery completes successfully, delete the REMOVE_PARTNER_LOG_ENTRY property, save the configuration changes, and start the server in normal mode.

Also refer to this blog entry "Recovering from failed transaction recovery."
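Searching a large trace by hand for the recoveryID values is error-prone, so the lookup in step 2 can be scripted. This is a minimal sketch under the assumption that the trace follows the excerpt shown in step 2, where "recoveryID=<n>," is immediately followed by "recovered=false" (possibly on the next line). The function name `unrecovered_ids` and the second entry in the sample are hypothetical.

```python
import re

# Matches "recoveryID=<n>," followed, possibly across a line break, by
# "recovered=false". The layout mirrors the excerpt in step 2 and is an
# assumption; real trace output may differ.
UNRECOVERED_RE = re.compile(r"recoveryID=(\d+),\s*recovered=false")


def unrecovered_ids(trace_text):
    """Return distinct recoveryID values followed by recovered=false, in order seen."""
    seen = []
    for match in UNRECOVERED_RE.finditer(trace_text):
        recovery_id = int(match.group(1))
        if recovery_id not in seen:
            seen.append(recovery_id)
    return seen


# First entry is from the excerpt above; the second entry is hypothetical.
excerpt = """Entry <null> index=1, recoveryID=2,
    recovered=false xxxxxxxxx>
Entry <null> index=2, recoveryID=3,
    recovered=true xxxxxxxxx>"""

print(unrecovered_ids(excerpt))  # prints [2]
```

The values returned are the candidates to enter in the REMOVE_PARTNER_LOG_ENTRY custom property; confirm them against the trace before changing the configuration.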

 

 

3. What are the required permissions to store transaction logs in the database?
Access to the database is made under the JAAS J2C authentication data alias associated with the data source configured for the transaction logs storage. As such, the data alias needs at least SESSION privilege to connect to the database. If the same user ID owns the schema in which the transaction logs are to be stored, WAS has sufficient privilege to manipulate the tables. Otherwise, the data alias requires SELECT, INSERT, UPDATE, and DELETE privileges on the tables that comprise the transaction logs, and DROP ANY TABLE system privilege to enable use of the TRUNCATE TABLE statement. If the log tables are not present on server startup, and are to be created automatically, the data alias requires sufficient privilege to create tables and indexes in the transaction logs schema - it will also require a space quota in the default tablespace of the owner of that schema.

If you are using Oracle Database, you must grant the permissions that recovery requires for HA recovery to succeed; otherwise, you might see the following exception:

WTRN0037W: The transaction service encountered an error on an xa_recover operation. The resource was com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl@1114a62. The error code was XAER_RMERR. The exception stack trace follows: javax.transaction.xa.XAException
at oracle.jdbc.xa.OracleXAResource.recover(OracleXAResource.java:726)
at com.ibm.ws.rsadapter.spi.WSRdbXaResourceImpl.recover(WSRdbXaResourceImpl.java:954)

Refer to document "Exception occurs during recovery of Oracle database transactions" to resolve the issue.

 

4. Is there a custom property to control transaction recovery retry limit?
No. There are no configurable properties that will control the retry interval or the number of retries associated with transaction recovery.

 

5. WebSphere Application Server fails to start with CWRLS0009E and ORA-00922 errors. What can I do?
Exception Stack:

[2/24/17 14:50:23:172 EST] 0000006f SQLMultiScope I CWRLS0009E: Details of recovery log failure: java.sql.SQLSyntaxErrorException: ORA-00922: missing or invalid option
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)

Caused by: com.ibm.ws.recoverylog.spi.InternalLogException: java.sql.SQLSyntaxErrorException: ORA-00922: missing or invalid option
at com.ibm.ws.recoverylog.custom.jdbc.impl.SQLMultiScopeRecoveryLog.openLog(SQLMultiScopeRecoveryLog.java:737)
at com.ibm.tx.jta.impl.RecoveryManager.run(RecoveryManager.java:1895)
... 1 more
Caused by: java.sql.SQLSyntaxErrorException: ORA-00922: missing or invalid option
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399

This failure is caused by the maximum table name length of 30 bytes/characters. In this case, the recovery log is trying to create a table named WAS_TRAN_LOGbaa-4baaaa001_qaqa01, which is 33 characters long. The table names that the recovery log service uses for transactions are:

WAS_TRAN_LOG*suffix*, WAS_PARTNER_LOG*suffix*

where suffix is the value of the tablesuffix attribute that the user configured in the custom URL specified for the transaction log directory. In this case, that value is 21 characters long. Because WAS_PARTNER_LOG is already 15 characters long, the tablesuffix for Oracle can be at most 15 characters (30 - 15 = 15).
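The length arithmetic can be checked before the server is started. The following is a minimal sketch that uses the two table name prefixes and the 30-character limit quoted above. The function names are hypothetical, and the check counts characters, so it assumes a single-byte character set.

```python
# Limit and prefixes are taken from the text above.
MAX_TABLE_NAME_LENGTH = 30
TABLE_PREFIXES = ("WAS_TRAN_LOG", "WAS_PARTNER_LOG")


def max_suffix_length(limit=MAX_TABLE_NAME_LENGTH):
    """Longest tablesuffix that keeps every log table name within the limit."""
    return limit - max(len(prefix) for prefix in TABLE_PREFIXES)


def oversized_tables(suffix, limit=MAX_TABLE_NAME_LENGTH):
    """Return the table names that would exceed the limit for this suffix."""
    names = [prefix + suffix for prefix in TABLE_PREFIXES]
    return [name for name in names if len(name) > limit]


print(max_suffix_length())            # prints 15
print(oversized_tables("_cell01"))    # hypothetical short suffix, prints []
print(oversized_tables("x" * 21))     # a 21-character suffix breaks both names
```

A 21-character suffix yields a 33-character WAS_TRAN_LOG name and a 36-character WAS_PARTNER_LOG name, so both table creations would fail.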

The restriction is documented in topic "Storing and restoring transaction and compensation logs for high availability."

 

6. Why is my transaction log getting corrupted, especially when I have the transaction logs in remote file system?

  • By default, the transaction logs are memory-mapped files. Memory mapping does not add any significant performance gain when the logs are on a remote file system (you might see some gain with a local file system), but it does add complexity to the code path, although we are not aware of any defects in that area. In the latest versions, memory mapping is disabled in HA environments on Windows and z/OS platforms.
  • There are a number of problems related to the use of memory-mapped files by the recovery log service in conjunction with NFS.
  • To avoid these problems, disable memory mapping, which can be done on a per-JVM basis by setting the JVM system property.

Also see "Configuring transaction properties for an application server" in the product documentation.

Note: If you are using remote file system, please make sure it honors forced writes (regardless of whether it's HA or not - some people use a remote file system for non-HA too).

 

7. How do I resolve WTRN0062E "An illegal attempt to use multiple resources that have only one-phase capability has occurred within a global transaction"?
There can be only a single resource that is limited to one-phase capability enlisted in a global transaction. This error occurs if multiple resource managers that are only one-phase capable are used in a global transaction, or if multiple unshared connections from a single such resource manager are used in a global transaction.

The solution is to modify the application to use either a single resource that is limited to one-phase capability, or to use two-phase capable XAResources.

Refer to "WTRN0062E and J2CA0030E Errors While Trying To Do Last Participant Support Extensions" for more information.

 

8. How do I resolve WTRN0063E: "An illegal attempt to commit a one phase capable resource with existing two phase capable resources has occurred"?
The transaction service has refused an attempt to commit a one-phase capable resource with a transaction that already involves other two-phase capable resources. Either Last Participant Support (LPS) is not available, or if LPS is available, the application does not accept the heuristic risk that this would involve.

Ensure that one- and two-phase capable resources are not involved in the same transaction, or if LPS is available, reconfigure the application to accept the heuristic risk that this would involve.
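The rules behind WTRN0062E and WTRN0063E (sections 7 and 8) can be summarized as a small decision function. This is a toy model of the rules as described in this blog, not the transaction manager's actual logic. The function name and parameters are hypothetical.

```python
def check_enlistment(one_phase, two_phase, lps_available=False,
                     accept_heuristic_hazard=False):
    """Return None if the resource mix is allowed, else the message ID.

    one_phase / two_phase are the numbers of one-phase-only and two-phase
    capable resources (or unshared connections) in one global transaction.
    """
    # More than one one-phase-only resource is never allowed (section 7).
    if one_phase > 1:
        return "WTRN0062E"
    # One one-phase resource mixed with two-phase resources requires LPS
    # and acceptance of the heuristic risk (section 8).
    if one_phase == 1 and two_phase >= 1:
        if not (lps_available and accept_heuristic_hazard):
            return "WTRN0063E"
    return None


print(check_enlistment(one_phase=2, two_phase=0))   # prints WTRN0062E
print(check_enlistment(one_phase=1, two_phase=1))   # prints WTRN0063E
print(check_enlistment(one_phase=1, two_phase=1, lps_available=True,
                       accept_heuristic_hazard=True))  # prints None
```

The model makes the two remedies visible: either reduce the transaction to a single one-phase resource or all two-phase resources, or enable LPS and accept the heuristic risk.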

 

Thanks to Dan Matthews and Jay Stidder, from the Transaction Manager development team, who reviewed this blog and provided great suggestions.

 

