After a longer period of inactivity, I am back with new energy and a positive spirit to solve customers' problems.
Last week I worked on a case with a transaction recovery issue.
One of our customers had the problem that, whenever he started the server, the SystemOut log of the application cluster
was filled with exceptions like
00000075 XARecoveryMgr E J2CA0128E: An Exception occurred while trying to start ResourceAdapter
The exception is: java.lang.ClassNotFoundException: com.ibm.j2ca.jdbc.JDBCResourceAdapter
This exception was not tied to one specific application; the same exception was also generated for many other
apps that use the JDBC adapter.
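To get a feeling for how widespread the exception is, a small script can count the J2CA0128E messages in a SystemOut log and group them by the class reported as missing. This is only a minimal sketch: it assumes the log lines look like the two-line excerpt above (message line first, "The exception is:" line directly after it), and the function name and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches the "The exception is: ..." line shown in the excerpt above.
# Assumption: the missing class name follows the ClassNotFoundException text.
CNF_PATTERN = re.compile(
    r"The exception is: java\.lang\.ClassNotFoundException: (\S+)"
)


def count_adapter_failures(log_text):
    """Return (number of J2CA0128E messages, Counter of missing classes)."""
    missing = Counter()
    total = 0
    expect_detail = False
    for line in log_text.splitlines():
        if "J2CA0128E" in line:
            total += 1
            expect_detail = True  # the detail line is expected right after
            continue
        if expect_detail:
            match = CNF_PATTERN.search(line)
            if match:
                missing[match.group(1)] += 1
            expect_detail = False
    return total, missing


# Sample text copied from the excerpt above (hypothetical stand-in for a real log).
sample = """\
00000075 XARecoveryMgr E J2CA0128E: An Exception occurred while trying to start ResourceAdapter
The exception is: java.lang.ClassNotFoundException: com.ibm.j2ca.jdbc.JDBCResourceAdapter
"""

total, missing = count_adapter_failures(sample)
print(total, dict(missing))
# prints: 1 {'com.ibm.j2ca.jdbc.JDBCResourceAdapter': 1}
```

For a real log you would read the file content first (for example with `open(path).read()`) and pass it to the function.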
Reading further down the error stack, I could see more details about the java.lang.ClassNotFoundException:
at java.lang.Class.forNameImpl(Native Method)
What you can see here: the RecoveryManager is called and tries to recover information from the PartnerLog, but that fails.
So, what could have happened?
1) The application existed at some point.
2) There was a crash in the past while a transaction was active. That led to records being written to the transaction log and the PartnerLog.
3) The application (e.g. myApp-DC_Database_MyApp) was later deleted / removed from the server.
4) The records in the PartnerLog still exist, and because of that you get the adapter recovery exception whenever the server is started.
For this kind of problem a WebSphere APAR was created:
PM62977: ALLOW FOR CLEANUP OF XA PARTNER LOGS (http://www-01.ibm.com/support/docview.wss?uid=swg1PM62977)
Once it is installed, this APAR should avoid such problems in the future, but installing it alone does not solve the current problem with the PartnerLog.
So the question is how to clean the orphaned records out of the PartnerLog.
The procedure is described in that APAR as well.
From the transaction traces we can see entries like:
0000070 XARecoveryDat > recover Entry
index=6, recoveryID=7, recovered=false,
data=com.ibm.ejs.j2c.J2CXAResourceFactory, J2CXAResourceInfo : c
In total there were more than 250 recoveryIDs, belonging to several applications, that failed. From that perspective it makes no sense to remove every single ID by listing it explicitly (e.g. REMOVE_PARTNER_LOG_ENTRY=4,13,9,7, ...).
Instead of addressing every single ID, we used the value '0', which means that all of these recoveryIDs are cleaned out. The APAR describes it like this:
"A recovery ID of 0 (zero) indicates that ALL resource entries that throw XA Exceptions during recovery are removed from
the transaction partner log"
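To illustrate why an explicit list quickly becomes impractical, the following sketch collects the recoveryIDs marked recovered=false from a transaction trace and builds the property value you would otherwise have to type by hand. It assumes that trace entries contain "recoveryID=<n>, recovered=<true|false>" as in the excerpt above; the function names are hypothetical, and the second and third sample entries are invented for the demonstration.

```python
import re

# Assumption: every trace entry carries "recoveryID=<n>, recovered=<true|false>"
# in the form shown in the trace excerpt above.
ENTRY_PATTERN = re.compile(r"recoveryID=(\d+),\s*recovered=(true|false)")


def unrecovered_ids(trace_text):
    """Return the sorted, de-duplicated recoveryIDs with recovered=false."""
    ids = {
        int(rid)
        for rid, recovered in ENTRY_PATTERN.findall(trace_text)
        if recovered == "false"
    }
    return sorted(ids)


def explicit_property_value(ids):
    """Build the explicit ID list form of the property for the given IDs."""
    return "REMOVE_PARTNER_LOG_ENTRY=" + ",".join(str(i) for i in ids)


# First entry taken from the excerpt above; the other two are invented.
sample = """\
0000070 XARecoveryDat > recover Entry
index=6, recoveryID=7, recovered=false,
index=7, recoveryID=9, recovered=true,
index=8, recoveryID=13, recovered=false,
"""

ids = unrecovered_ids(sample)
print(len(ids), explicit_property_value(ids))
# prints: 2 REMOVE_PARTNER_LOG_ENTRY=7,13
```

With more than 250 failing IDs the generated value becomes very long, which is the reason we went for the value 0 instead.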
The steps we used were:
1) Follow the instructions in the APAR description
and set the property REMOVE_PARTNER_LOG_ENTRY=0:
(Servers >> Application Servers >> server >> Container Services >> Transaction Service)
2) Leave the transaction traces enabled. Trace-string:
In that context, set the number of historical trace files to 10 and the trace file size to 100 MB
3) Stop the server and back up / clean out the old log, FFDC, and trace files
4) Restart the server in recovery mode using the command
'startServer.sh <servername> -recovery', for example:
./startServer.sh server1 -recovery
5) Wait until the recovery has finished, and stop the server if it
does not stop automatically
6) Start the server in normal mode and remove the property REMOVE_PARTNER_LOG_ENTRY=0
Bring up only the server that has the problem, and leave the other servers of the cell down until the problem is
solved. You can start the other servers of the cell as soon as you have removed the property REMOVE_PARTNER_LOG_ENTRY=0 from the server in question and restarted that server.
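For step 3 (back up and clean out the old log, FFDC, and trace files), a helper along the following lines can move the old files into a timestamped backup folder, so that the traces written during the recovery start are easy to find afterwards. This is only a sketch: the function name and the directory names are hypothetical, the demo runs in a temporary directory instead of a real profile, and it should only be run while the server is stopped.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_and_clear(log_dirs, backup_root):
    """Move all files from the given directories into a timestamped backup folder.

    Directories that do not exist are skipped; subdirectories are left alone.
    Note: directories sharing the same last path component end up in the
    same backup subfolder.
    """
    target = Path(backup_root) / time.strftime("backup-%Y%m%d-%H%M%S")
    moved = []
    for log_dir in map(Path, log_dirs):
        if not log_dir.is_dir():
            continue
        dest = target / log_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        # sorted() materializes the listing before files are moved away
        for item in sorted(log_dir.iterdir()):
            if item.is_file():
                shutil.move(str(item), str(dest / item.name))
                moved.append(dest / item.name)
    return moved


# Demo in a temporary directory (hypothetical layout, not a real profile path).
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp) / "logs" / "server1"
    logs.mkdir(parents=True)
    (logs / "SystemOut.log").write_text("old content")
    moved = backup_and_clear([logs], Path(tmp) / "backups")
    print([p.name for p in moved], list(logs.iterdir()))
    # prints: ['SystemOut.log'] []
```

On a real system you would pass the server's log and FFDC directories and a backup location outside of them.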
With that procedure we were able to clean the orphaned records out of the PartnerLog and get rid of the transaction recovery exceptions in the server logs during server startup.
And if this doesn't help, take two of these and call me in the morning.
Your Dr. Debug