IBM Support

Maximo and Control Desk Database Connection Leak Resolution

Troubleshooting


Problem

Database connection leaks could be caused by incorrect code or memory leaks. In order to address this problem, a feature called "close long running database connection" has been added as of Maximo 7.5.0.3 and later.

Symptom

Database connections increase and never go down. Requires application server restart to release increased database connections. This is tracked as IV21165.

Cause

Maximo uses its own internal database connection pool mechanism and does not use the application server database connection pool. In Maximo and Maximo based products such as Control Desk, there can be database connection leaks. These database connection leaks could be caused by incorrect code as well as memory leaks.
In situations where there are database connection leaks, it is very difficult to identify the code where the connection leak is occurring.

Diagnosing The Problem

Turn Maximo dbconnection watchdog logger to INFO. Collect Maximo logs for 24 hour period.

If Maximo logs indicate lost connection and connection pool's used connections growing, it indicates a database connection leak.

Example,


[INFO] BMXAA7084I - The DbConnectionWatchDog class has been trying to close the database connection for: 230233001 ms


DbConnectionWatchDog:Db Connection reference id=436107 SPID=397
Create time:1302986384636
Life time:230233001 ms

In this example, the database connection watchdog logger is indicating the connection as being held for 230233001 ms i.e. approximately 64 hours. This database connection used by some Maximo transaction is not released back to the connection pool. This connection is considered a lost database connection as it is held for a long time and is not available for freeing to the free connection pool for reuse. By looking in the logs at approximately 64 hours back, one should be able to find the stack trace of where this connection was established. An example is

[INFO] BMXAA7084I - The DbConnectionWatchDog class has been trying to close the database connection for: 73000 ms
DbConnectionWatchDog:Db Connection reference id=436107 SPID=397
Create time:1302986384636
Create stack trace:
psdi.server.DbConnectionWatchDog$ConnectionData.<init>(DbConnectionWatchDog.java:40)
psdi.server.DbConnectionWatchDog.addConnection(DbConnectionWatchDog.java:206)
psdi.server.ConRef.incrementRefCount(ConRef.java:129)
psdi.server.DBManager.getConnectionDetail(DBManager.java:1268)
psdi.server.DBManager.getConnection(DBManager.java:1179)
psdi.server.AppService.getDBConnection(AppService.java:548)
psdi.mbo.MboSet.getNextRecordData(MboSet.java:2794)
psdi.mbo.MboSet.fetchMbosActual(MboSet.java:2528)
psdi.mbo.MboSet.fetchMbos(MboSet.java:2494)
psdi.mbo.MboSet.getMbo(MboSet.java:1821)
psdi.mbo.MboSet.isEmpty(MboSet.java:3777)
psdi.app.asset.Asset.getParentAsset(Asset.java:7148)
psdi.app.asset.Asset.getWarrantyInfo(Asset.java:7082)
psdi.app.asset.Asset.getWarrantyInfo(Asset.java:7067)
psdi.app.workorder.WO.updateWarrantyContracts(WO.java:9777)
psdi.app.workorder.FldWOAssetnum.performWarrantyContractCheck(FldWOAssetnum.java:380)
psdi.app.workorder.FldWOAssetnum.action(FldWOAssetnum.java:201)
psdi.plust.app.workorder.PlusTFldWOAssetnum.action(Unknown Source)

......
 

With the dbconnection watchdog logger, the log files show exceptions as above and these indicate where the first occurrence of database connection access in the code. This exception indicates the code area where the connection leak is. If these connection leaks are in core product, the information can be provided to your product support team to get a fix. If the connection leak is in a custom code area, it should be addressed by your custom code development team.

The log file also indicates total number of connections used by the application server as follows.


[INFO] BMXAA7088I - The DbConnectionWatchDog class recorded 44 connection references in this period, currently there are 44 open references, 43 long running references, 43 long running connections, and 0 references that were closed by the garbage collector finalizer.
[INFO] DBManager Total number of connections: 49 Number of free connections = 5 Number of used connections = 44

The corresponding user count for the server shows as follows:

[INFO] BMXAA6369I - Server host: A.B.C.D. Server name: MXServer. Number of users: 13. WebClientSessions: (13)

In this situation, for 13 users, there are 43 long running connection.

The following technical note explains how to estimate the number of database connections needed for a Maximo application server.

http://www-01.ibm.com/support/docview.wss?uid=swg21469906

If the long running references are higher than the number of users and constantly growing it indicates a database connection leak.

Resolving The Problem

The database connection leaks should be identified and fixed in the code. It can be accomplished by using the dbconnection watchdog logger. The log files can indicate where the connection leaks might be.

Ideally, all database connection leaks should be fixed by fixing the offending code, but sometimes it may be difficult to find the connection leak and it may be necessary to find an alternate solution to release the leaking database connection. To address this problem, from Maximo 7.5.0.3 onwards, two new properties are added and through these properties, Maximo application server database connection pool will release the lost connection automatically.

The Maximo system properties used for this feature are:

mxe.db.closelongrunconn - default is false

mxe.db.longruntimelimit - default 180 minutes.


mxe.db.detectlongrunconninterval - this is the frequency in which the long running connections are checked.
Note: If the above property is not created as part of the fixpack application, you must define the property in the system properties application. The value of this property cannot be less than 30 minutes.




To set the above properties, in Maximo, go to System Configuration --> Platform Configuration --> System Properties. Change the values as described below. From the Select Action menu, apply Live Refresh to apply the changes.

The mxe.db.closelongrunconn property when set to true will close the lost connections if the connections have been held for greater than mxe.db.longruntimelimit time. If this feature is enabled through the above property, Maximo server will check for any lost connections in a 30 minute frequency ( this is not configurable ). If the application server finds any database connection held for greater than mxe.db.longruntimelimit, then it will check in the database to make sure the connection was actually not used by any processes in that time duration. If the connection was not used by any process, then it sets the Maximo database connection reference count to zero and returns the connection back to the connection pool. With the database connection back in the connection pool, it can be used for later database connection needs or be closed if there are excess free connections.

If database connection is held for a long time but is being used by some process, then the database will indicate usage of the connection. In this case, Maximo will not return the connection back to the database connection pool and leave it as is. This makes sure database connections used by some long running process are not closed by accident.

For Maximo application server to check the database connection usage, Maximo database account should have permission to read the following database system tables.

Oracle : v$session, gv$session ( on RAC )

SQL Server : master..sysprocesses

DB2 : SYSIBMADM.SNAPAPPL_INFO

DB2 monitoring needs to be enabled to trace usage.


The specific monitor element that must be captured is the "appl_idle_time" monitor element (http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.mon.doc/doc/r0001193.html)

By default mxe.db.closelongrunconn property is set to false. Maximo will not automatically close any database connections that are lost. Note that Maximo connection pool mechanism closes unused free connections and decreases the connection pool to the MaxFreeConnection property level. This is different than closing the lost connections. Closing unused database connections is part of the connection pool management.

The default value for mxe.db.longruntimelimit is 180 minutes. This time is used to check if the database connection was actually used by any process at the database level. If the database connection was not used at the database level, then it will be returned to the Maximo database connection pool. This value can be adjusted as needed.

If there are database connections outside of Maximo application server, Maximo is unaware of it and they are outside the scope of Maximo database connection pool management.

With this feature the long running database connection leak should be resolved without having to restart the Maximo server to release the database connections.

[{"Business Unit":{"code":"BU005","label":"IoT"},"Product":{"code":"SSLKT6","label":"Maximo Asset Management"},"Component":"System Related","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"7.5;7.6;7.6.1","Edition":""},{"Business Unit":{"code":"BU004","label":"Hybrid Cloud"},"Product":{"code":"SSWT9A","label":"Control Desk"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"7.5;7.6;7.6.1","Edition":""}]

Document Information

Modified date:
08 March 2019

UID

swg21622010