IBM Support

DB2 LUW: Why round-robin WLB cannot distribute connections evenly to all members?

Technical Blog Post


Abstract

DB2 LUW: Why round-robin WLB cannot distribute connections evenly to all members?

Body

This is a problem I met last year, I just get the time to write it down, hope it helps!

I have a pureScale with 8 members, say member 0,1,2,3,4,5,6,7,  and I have some applications running on a web application server will access a database on this pureScale,  the application server manages a database connection pool, and the applications share the database connections within the pool.

The design of the connection pool is setting connection count to 1000, and workload should be evenly distributed to 8 members.  I know workload balance(WLB) can satisfy this requirement.

At first, I create a round-robin WLB subset:

db2 connect to MYDB
db2 "CALL SYSPROC.WLM_CREATE_MEMBER_SUBSET('SAMPLE_SUB, '<databaseAlias>MYDB_RR</databaseAlias>', '( 0, 1, 2,3,4,5,6,7 )')"
db2 "CALL SYSPROC.WLM_ALTER_MEMBER_SUBSET( 'SAMPLE_SUB', '<memberPriorityBasis>equalPriority</memberPriorityBasis>', NULL )"

The connect string I set for the managed connection pool:

 String  url =  "jdbc:db2://db03:50000/MYDB_RR:loginTimeout=10;keepAliveTimeOut=10;enableExtendedIndicators=2;queryTimeoutInterruptProcessingMode=2;retrieveMessagesFromServerOnGetMessage=true;enableSysplexWLB=true;";

Then I start my applications, unfortunately I find the WLB doesn't work as expected.  Totally 1000 connections, I find almost all connections go to the first member, there is few connections go to the other members.  See the netstat output:

 

Firstly, I guess the connects didn't go to other members because there wasn't any workload running on those connections (given we are using connection pool, during the test, there might be only 1~3 connections have workload running).  So I plus more workload to my applications, ensure all connections in the pool will be used, but still the problem exists.

Then I am recommended to ensure the first connection is established before launching the remaining connections otherwise a good number will likely go to the cataloged first.  I follow this recommendation, but no luck.

Then occasionally I test with 50 connections, the connections are evenly distributed among the members!  Then I test with 2000 connections, I find again almost all connections go to the first member, but this time, the number of connections to the first member is still about 1000.  1000 is an interesting number in this case, right?

Finally I recall the upper limit for the number of transport objects in a global transport object pool for the connection concentrator and Sysplex workload balancing is 1000 by default,  it can be configured with JVM property: db2.jcc.maxTransportObjects:

https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_r0052075.html#imjcc_r0052075__tblcfgpr2

As the best practice, db2.jcc.maxTransportObjects should be set to  the number of concurrent connections X the number of members.

So I restart my application server with JVM parameter "-Ddb2.jcc.maxTransportObjects=8000", and the test result shows the connections are evenly distributed to all 8 member!!

Tips:

1. For version 3.63 or 4.13 or later of the IBM Data Server Driver for JDBC and SQLJ, the default  value for db2.jcc.maxTransportObjects is 1000. For earlier versions of the driver, the default is -1, means that there is no limit to the number of transport objects in the global transport object pool.

2. There is another JCC parameter maxTransportObjects that specifies the maximum number of transport objects that can be used for all connections with the associated DataSource object. maxTransportObjects does not override the db2.jcc.maxTransportObjects configuration property. maxTransportObjects has no effect on connections from other DataSource objects. If the maxTransportObjects value is larger than the db2.jcc.maxTransportObjects value, maxTransportObjects does not increase the db2.jcc.maxTransportObjects value.  See:

https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_r0052072.html

 

[{"Business Unit":{"code":"BU029","label":"Data and AI"}, "Product":{"code":"SSEPGG","label":"DB2 for Linux- UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":""}]

UID

ibm11140604