Fixes are available
6.1.0.1: WebSphere Application Server V6.1.0 Fix Pack 1 for HP-UX
6.1.0.1: WebSphere Application Server V6.1.0 Fix Pack 1 for AIX
PK20304; 6.0.2.9: NO_IMPLEMENT on first request to a cluster
6.1.0.1: WebSphere Application Server V6.1.0 Fix Pack 1 for Windows
6.1.0.1: WebSphere Application Server V6.1.0 Fix Pack 1 for Linux
6.1.0.1: WebSphere Application Server V6.1.0 Fix Pack 1 for i5/OS
6.1.0.1: WebSphere Application Server V6.1.0 Fix Pack 1 for Solaris
APAR status
Closed as program error.
Error description
WLM cluster data would not be gathered for a particular cluster until the moment the first request for that cluster came in. This caused a NO_IMPLEMENT to be seen on the client once (sometimes 2-3 times) until the cluster data was created and propagated. The NO_IMPLEMENT would not be seen after the data was propagated.
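As an aside for client developers (not part of the fix itself): because the exception described above is transient, a caller can absorb it by retrying the request when org.omg.CORBA.NO_IMPLEMENT is thrown. The following is a minimal sketch only; the class name, retry count, and delay are illustrative and not part of this APAR.

import org.omg.CORBA.NO_IMPLEMENT;
import java.util.concurrent.Callable;

public class FirstTouchRetry {
    // Retries a unit of work a few times when the transient first-touch
    // NO_IMPLEMENT described above is thrown. maxAttempts must be >= 1.
    public static <T> T callWithRetry(Callable<T> work, int maxAttempts)
            throws Exception {
        NO_IMPLEMENT last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.call();
            } catch (NO_IMPLEMENT e) {
                last = e;            // transient until cluster data propagates
                Thread.sleep(1000L); // brief pause before the next attempt
            }
        }
        throw last;                  // still failing after all attempts
    }
}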
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: WebSphere Application Server users of        *
*                 version 6.0.2 and 6.1 using Workload         *
*                 Management (WLM) and concerned about client  *
*                 timeout lengths or hitting an infinite loop  *
*                 in WLM selection.                            *
****************************************************************
* PROBLEM DESCRIPTION: In order to make the version 6 code     *
*                      function more like version 5 with quick *
*                      timeouts for clients when all the EJB   *
*                      servers are down, we had to create two  *
*                      custom properties for the callback      *
*                      timeout and to enable the preload       *
*                      function.                               *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The callback timeout was hardcoded to 3 minutes, which could cause clients to time out waiting on requests when all servers are down. This is a regression in behavior from version 5, in which the clients would get an immediate response that there were no members available.

The reason for this is that with the version 6 code, which allowed for asynchronous updates and selection, the WLM code had no way to differentiate between a No Cluster Data Available exception raised because the process was just starting up and had not yet initialized the data for that cluster, and one raised because all the servers were down. To handle the startup scenario, the callback timeout was added to allow a request to wait for that cluster data to be populated and then be sent through. However, this left the problem described above, in that the wait would also occur when all the servers are down. There was an additional possibility that an infinite loop in selection could occur if the retry limit was reached.
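To make the callback-timeout mechanism described above concrete, the following is a minimal, hypothetical sketch of the idea: a request waits up to a configurable limit for cluster data to arrive, and a timeout of zero skips the wait entirely (which is only safe when the data is preloaded ahead of the first request). The class and method names are illustrative; this is not the actual WLM implementation.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: models the callback-timeout behavior described
// above, not the actual WebSphere WLM code.
class ClusterDataGate {
    private final CountDownLatch dataReady = new CountDownLatch(1);

    // Called by the update path once cluster data has been propagated.
    void clusterDataArrived() {
        dataReady.countDown();
    }

    // Called on the request path. Waits up to timeoutMillis for the
    // cluster data; a value of 0 skips the wait, which only succeeds
    // when the data was preloaded before the first request.
    boolean awaitClusterData(long timeoutMillis) throws InterruptedException {
        if (timeoutMillis == 0) {
            return dataReady.getCount() == 0; // no wait: data must already exist
        }
        return dataReady.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}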
Problem conclusion
After creating a custom property to let the customer determine the length of the callback timeout (including the ability to skip it altogether) in order to get around the NO_IMPLEMENT: No Cluster Data Available exception on first touch of a cluster, we had to backport the Pre-fetch logic from the next release (called PreLoad in version 6 to indicate that the code is not exactly the same). When enabled, PreLoad causes a node agent to preload all of the cluster data without waiting for the first request to come in. The combination of the custom property that enables the PreLoad logic and the custom property that sets the length of the callback timeout fixed the issue of clients timing out on requests that would not have timed out in version 5, while still succeeding on the first request to a cluster.

Note that the preload property only affects instances of the NO_IMPLEMENT: No Cluster Data Available exception; it does not affect scenarios in which other NO_IMPLEMENT exceptions are seen (examples include No Available Target, Retry Limit Reached, and Forward Limit Reached).

An additional fix was added to solve a possible infinite loop in the selection logic when the retry limit was reached. A minor code of 40 instead of 42 is now thrown if we ever reach a scenario in which we run out of retry attempts. The WLM code will retry at a high level on a NO_IMPLEMENT with a minor code of 42, but will not with a minor code of 40.

To enable and use either of the custom properties, take the following steps. In the administrative console, click "System Administration" on the left side, click "Cell" underneath that, then click "Custom Properties" in the middle frame. Create two properties:

IBM_CLUSTER_CALLBACK_TIMEOUT, with a value for the timeout in milliseconds (10000 is 10 seconds, for example). Click Apply, then back out to the custom property screen and go in again to create the second property (otherwise you will overwrite the first one).

IBM_CLUSTER_ENABLE_PRELOAD, with a value of true to enable the preload function. This value must be true if the callback timeout is set to zero. Setting it to true is not recommended for customers with extremely large topologies, as the node agent can take much longer to start up. If you have this set to true and see long node agent start times, set it to false to determine whether the issue is in the preloading of the cluster data.

Save those changes and synchronize the configuration with the nodes, then shut everything down (deployment manager, node agents, and application servers) and start it up again. If you have WLM trace enabled, you should see the following in the trace logs:

[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim > loadCustomProperties Entry
[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim 1 Loaded custom property IBM_CLUSTER_ENABLE_PRELOAD false/true (the value set in the console)
[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim 1 Loaded custom property IBM_CLUSTER_CALLBACK_TIMEOUT ##### (whatever value was set in the console)
[3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim > loadCustomProperties Exit

If this is seen, the cell has loaded the custom properties correctly and they will be used at runtime where applicable.

The fix for this APAR is currently targeted for inclusion in fix packs 6.0.2.11 and 6.1.0.1. Please refer to the recommended updates page for delivery information:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
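As an illustration of the minor-code change described above, a client can inspect the minor field of the CORBA system exception to decide whether a retry is worthwhile. This sketch assumes the values 40 and 42 appear in that field exactly as described in this APAR; in practice IBM minor codes may be offset by a vendor-specific base, so treat the constants as placeholders.

import org.omg.CORBA.NO_IMPLEMENT;

public class WlmRetryPolicy {
    // Minor-code meanings as described in this APAR. NOTE (assumption):
    // the on-the-wire values may embed an IBM vendor prefix, so these
    // constants may need adjusting for a real deployment.
    static final int MINOR_RETRY_LIMIT_REACHED = 40; // WLM will not retry
    static final int MINOR_NO_CLUSTER_DATA     = 42; // WLM retries at a high level

    // Returns true when, per the APAR description, a retry could succeed.
    static boolean isRetriable(NO_IMPLEMENT e) {
        return e.minor == MINOR_NO_CLUSTER_DATA;
    }
}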
Temporary fix
Comments
APAR Information
APAR number
PK20304
Reported component name
WEBSPH APP SERV
Reported component ID
5724J0800
Reported release
60A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2006-02-22
Closed date
2006-04-26
Last modified date
2009-02-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
WLM
Fix information
Fixed component name
WEBSPH APP SERV
Fixed component ID
5724J0800
Applicable component levels
R60A PSY
UP
R60H PSY
UP
R60I PSY
UP
R60P PSY
UP
R60S PSY
UP
R60W PSY
UP
R60Z PSY
UP
[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSEQTP","label":"WebSphere Application Server"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"6.0","Line of Business":{"code":"LOB45","label":"Automation"}}]
Document Information
Modified date:
18 October 2021