PK20304: NO_IMPLEMENT ON FIRST REQUEST TO A CLUSTER.

APAR status

  • Closed as program error.

Error description

  • WLM cluster data would not be gathered for a particular cluster
    until the moment the first request for that cluster came in.
    This caused a NO_IMPLEMENT to be seen on the client once
    (sometimes 2-3 times) until the cluster data was created and
    propagated.  The NO_IMPLEMENT would not be seen after the data
    was propagated.
    
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  WebSphere Application Server users of       *
    *                  version 6.0.2 and 6.1 using Workload        *
    *                  Management (WLM) and concerned about client *
    *                  timeout lengths or hitting an infinite loop *
    *                  in WLM selection.                           *
    ****************************************************************
    * PROBLEM DESCRIPTION: In order to make the version 6 code     *
    *                      function more like version 5 with quick *
    *                      timeouts for clients when all the EJB   *
    *                      servers are down, we had to create two  *
    *                      custom properties for the callback      *
    *                      timeout and to enable the preload       *
    *                      function.                               *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The callback timeout was hardcoded to 3 minutes, which could
    cause clients to time out waiting on requests when all the
    servers are down.  This is a regression in behavior from
    version 5, in which the clients would get an immediate
    response that there were no members available.
    
    The reason for this is that with the version 6 code, which
    allowed for asynchronous updates and selection, there was no
    way for the WLM code to differentiate between a No Cluster
    Data Available exception that occurs because we are just
    starting up and have not yet initialized the data for that
    cluster, and one that occurs because all the servers are down.
    To combat the scenario in which we are just starting up, the
    callback timeout was added to allow a request to wait for that
    cluster data to be populated before the request is sent
    through.  However, this left the problem described above, in
    that the wait would also occur when all the servers are down.
    
    There was an additional possibility that an infinite loop in
    selection could occur if the retry limit was reached.
    

Problem conclusion

  • A custom property was created to let the customer determine
    the length of the callback timeout (including the ability to
    skip it altogether) in order to get around the NO_IMPLEMENT:
    No Cluster Data Available exception on first touch of a
    cluster.  In addition, the Pre-fetch logic from the next
    release was backported (called PreLoad in version 6 to
    indicate that the code is not exactly the same); when enabled,
    it causes the node agent to preload all of the cluster data
    without waiting for the first request to come in.  The
    combination of the custom property that enables the PreLoad
    logic and the custom property that sets the callback timeout
    fixes the issue of clients timing out on requests that would
    not have timed out in version 5, while still succeeding on the
    first request to a cluster.
    
    Note that the preload property only affects instances of the
    NO_IMPLEMENT: No Cluster Data Available exception; it does not
    affect scenarios in which other NO_IMPLEMENT exceptions are
    seen (for example, No Available Target, Retry Limit Reached,
    and Forward Limit Reached).
    
    An additional fix was added to solve a possible infinite
    loop in the selection logic if the retry limit was reached.
    A minor code of 40 instead of 42 is now thrown if we ever
    reach a scenario in which we run out of retry attempts.
    The WLM code will retry at a high level if the minor code
    is 42 on a NO_IMPLEMENT, but will not with a minor code of 40.
    
    In order to enable and use either of the custom properties,
    you must take the following steps:
    
    In the administrative console, click "System Administration"
    on the left side, click "Cell" underneath that, and then click
    "Custom Properties" in the middle frame.
    
    You'll want to create two properties:
    
    IBM_CLUSTER_CALLBACK_TIMEOUT with a value for the timeout in
    milliseconds (10000 is 10 seconds, for example).  Click Apply,
    then back out to the custom properties screen and go in again
    to create a new property (otherwise you will overwrite the
    first one).
    
    IBM_CLUSTER_ENABLE_PRELOAD with a value of true to enable the
    preload function.  This value must be true if the callback
    timeout is set to zero.  Setting it to true is not recommended
    for customers with extremely large topologies, as the node
    agent can take much longer to start up.  If you have this set
    to true and see long node agent start times, set it to false
    to determine whether the issue is in the preloading of the
    cluster data.
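    
    If you prefer scripting, a rough wsadmin (Jython) sketch of
    the same two property creations follows.  This is a minimal
    sketch, not taken from the APAR: the cell name "myCell", the
    file name, and the timeout value are placeholders, and the
    script is assumed to run against the deployment manager with
    "wsadmin -lang jython -f setWlmClusterProps.py".
    
    # setWlmClusterProps.py -- hypothetical file name; run under wsadmin,
    # which provides the AdminConfig object.
    # Look up the cell; replace "myCell" with your cell name
    # (AdminConfig.list('Cell') also returns it in a single-cell topology).
    cell = AdminConfig.getid('/Cell:myCell/')
    
    # Callback timeout in milliseconds (10000 = 10 seconds); choose a value
    # appropriate for your topology, or 0 to skip the wait (requires preload).
    AdminConfig.create('Property', cell,
        [['name', 'IBM_CLUSTER_CALLBACK_TIMEOUT'], ['value', '10000']])
    
    # Enable the PreLoad function; must be true if the callback timeout is 0.
    AdminConfig.create('Property', cell,
        [['name', 'IBM_CLUSTER_ENABLE_PRELOAD'], ['value', 'true']])
    
    # Commit the configuration change; node synchronization and restarts
    # still follow, as described next.
    AdminConfig.save()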
    
    Save those changes and synchronize the configuration with the
    nodes, then shut everything down (deployment manager, node
    agents, and servers) and start it up again.  If you have WLM
    trace enabled, you should see this in the trace logs:
    
    [3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim >
    loadCustomProperties Entry
    [3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim 1
    Loaded custom property
    IBM_CLUSTER_ENABLE_PRELOAD
    false/true - the value set in the console
    [3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim 1
    Loaded custom property
    IBM_CLUSTER_CALLBACK_TIMEOUT
    ##### - whatever value was set in the console
    [3/23/06 15:17:59:828 CST] 0000000a ProcessRuntim >
    loadCustomProperties Exit
    
    If this is seen, then the cell has loaded the custom properties
    correctly and they should be used at runtime where applicable.
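    
    As an additional check, the configured values can also be read
    back with wsadmin.  The following is a minimal sketch under
    the same assumptions as above (the cell name "myCell" is a
    placeholder); it simply prints each custom property defined
    directly on the cell.
    
    # Run under wsadmin (-lang jython), which provides AdminConfig.
    cell = AdminConfig.getid('/Cell:myCell/')
    
    # The cell's "properties" attribute is a bracketed list of Property
    # config IDs, for example "[id1 id2]"; strip the brackets and show each.
    props = AdminConfig.showAttribute(cell, 'properties')
    for prop in props[1:-1].split():
        print AdminConfig.showall(prop)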
    
    The fix for this APAR is currently targeted for inclusion
    in fix packs 6.0.2.11 and 6.1.0.1.
    Please refer to the recommended updates page for delivery
    information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
    

Temporary fix

Comments

APAR Information

  • APAR number

    PK20304

  • Reported component name

    WEBSPH APP SERV

  • Reported component ID

    5724J0800

  • Reported release

    60A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2006-02-22

  • Closed date

    2006-04-26

  • Last modified date

    2009-02-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • WLM
    

Fix information

  • Fixed component name

    WEBSPH APP SERV

  • Fixed component ID

    5724J0800

Applicable component levels

  • R60A PSY

       UP

  • R60H PSY

       UP

  • R60I PSY

       UP

  • R60P PSY

       UP

  • R60S PSY

       UP

  • R60W PSY

       UP

  • R60Z PSY

       UP

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSEQTP","label":"WebSphere Application Server"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"6.0","Line of Business":{"code":"LOB45","label":"Automation"}}]

Document Information

Modified date:
18 October 2021