PK64003: High Availability Manager support for Transparent bridge failover.

Fixes are available

APAR status

Closed as new function.

Error description

HTTP 404 (Unroutable Server) errors occur during core group
bridge rebuild periods.

Local fix

Problem summary

****************************************************************
* USERS AFFECTED:  IBM WebSphere Application Server            *
*                  V6.0.2 and V6.1 users of the core group     *
*                  bridge                                      *
****************************************************************
* PROBLEM DESCRIPTION: Core group data is lost during core     *
*                      group bridge failover.                  *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
During core group bridge failover, some bulletin board data is
unavailable in local and remote core groups until the
remaining bridges can recover the data.  This can result in
HTTP 404 responses from the WebSphere proxy server or the
WebSphere Extended Deployment On Demand Router when routing to
endpoints in non-local core groups during bridge failover.

Problem conclusion

Core group bridge failover will no longer result in missing
bulletin board data when the custom property
"IBM_CS_HAM_PROTOCOL_VERSION=6.0.2.31" is set on every core
group of an access point group.

The fix for this APAR is currently targeted for inclusion in
fix packs 6.1.0.19 and 6.0.2.31.  Please refer to the
Recommended Updates page for delivery information:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980

Internal WebSphere Application Server components, such as
workload management (WLM), and the On Demand Routing features
of WebSphere Virtual Enterprise, depend on cross core group
state to perform their product functions. Bridges provide
the mechanism that is used to represent and manage the cross
core group state used by these internal users. Part of the
management of this cross core group state is to perform bridge
state rebuilds whenever there is a change in the number
of running bridges in a topology. The bridge state rebuild is
the means by which bridges calculate the ownership and
distribution of the cross-core group state among the running
set of bridges. During bridge state rebuilds, cross-core
group state can be moved between running bridges. This
situation might cause the data to be temporarily unavailable
until the bridge has completed the rebuild process.

The common symptoms of this problem are:

1. JNDI lookups failing.
2. WebSphere proxy server or On Demand Router generating 503
response codes immediately after a core group bridge has been
started or stopped.
3. CORBA exceptions immediately after a core group bridge has
been started or stopped.
4. The occurrence of the following
ArrayIndexOutOfBoundsException:
[7/9/08 17:12:20:749 EDT] 00000030 UserCallbacks E
HMGR0142E: An error occurred in a component called back by
the High Availability Manager. The exception is
java.lang.ArrayIndexOutOfBoundsException  at
com.ibm.ws.cluster.propagation.bulletinboard.BBDescriptionManager.getOrderedBytes(BBDescriptionManager.java:618)
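
The fourth symptom can be searched for in a server log. The following is a minimal sketch, not an IBM-supplied tool: it assumes that each log entry starts with a bracketed timestamp like the one in the sample above, and the function names are hypothetical. Whitespace is removed before matching so that a class name wrapped across lines is still found.

```python
import re

# Strings taken from the sample log entry in symptom 4.
MESSAGE_ID = "HMGR0142E"
EXCEPTION = "java.lang.ArrayIndexOutOfBoundsException"
FRAME = "BBDescriptionManager.getOrderedBytes"

# Assumption: each log entry begins with a bracketed timestamp such as
# "[7/9/08 17:12:20:749 EDT]" at the start of a line.
ENTRY_START = re.compile(r"^\[\d{1,2}/\d{1,2}/\d{2} ")


def split_entries(log_text):
    """Group log lines into entries; continuation lines stay with their entry."""
    entries, current = [], []
    for line in log_text.splitlines():
        if ENTRY_START.match(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


def find_bridge_failover_symptoms(log_text):
    """Return the entries that contain all three marker strings."""
    hits = []
    for entry in split_entries(log_text):
        # Remove all whitespace so that names wrapped across lines still match.
        squeezed = "".join(entry.split())
        if MESSAGE_ID in squeezed and EXCEPTION in squeezed and FRAME in squeezed:
            hits.append(entry)
    return hits
```

A match only indicates that this symptom was logged; it does not by itself confirm that a bridge rebuild was the cause.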

To avoid the temporary application outage that might occur
during core group bridge failover, make sure that you are
running the latest HAM protocol version. This requires that:

- All 6.0.2 processes are running on 6.0.2.31 or later.

- All 6.1 processes are running on 6.1.0.19 or later.

- All 7.0 processes are running on 7.0.0.1 or later.

- The core group custom property IBM_CS_HAM_PROTOCOL_VERSION
has been set to 6.0.2.31 on all of your core groups.
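
The service-level requirements above can be checked mechanically. The following is a minimal sketch that compares version strings against the minimum fix pack levels listed in this APAR. The function names and the inventory format (a mapping of process name to version string) are assumptions made for illustration; how the versions are collected is outside the scope of the sketch.

```python
# Minimum fix pack levels named in this APAR, keyed by release.
MINIMUM_LEVELS = {
    (6, 0): (6, 0, 2, 31),
    (6, 1): (6, 1, 0, 19),
    (7, 0): (7, 0, 0, 1),
}


def parse_version(text):
    """Turn '6.0.2.31' into (6, 0, 2, 31) so levels compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version_text):
    """Return True if the version is at or above the level for its release."""
    version = parse_version(version_text)
    minimum = MINIMUM_LEVELS.get(version[:2])
    if minimum is None:
        # Releases not listed in this APAR are not judged by this sketch.
        raise ValueError("release not covered by this APAR: " + version_text)
    return version >= minimum


def processes_below_minimum(inventory):
    """Return the sorted names of processes that still need a fix pack.

    inventory is a hypothetical mapping of process name to version string.
    """
    return sorted(name for name, version in inventory.items()
                  if not meets_minimum(version))
```

Versions are compared as integer tuples rather than as strings, because a string comparison would rank 6.0.2.9 above 6.0.2.31.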

If you are not running the latest high availability manager
protocol version, complete the following steps to activate
the latest high availability manager protocol:

1) Ensure that your installations are running at the required
service levels.

2) Determine if the high availability manager is configured to
use preferred coordinator servers. If the high availability
manager is not configured to use preferred coordinator servers,
you must manually determine which servers are currently
acting as preferred coordinator servers.

3) Shut down all core group bridges and all preferred
coordinator servers. The high availability manager will
immediately select new coordinators to replace those that you
shut down, but that scenario does not cause any problems.

4) Repeat the following actions for each core group in your
cells.

a) In the administrative console, click Servers > Core Groups
> Core group settings > CORE_GROUP_NAME > Custom properties.

b) Specify IBM_CS_HAM_PROTOCOL_VERSION in the Name field
and 6.0.2.31 in the Value field.

c) Save your changes.

5) Synchronize the configuration across the topology.

6) Restart all of the preferred coordinator servers. The
coordinator servers must complete the startup process before
you go on to the next step.

7) Restart all core group bridges in the topology.

The topology is now using the 6.0.2.31 protocol.
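
Step 4 can also be scripted instead of being repeated by hand in the administrative console. The following is a minimal sketch in the style of a wsadmin Jython script, not an IBM-supplied procedure. It is written as a plain function that receives the AdminConfig object as a parameter so that it can be tried against a stub first. It assumes that AdminConfig.list returns one configuration ID per line, that showAttribute returns the custom property list as a bracketed, space-separated string, and that configuration IDs contain no spaces; verify these assumptions in your own environment before relying on it. The function name is hypothetical.

```python
PROPERTY_NAME = "IBM_CS_HAM_PROTOCOL_VERSION"
PROPERTY_VALUE = "6.0.2.31"


def set_ham_protocol_version(admin_config):
    """Create or update the custom property on every core group.

    admin_config is expected to behave like the wsadmin AdminConfig object.
    Returns the list of core group configuration IDs that were processed.
    """
    processed = []
    for core_group in admin_config.list("CoreGroup").splitlines():
        core_group = core_group.strip()
        if not core_group:
            continue

        # Assumption: the attribute is rendered as "[id1 id2 ...]".
        raw = admin_config.showAttribute(core_group, "customProperties")
        property_ids = raw.replace("[", " ").replace("]", " ").split()

        existing = None
        for property_id in property_ids:
            if admin_config.showAttribute(property_id, "name") == PROPERTY_NAME:
                existing = property_id

        if existing is None:
            admin_config.create(
                "Property", core_group,
                [["name", PROPERTY_NAME], ["value", PROPERTY_VALUE]],
                "customProperties")
        else:
            admin_config.modify(existing, [["value", PROPERTY_VALUE]])
        processed.append(core_group)

    if processed:
        admin_config.save()
    return processed
```

Inside a wsadmin session the call would be set_ham_protocol_version(AdminConfig). The sketch covers only step 4: the bridges and coordinators must still be stopped beforehand, and the configuration must still be synchronized and the servers restarted afterward, as described in the other steps.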

Other considerations when configuring your core group bridges:

1. All of the servers in a core group must be at a service
level that supports the 6.0.2.31 high availability manager
protocol (IBM_CS_HAM_PROTOCOL_VERSION=6.0.2.31). If a core
group contains servers that are at earlier service levels,
these servers should be put into separate core groups. These
core groups can then be bridged to the core groups that
support the new high availability manager protocol because
core group bridges can still communicate with each other
even if they are using different core group protocols.
However, the bridges for the core groups that are using the
new high availability manager protocol will not be able to
fully leverage the transparent failover support that this
protocol provides because these bridges have to communicate
with the bridges in the back-level core groups. Therefore, it
is recommended that you upgrade the back-level core groups to
a service level that supports the new high availability
manager protocol if possible.

2. Transparent bridge failover is designed to hold state data
constant during core group bridge rebuilds along the state
data path, which is the path that consists of the state
provider, one core group bridge in each respective core group,
and a state data consumer. Failure scenarios that involve core
groups without any remaining active bridges might still result
in temporary state outages.

3. Whenever a change is made to the core group bridge
configuration, including the addition of a new bridge or the
removal of an existing bridge, you must fully shut down and
then restart all core group bridges in the affected access
point groups.

4. Always ensure that there is at least one running bridge in
each core group. Configuring two bridges in each core group
allows for single failures and for periodic cycling of one
bridge at a time from each core group. If all of the core group
bridges in a core group are shut down, core group state from
all foreign core groups is lost.

5. It is recommended that bridges be configured in their own
dedicated server process, and that these processes have their
monitoring policy set for automatic restart.

6. It is recommended that you always set the
IBM_CS_WIRE_FORMAT_VERSION core group custom property to the
highest value that is supported in your environment.
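
Consideration 4 lends itself to a simple topology check. The following is a minimal sketch under stated assumptions: the input is a hypothetical mapping of core group name to the list of bridge servers currently running in it, and gathering that information is left to the reader. The function name and status strings are illustrative only.

```python
NO_BRIDGE = "no running bridge: state from foreign core groups is lost"
SINGLE_BRIDGE = "one running bridge: no tolerance for a failure or for cycling"


def check_bridge_coverage(running_bridges):
    """Flag core groups with fewer than two running bridges.

    running_bridges maps a core group name to the names of its running
    bridge servers. Core groups with two or more bridges are not reported.
    """
    findings = {}
    for core_group, bridges in running_bridges.items():
        count = len(bridges)
        if count == 0:
            findings[core_group] = NO_BRIDGE
        elif count == 1:
            findings[core_group] = SINGLE_BRIDGE
    return findings
```

For example, a topology in which one core group has two running bridges, another has one, and a third has none would yield findings for the second and third core groups only.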

Temporary fix

Comments

APAR Information

APAR number: PK64003
Reported component name: WEBS APP SERV N
Reported component ID: 5724H8800
Reported release: 60A
Status: CLOSED PER
PE: NoPE
HIPER: NoHIPER
Special Attention: NoSpecatt
Submitted date: 2008-04-07
Closed date: 2008-07-25
Last modified date: 2008-12-12

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name: WEBS APP SERV N
Fixed component ID: 5724H8800

Applicable component levels

R60A PSY UP
R60H PSY UP
R60I PSY UP
R60P PSY UP
R60S PSY UP
R60W PSY UP
R60Z PSY UP
R61A PSY UP
R61H PSY UP
R61I PSY UP
R61P PSY UP
R61S PSY UP
R61W PSY UP
R61Z PSY UP

Document Information

Modified date:
28 December 2021
