IBM Support

Is Your WebSphere MQ Cluster at Risk for Corruption?

Technical Blog Post


Abstract

Is Your WebSphere MQ Cluster at Risk for Corruption?

Body

 

If you are using MQ clusters and have some queue managers running MQ software that is not at a current fixpack level, you may be at risk for cluster corruption.

Users of WebSphere MQ clusters are potentially affected by the HIPER (high impact, pervasive) APAR IV25030. The issue is more likely to occur in clusters with very large numbers of queue managers or objects. However, it is strongly suggested that all customers apply the fix or, preferably, upgrade to an MQ fixpack level that includes the fix.

The APAR affects MQ clusters on all distributed platforms (iSeries, all UNIX® platforms, and Windows). Fixes are available for MQ v6.0.2, v7.0.1, v7.1, and v7.5. Refer to the APAR website for more details:

IV25030:
WEBSPHERE MQ CLUSTER FAILS, POSSIBLY GENERATING FDC FILES WITH PROBE IDS RM296000 OR OTHER rrcE_REPOSITORY_ERROR
http://www.ibm.com/support/docview.wss?uid=swg1IV25030

 

Prevention is the best route in this scenario. Once the corruption has been introduced, the cleanup steps can be cumbersome, especially in very large cluster environments. If corruption occurs, applying the fix alone will not resolve the problem; a cluster refresh command must also be run on the queue managers throughout the cluster.

Several symptoms can indicate that your cluster is experiencing the problem. You may have FDC files with Probe ID RM296000, or other FDC files that report rrcE_REPOSITORY_ERROR. Your cluster may stop working, or the cluster name of MQ cluster objects may display as mangled output such as this:
   CLUSTER(ÅÉãÅmÅÄÁâmÃÓäâãÅÙ@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@)
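The first and third symptoms lend themselves to a quick automated check. The following is a minimal Python sketch, not an IBM-supplied tool: it assumes you have already read the text of an FDC file, or captured the output of a display command, into a string. The function names are hypothetical. The name check relies on the rule that MQ object names may contain only letters, digits, and the characters . / _ %.

```python
import re

# Markers quoted in this post; matching is a plain substring search.
FDC_MARKERS = ("RM296000", "rrcE_REPOSITORY_ERROR")

# Characters that are valid in MQ object names (including cluster names).
VALID_NAME = re.compile(r"[A-Za-z0-9._/%]+")
CLUSTER_ATTR = re.compile(r"CLUSTER\(([^)]*)\)")


def fdc_markers_found(fdc_text):
    """Return the known markers that occur in the text of one FDC file."""
    return [marker for marker in FDC_MARKERS if marker in fdc_text]


def mangled_cluster_names(display_output):
    """Return CLUSTER(...) values that contain characters not valid in MQ names."""
    suspects = []
    for value in CLUSTER_ATTR.findall(display_output):
        name = value.strip()
        # A blank CLUSTER( ) attribute is normal for non-clustered objects.
        if name and not VALID_NAME.fullmatch(name):
            suspects.append(name)
    return suspects
```

A hit from either function is only a hint that the APAR may apply; it does not replace the diagnosis described on the APAR website.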

Cluster dumps taken with the amqrfdm command may show:

** Object with incorrect CacheFlags 0 **
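If the dump output has been saved to a text file, the marker line can be located with a few lines of Python. This is a rough sketch that assumes the dump is available as plain text; the function name is hypothetical, and the only string it relies on is the message quoted above.

```python
# Message text quoted in this post; any trailing flag value is ignored.
MARKER = "Object with incorrect CacheFlags"


def flagged_lines(dump_text):
    """Return (line_number, line) pairs for dump lines that contain the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(dump_text.splitlines(), start=1)
        if MARKER in line
    ]
```

An empty result means only that this particular marker was not found in the saved text, not that the cluster is free of corruption.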

 

These are not the only symptoms that may indicate this APAR. Other customers have reported cluster problems that were attributed to this high impact, pervasive APAR. I strongly urge anyone with MQ clusters to make sure their queue managers are at a fixpack level that includes this APAR. Applying the fix before corruption occurs avoids the significant procedural overhead of cleanup.

 


UID

ibm11080321