Recovering WebSphere MQ and WebSphere Message Broker after a server crash
This article is about disaster recovery, specifically recovering the queue manager and broker after a server crash involving IBM® WebSphere® MQ V7 and IBM WebSphere Message Broker V7/V8. In this context, a server crash is any abrupt restart of the server that does not allow the WebSphere MQ and WebSphere Message Broker shutdown routines to complete before the server goes down. This includes server hangs in which the administrator must reboot the server before the shutdown routines can complete. This article describes error scenarios that follow an abrupt server restart and shows system administrators how to recover from them.
Existing resources such as information centers provide standard WebSphere MQ and WebSphere Message Broker restore commands, but lack step-by-step troubleshooting information and additional steps that are sometimes needed to recover the environment. This article is for WebSphere MQ and WebSphere Message Broker administrators, and Integration and Infrastructure Architects who need to guide their Support Teams in recovering from a server crash. Many customers have business-critical applications that use WebSphere MQ as a messaging system and WebSphere Message Broker for transformation, enrichment, and routing, so detailed information on this topic is worth knowing.
Recovering the MQ queue manager
You attempt to restart the MQ queue manager after a server crash and startup fails.
The queue manager active logs appear to be intact, and running strmqm -c (which redefines the default and system objects) does not help. When you start the queue manager, you may see this error:
WebSphere MQ queue manager 'qmgr name' starting.
AMQ7047: An unexpected error was encountered by a command.
The queue manager error log in /var/mqm/qmgrs/QM Name/errors shows the following error:
05/24/13 19:25:17 - Process(6160464.1) User(mqm) Program(amqzxma0_nd) Host(hostname)
AMQ7472: Object qmgr name, type catalogue damaged.
EXPLANATION: Object qmgr name, type catalogue has been marked as damaged. This indicates that the queue manager was either unable to access the object in the file system, or that some kind of inconsistency with the data in the object was detected.
ACTION: If a damaged object is detected, the action performed depends on whether the queue manager supports media recovery and when the damage was detected. If the queue manager does not support media recovery, you must delete the object as no recovery is possible. If the queue manager does support media recovery and the damage is detected during processing when the queue manager is being started, the queue manager automatically initiates media recovery of the object. If the queue manager supports media recovery and the damage is detected after the queue manager has started, it may be recovered from a media image using the rcrmqobj command or it may be deleted.
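If you want to confirm this diagnosis on several queue managers, a small script can scan the error logs for AMQ7472 messages. The following is a minimal Python sketch, assuming the error logs are the usual AMQERR0*.LOG files in the queue manager's errors directory and that the message line has the form shown above. `find_damaged_objects` and `scan_error_logs` are hypothetical helper names, not part of any IBM tool.

```python
import re
from pathlib import Path

# Matches the message line written for a damaged object, in the form shown
# in the excerpt above: "AMQ7472: Object <name>, type <type> damaged."
# The EXPLANATION text does not start with "AMQ7472:", so it is not counted.
DAMAGED_RE = re.compile(r"AMQ7472: Object (?P<name>.+?), type (?P<type>\S+) damaged")


def find_damaged_objects(log_text: str) -> list[tuple[str, str]]:
    """Return (object name, object type) pairs reported by AMQ7472 messages."""
    return [(m.group("name"), m.group("type")) for m in DAMAGED_RE.finditer(log_text)]


def scan_error_logs(qmgr_dir: str) -> list[tuple[str, str]]:
    """Scan <qmgr_dir>/errors/AMQERR0*.LOG (assumed file names) for damaged objects."""
    hits: list[tuple[str, str]] = []
    for log in sorted(Path(qmgr_dir, "errors").glob("AMQERR0*.LOG")):
        hits.extend(find_damaged_objects(log.read_text(errors="replace")))
    return hits
```

For example, `scan_error_logs("/var/mqm/qmgrs/QM1")` would return `[("QM1", "catalogue")]` if the log contains the error above for a queue manager named QM1.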
You may also see some "ghost" MQ objects in the folder /var/mqm/qmgrs/QM Name/queues, as shown below:
!!GHOST!13734786!0!F2A81300!1454
!!GHOST!13734786!0!B1A4BB92!1560
!!GHOST!13734786!0!84C22DD6!2093
!!GHOST!13734786!0!1FC1F0D6!1689
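A quick, read-only check for these entries might look like the following Python sketch. It assumes the directory layout shown above (a queues subdirectory under the queue manager directory). `find_ghost_objects` is a hypothetical helper, and it only reports entries; it deliberately does not delete anything.

```python
from pathlib import Path

GHOST_PREFIX = "!!GHOST!"


def find_ghost_objects(qmgr_dir: str) -> list[str]:
    """Return the sorted names of "!!GHOST!" entries under <qmgr_dir>/queues."""
    queues = Path(qmgr_dir, "queues")
    if not queues.is_dir():
        return []  # nothing to report if the directory is missing
    return sorted(p.name for p in queues.iterdir() if p.name.startswith(GHOST_PREFIX))
```

Usage: `find_ghost_objects("/var/mqm/qmgrs/QM1")`.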
An FDC with the following probe ID and major and minor error codes is generated:
Probe Id: AO000001
Component: apiStartup
Major Errorcode: xecF_E_UNEXPECTED_RC
Minor Errorcode: arcE_OBJECT_DAMAGED
Probe Type: MSGAMQ6118
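FDC files can be long, so it can help to pull out just the header fields. The sketch below assumes the common FDC header layout `| Field :- value |` (it also accepts the `Field: value` form used above) and treats probe AO000001 with minor error code arcE_OBJECT_DAMAGED as the object catalog symptom described in this section. The function names are hypothetical.

```python
import re

# One header line: optional "|" border, field name, ":-" or ":", value,
# optional trailing "|" border.
FIELD_RE = re.compile(r"^\|?\s*(?P<key>[A-Za-z][A-Za-z ]*?)\s*:-?\s*(?P<value>.*?)\s*\|?\s*$")
WANTED = {"Probe Id", "Component", "Major Errorcode", "Minor Errorcode", "Probe Type"}


def parse_fdc_header(text: str) -> dict[str, str]:
    """Return the first occurrence of each wanted header field in an FDC file."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m and m.group("key") in WANTED and m.group("key") not in fields:
            fields[m.group("key")] = m.group("value")
    return fields


def is_catalog_damage(fields: dict[str, str]) -> bool:
    """True for the probe and minor error code shown in this section."""
    return (fields.get("Probe Id") == "AO000001"
            and fields.get("Minor Errorcode") == "arcE_OBJECT_DAMAGED")
```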
The abrupt server restart has corrupted the MQ object catalogue.
Queue manager object catalog
The object catalog contains the details of all WebSphere MQ objects defined on the queue manager. The queue manager reads it during startup, so a corrupted object catalog prevents the queue manager from starting. The object catalog is stored in the directory /var/mqm/qmgrs/qmgr name/qmanager.
Resolving the problem
If the queue manager has media recovery enabled, which is the case only when it uses linear logging, startup automatically recovers a damaged object catalog without producing an FDC. With circular logging, media recovery is not possible, and you have two options for recovering the queue manager:
- Restore the queue manager configuration from the file system backup at /var/mqm/qmgrs/qmgr name.
- Delete the queue manager, re-create it, and restore the configuration from the saveqmgr backup.
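Whether media recovery is available depends on the LogType attribute in the Log stanza of the queue manager's qm.ini file. The following Python sketch reads that stanza, reports whether linear logging (and therefore media recovery) is in use, and derives the crtmqm -lp, -ls, -lf, and -ll or -lc switches that would reproduce the log settings if you later have to re-create the queue manager. It assumes the usual qm.ini layout of `Stanza:` lines followed by indented `Key=value` lines. The helper names are hypothetical, and a missing LogType is treated as circular because that is the crtmqm default.

```python
def read_stanza(ini_text: str, stanza: str) -> dict[str, str]:
    """Collect key=value pairs from every occurrence of `stanza:` in qm.ini text."""
    values: dict[str, str] = {}
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.endswith(":") and "=" not in line:
            current = line[:-1].strip()  # start of a new stanza, e.g. "Log:"
        elif current == stanza and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def media_recovery_available(log: dict[str, str]) -> bool:
    """Media recovery requires linear logging."""
    return log.get("LogType", "CIRCULAR").upper() == "LINEAR"


def crtmqm_log_args(log: dict[str, str]) -> list[str]:
    """crtmqm switches that reproduce the old queue manager's log settings."""
    args: list[str] = []
    for key, flag in (("LogPrimaryFiles", "-lp"),
                      ("LogSecondaryFiles", "-ls"),
                      ("LogFilePages", "-lf")):
        if key in log:
            args += [flag, log[key]]
    # -ll selects linear logging, -lc circular logging.
    args.append("-ll" if media_recovery_available(log) else "-lc")
    return args
```

Run it against the backed-up qm.ini; for a circular-logging queue manager with 3 primary and 2 secondary files of 4096 pages, the switches come out as `-lp 3 -ls 2 -lf 4096 -lc`. Choosing -ll instead when re-creating would give the new queue manager media recovery.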
Option 1 is preferable: a working copy of the queue manager object catalog is all you need to recover the queue manager, and you avoid the complexity of the QMID change that Option 2 requires. Here are the steps to recover the queue manager from a file system backup:
- Ask your tape backup team to restore the /var/mqm/qmgrs/qmgr name file system from a backup taken a couple of days before the server crash.
- After the file system is restored, try starting the queue manager. If you see the same error, repeat the above step with progressively older backups and try to start the queue manager again.
- If you are still unable to find a working backup, you will need to use Option 2 above.
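The retry procedure above is a simple newest-to-oldest search over the available backups. Here is a Python sketch of that loop, assuming the restore itself is performed by your backup team, so the `restore` and `starts_cleanly` callables are hypothetical placeholders you would supply. `strmqm_succeeds` shows one possible check, based only on the strmqm exit code.

```python
import subprocess
from typing import Callable, Optional, Sequence


def recover_from_backups(
    backups: Sequence[str],
    restore: Callable[[str], None],
    starts_cleanly: Callable[[], bool],
) -> Optional[str]:
    """Try backups newest first; return the first that starts, else None."""
    for backup in backups:      # caller orders the backups newest first
        restore(backup)         # restore /var/mqm/qmgrs/<qmgr> from this backup
        if starts_cleanly():    # e.g. strmqm no longer reports the damaged catalog
            return backup
    return None                 # no usable backup: fall back to Option 2


def strmqm_succeeds(qmgr: str) -> bool:
    """One possible check: strmqm exits with return code 0 on success."""
    return subprocess.run(["strmqm", qmgr], capture_output=True, text=True).returncode == 0
```

For example, `recover_from_backups(["2013-05-23", "2013-05-22"], restore_fn, lambda: strmqm_succeeds("QM1"))`, where `restore_fn` and the backup labels are hypothetical.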
Here are the steps to recover the queue manager from saveqmgr backup:
- Before you delete the corrupted queue manager, make a backup copy of its qm.ini file. You will need it when you re-create the queue manager, to determine the number of primary and secondary log files, the log file pages, and any other custom settings. Then delete the queue manager: dltmqm qmgr name.
- Re-create the queue manager using the command crtmqm qmgr name. Add the -lf switch to specify the log file size (in pages) and the -lp and -ls switches to specify the number of primary and secondary log files, so that they match the old queue manager.
- Update the qm.ini file of the new queue manager with any additional settings you backed up in the old qm.ini file.
- Start the queue manager: strmqm qmgr name.
- Restore the queue manager configuration: runmqsc qmgr name < backup file name. For example, if your queue manager is QM1 and the backup file is QM1.mqsc, run:
/var/mqm > runmqsc QM1 < QM1.mqsc
No commands have a syntax error.
All valid MQSC commands were processed.
- Apply the authorities by running the authorities backup file, which contains setmqaut commands, as a shell script (make it executable first if necessary). For example, if the authorities backup file for QM1 is QM1.oam, run:
/var/mqm > ./QM1.oam
The setmqaut command completed successfully.
- Now that the queue manager is restored, check its cluster configuration using the MQSC command DIS CLUSQMGR(qmgr name) CLUSTER(cluster name).
- For example, for queue manager QM1 and cluster CLUS1, run: DIS CLUSQMGR(QM1) CLUSTER(CLUS1).
- You will see two entries for QM1 with two different QMIDs in the cluster. To determine which QMID belongs to the old queue manager, run DIS QMGR QMID on the re-created queue manager; the entry whose QMID does not match is the old one. Remove it from the cluster with the command RESET CLUSTER(cluster name) QMID(qmid) ACTION(FORCEREMOVE). You can issue FORCEREMOVE only on a full repository queue manager. The queue manager should now be in a healthy state.
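When the cluster has many members, picking out the stale entry by eye is error-prone. The Python sketch below parses saved DIS CLUSQMGR output, compares each QMID with the one reported by DIS QMGR QMID on the re-created queue manager, and builds a RESET CLUSTER command for every other QMID. It assumes the `QMID(...)` attribute format of runmqsc output; the helper names are hypothetical. Review the generated commands before running them on a full repository.

```python
import re

QMID_RE = re.compile(r"\bQMID\(([^)]*)\)")


def current_qmid(display_qmgr_output: str) -> str:
    """QMID reported by DIS QMGR QMID on the re-created queue manager."""
    found = QMID_RE.findall(display_qmgr_output)
    if not found:
        raise ValueError("no QMID(...) attribute in output")
    return found[0]


def cluster_qmids(display_output: str) -> list[str]:
    """QMIDs from DIS CLUSQMGR output, in order of appearance, without duplicates."""
    seen: list[str] = []
    for qmid in QMID_RE.findall(display_output):
        if qmid not in seen:
            seen.append(qmid)
    return seen


def forceremove_commands(display_output: str, live_qmid: str, cluster: str) -> list[str]:
    """RESET CLUSTER commands for every QMID other than the live queue manager's."""
    return [
        f"RESET CLUSTER({cluster}) QMID('{qmid}') ACTION(FORCEREMOVE)"
        for qmid in cluster_qmids(display_output)
        if qmid != live_qmid
    ]
```

A typical call is `forceremove_commands(clusqmgr_text, current_qmid(qmgr_text), "CLUS1")`, where `clusqmgr_text` and `qmgr_text` are the saved runmqsc outputs (hypothetical variable names).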
Recovering the broker
After starting the queue manager, you start the broker. The broker starts successfully but is non-responsive: even an mqsilist command takes a long time to show output, and sometimes shows no output at all. Because the queue manager has been re-created and its QMID has changed, the broker also needs to be re-created, and its configuration needs to be restored from the backup created with the mqsibackupbroker command. Sometimes the problem persists even after you re-create the broker.
Execution groups may not bind to their corresponding queues, applications may fail to connect to SOAP ports, and broker deployments may fail. WebSphere MQ logs may show the following error:
05/27/13 04:37:20 - Process(9044138.1) User(mqm) Program(amqzxma0_nd) Host(hostname)
AMQ7159: A FASTPATH application has ended unexpectedly.
EXPLANATION: A FASTPATH application has ended in a way that did not let the queue manager clean up the resources owned by that application. Any resources held by the application can be released only by stopping and restarting the queue manager.
ACTION: Investigate why the application ended unexpectedly. Avoid ending FASTPATH applications in a way that prevents WebSphere MQ from releasing resources held by the application.
For most or all of the execution groups, the WebSphere Message Broker logs show errors like the following:
May 27 04:24:25 hostname user:err|error WebSphere Broker v7005: (Broker Name.EG Name)BIP2116E: Message broker internal error: diagnostic information 'Fatal Error; exception thrown before initialisation completed', 'Load LILs', '18415870', '1', '13', '12'. : Broker Name.e404e800-3401-0000-0080-af98194179f4: /build/S700_P/src/DataFlowEngine/ImbMain.cpp: 252: ImbMain::ProgressChecker::~ProgressChecker: :
May 27 04:24:25 hostname user:info WebSphere Broker v7005: (Broker Name.EG Name)BIP7050E: Failed to locate Java class com.ibm.broker.axis2.Axis2NodeRegistrationUtil.: (Broker Name..2ecf3aaf-3101-0000-0080-b2a00806016c: /build/S700_P/src/DataFlowEngine/NativeTrace/ImbNativeTrace.cpp: 698: getClasses: :
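To check how widely these errors are spread across execution groups, you could tally them from the syslog file. This Python sketch assumes each entry contains `(broker.execution group)BIPnnnnX` as in the excerpt above; `bip_errors_by_eg` is a hypothetical helper, and its default message IDs are simply the two shown above.

```python
import re
from collections import defaultdict

# "(Broker.ExecutionGroup)BIPnnnnX" as it appears in the syslog entries above.
BIP_RE = re.compile(r"\((?P<broker>[^.()]+)\.(?P<eg>[^()]*)\)(?P<msg>BIP\d{4}[EWI])")


def bip_errors_by_eg(log_text: str,
                     wanted: tuple[str, ...] = ("BIP2116E", "BIP7050E")) -> dict[str, set[str]]:
    """Map each execution group name to the wanted BIP message IDs it logged."""
    found: dict[str, set[str]] = defaultdict(set)
    for m in BIP_RE.finditer(log_text):
        if m.group("msg") in wanted:
            found[m.group("eg")].add(m.group("msg"))
    return dict(found)
```

Comparing the keys of the result with the list of execution groups defined on the broker shows whether the failure affects some or all of them.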
- Broker non-responsiveness may be caused by the change in the QMID of the broker queue manager, or by the corruption of the broker due to the abrupt server restart. Therefore the first option is to delete and re-create the broker and restore the configuration from the backup created with the mqsibackupbroker command.
- Another possible cause is broker overload, which you can investigate by creating a test broker and then creating a sample execution group. If the broker still does not respond to mqsi commands, then the broker install image is probably corrupted.
- An abrupt server restart can corrupt the broker install image in a way that causes the Java libraries to fail to load. The broker administrator needs to determine whether the corruption of the broker is limited to just the latest fix pack, or whether it involves the entire broker install image.
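Both checks above come down to timing mqsi commands. A minimal Python sketch of such a probe follows: it runs a command with a timeout and reports whether it completed successfully and how long it took. The command list (for example `["mqsilist", "BKR1"]`) and the timeout value are assumptions you would adjust; `probe_command` is a hypothetical helper.

```python
import subprocess
import time


def probe_command(cmd: list[str], timeout_s: float) -> tuple[bool, float]:
    """Run cmd; return (finished in time with exit code 0, elapsed seconds).

    subprocess.run kills the child if the timeout expires. If the command is
    not on the PATH, FileNotFoundError propagates to the caller.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False
    return ok, time.monotonic() - start
```

Running the same probe against the production broker and against a freshly created test broker gives a like-for-like comparison: if the idle test broker is just as slow, load is unlikely to be the cause.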
Resolving the problem
Follow the steps below to re-create the broker:
- Stop the broker using the command mqsistop broker name.
- Delete the broker using the command mqsideletebroker broker name. Do not specify the -s option with the mqsideletebroker command, because it deletes the broker administrative security queues (SYSTEM.BROKER.AUTH and SYSTEM.BROKER.AUTH.egname) from the queue manager. These queues are reused when the broker is re-created.
- Create the broker using the command mqsicreatebroker broker name -q qmgr name.
- Start the broker using the command mqsistart broker name, and check the broker logs for errors to verify that it starts cleanly.
- Stop the broker again and restore its configuration with the command mqsirestorebroker broker name -d directory path -a backup archive name. For example:
mqsirestorebroker BKR1 -d /var/mqsi/backup -a BKR1_130617_085432.zip
BIP1266I: Deleting existing configuration of broker BKR1
BIP1268I: Recreating broker configuration information from backup archive /var/mqsi/backup/BKR1_130617_085432.zip
BIP8071I: Successful command completion
- Start the broker. The QMID change is no longer an issue, but the broker may still be unresponsive.
- To confirm that the problem is not broker overload, create a test broker and then a test execution group in it. The broker is created, but creating the execution group times out. If you create the execution group with the -w option to allow a longer timeout, it may be created, but mqsi commands still take a long time to respond. Because a new, idle broker shows the same symptoms, the problem is not broker overload.
- At this point you know that the broker non-responsiveness is caused by corruption in the broker install image. Identify the fix pack level that is applied. On distributed platforms you cannot selectively reinstall a fix pack, so you must uninstall the broker component and then reinstall it at the appropriate service level. For details on uninstalling and reinstalling WebSphere Message Broker, see the WebSphere Message Broker information center for your version.
- After you reinstall the broker, start it and run the mqsi commands. Broker response should be normal.
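The broker re-creation steps above can be captured as a fixed command sequence, which makes it harder to add -s to mqsideletebroker by accident. The following Python sketch builds that sequence and, by default, only prints it. Broker, queue manager, directory, and archive names are parameters, and `broker_recreate_plan` and `run_plan` are hypothetical helpers. The manual check between steps (reviewing the broker logs after the first start) is not automated.

```python
import shlex
import subprocess


def broker_recreate_plan(broker: str, qmgr: str, backup_dir: str, archive: str) -> list[list[str]]:
    """Command sequence for re-creating a broker and restoring its configuration."""
    return [
        ["mqsistop", broker],
        ["mqsideletebroker", broker],  # deliberately no -s: keep SYSTEM.BROKER.AUTH* queues
        ["mqsicreatebroker", broker, "-q", qmgr],
        ["mqsistart", broker],         # verify a clean start, then check the broker logs
        ["mqsistop", broker],
        ["mqsirestorebroker", broker, "-d", backup_dir, "-a", archive],
        ["mqsistart", broker],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also run it and stop on failure."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
```

For example, `run_plan(broker_recreate_plan("BKR1", "QM1", "/var/mqsi/backup", "BKR1_130617_085432.zip"))` prints the seven commands without executing them.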
This article described recovery procedures for WebSphere MQ and WebSphere Message Broker after a server crash. It showed you how to restore a WebSphere MQ queue manager both from a file system backup and from a saveqmgr backup (saveqmgr is provided by SupportPac MS03; later WebSphere MQ releases include the equivalent dmpmqcfg command). It also covered restore procedures for WebSphere Message Broker, including troubleshooting tips for a non-responsive broker.
The authors would like to thank the IBM PMR support organization, especially S. Rao, for their technical assistance. Thanks also to Srinivas Aparadhi, IBM Technical Services Professional and Software Specialist, for his technical assistance.