Coldstart: What to do if log extents are missing or corrupt

If your enterprise loses some or all of the log extents needed for restart recovery, the queue manager cannot replay the recovery log and so fails to restart. It is possible, although strongly discouraged, to force a queue manager to restart with a missing or corrupt recovery log, at the expense of data integrity. This process is known as coldstarting a queue manager.

Important: Coldstarting a queue manager should be considered only in exceptional circumstances and carries data integrity risks as described on this page. IBM® suggests that you rebuild a queue manager, in preference to coldstarting, in response to corrupt data files.

If a coldstart is required for operational reasons, engage your IBM support representative to review the root cause of the issue. You should replace a coldstarted queue manager with a rebuilt queue manager at the earliest opportunity.

The effects of coldstart

On coldstart, the queue manager creates an empty recovery log and relies on the data in the queue files and other object files in their existing state. Because the data in the queue files can be inconsistent, messages might be lost, duplicated, corrupted, or inconsistent.

The queue manager stores the configuration of all other persisted objects in the recovery log, as well as in object files. Other internal state data is also recorded in the recovery log, so on coldstart the internal state data is reset and all this other configuration data might be inaccurate.

The effects of coldstart are unpredictable and wide-ranging so you should avoid a coldstart unless absolutely necessary. After coldstarting, the information in the queue and object files can be so inconsistent that the queue manager will not restart at all.

If the queue manager does restart, there is no simple way of discovering what message data or configuration can be relied on and what cannot. Also, after a coldstart, queues might be damaged and so become completely unusable.

Additionally, even if you can get from, or put to, a particular queue, the messages on it might be corrupt, missing, or duplicated. Transactions and channels might be stuck in-doubt. Even if your queue manager coldstarts successfully and the queues look intact, the unpredictable effects of the coldstart might not become apparent until much later.

What to do if you need to coldstart

Performing a coldstart should not be considered standard operational practice, and IBM strongly discourages you from doing this. However, if you are in a position where you definitely need to coldstart a queue manager, contact IBM MQ Support.

The process for coldstarting a queue manager used to be much more complicated for a linear queue manager than for a circular one. From IBM MQ 9.1.3, the coldstart process is much simpler and no longer involves copying or renaming log extents.

From IBM MQ 9.1.3, contact IBM Support, who will give you a key which you pass to the strmqm command to coldstart a queue manager.
Attention: The IBM MQ 9.1.3 coldstart command still carries the same risks of losing data integrity as a manual coldstart, and IBM strongly discourages you from doing this.

Eliminating future coldstarts: a request

The strmqm command requires a key to coldstart, because IBM MQ wants you to contact IBM MQ Support if you need to coldstart, as IBM MQ is keen to understand how you got into this situation.

Clearly, a coldstart is best avoided. IBM MQ has gone to considerable effort to make sure that you will not need to coldstart your queue manager, and IBM is keen to discover whether there is anything more the product can do to remove the need to coldstart.

Precautions to avoid a coldstart

The default logging method when creating a queue manager is circular logging. With circular logging you allow the queue manager a particular number of primary and secondary log extents of a given size. Create your log filesystem large enough to contain all the primary and secondary log extents, and you should never need to administer them.
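The sizing advice above comes down to simple arithmetic. The following is a minimal sketch, assuming log extent size is expressed as a number of 4 KB log file pages; the function name and the example values are illustrative only and are not recommended settings.

```python
# Sketch: estimate the disk space that a circular log can occupy.
PAGE_SIZE_BYTES = 4096  # log file size is specified in 4 KB pages


def circular_log_bytes(primary_files: int, secondary_files: int,
                       log_file_pages: int) -> int:
    """Return the combined size, in bytes, of all primary and secondary extents."""
    if min(primary_files, secondary_files, log_file_pages) < 0:
        raise ValueError("extent counts and page count must not be negative")
    return (primary_files + secondary_files) * log_file_pages * PAGE_SIZE_BYTES


# Hypothetical configuration: 3 primary and 2 secondary extents of 4096 pages each.
total = circular_log_bytes(primary_files=3, secondary_files=2, log_file_pages=4096)
print(f"{total / 1024**2:.0f} MiB")  # prints "80 MiB"
```

The result is a lower bound for the log filesystem; you might choose to add headroom on top of it.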

Alternatively, you can use linear logging as opposed to circular. Linear logging gives you the added ability to recover queues and other objects, in the unlikely event that they become damaged. But by default, linear logging requires you to delete log extents that are no longer needed for restart or media recovery. This is referred to as manual log management.

When administering log extents in this way, it is possible to inadvertently delete too many log extents and so end up having to coldstart. To mitigate this risk, use automatic log management, so the queue manager manages log extents on your behalf.

The best practice is to put your recovery log in a separate log filesystem which only contains the recovery log. If you put your recovery log in the same filesystem as the rest of your queue manager, you can sometimes find that filesystem accidentally filling up, perhaps due to large queue files. Either make the log directory for the queue manager a separate filesystem, or specify a different log filesystem using the -ld command line option on the crtmqm command.
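As an illustration of the -ld option, the following sketch builds the crtmqm command line for a queue manager whose recovery log lives on its own filesystem. The queue manager name QM1, the mount point /mqlogs, and the helper function are hypothetical; the sketch only assembles and prints the command, and does not run it.

```python
import shlex
from typing import List


def build_crtmqm_command(qmgr_name: str, log_path: str) -> List[str]:
    """Build a crtmqm argument list that places the recovery log under log_path."""
    if not qmgr_name or not log_path:
        raise ValueError("queue manager name and log path are both required")
    # -ld names the directory that holds the recovery log.
    return ["crtmqm", "-ld", log_path, qmgr_name]


# Hypothetical values: replace with your own queue manager name and log filesystem.
cmd = build_crtmqm_command("QM1", "/mqlogs")
print(shlex.join(cmd))  # prints "crtmqm -ld /mqlogs QM1"
```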

If the filesystem holding the queue files fills, you might not be able to put to those queues, but the queue manager continues running. If the filesystem containing the recovery log fills, the queue manager ends abruptly and will not restart until you free up some space.
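Because a full log filesystem stops the queue manager, you might want to watch its free space. The following is a minimal sketch that uses only the Python standard library; it is not an IBM MQ facility, and the path and the 20% threshold are assumptions chosen for illustration.

```python
import shutil


def log_filesystem_low(path: str, min_free_fraction: float = 0.2) -> bool:
    """Return True if the filesystem holding path has less free space than the threshold."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat that as low to be safe.
        return True
    return usage.free / usage.total < min_free_fraction


# Placeholder path: replace "." with the mount point of your recovery log.
LOG_PATH = "."
if log_filesystem_low(LOG_PATH):
    print(f"Warning: free space is low on the filesystem containing {LOG_PATH}")
```

You could run a check like this on a schedule and raise an alert well before the filesystem fills.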

Be careful not to delete log extents needed for restart recovery, otherwise you might find yourself needing to coldstart. Sometimes you might need to coldstart because the disk that contains the recovery log has failed. Best practice is to put the recovery log on a replicated disk to mitigate the risk of a disk failure.

Moving your messages and configuration to a new replacement queue manager avoids the possibility of ongoing problems with a queue manager that has been previously coldstarted.

Keep a note of which queue managers have been previously coldstarted, even if they were coldstarted a long time ago and have been stopped, restarted, and migrated in the meantime. When you contact IBM Support, say whether the queue manager has been previously coldstarted and, if so, give as much information as possible about what caused the requirement for a coldstart.