Logger enhancements for MQ v9.0.2
MarkWhitlock 1100006RG1 Visits (3060)
In MQ v9.0.2, we've enhanced linear logging to enable automatic management of log extents and automatic recording of media images, as well as improving the general performance of linear logging scenarios exploiting these capabilities. This addresses some of the most highly voted for RFE's. The decision to choose circular or linear can now be made based primarily upon whether you want media recovery. Now that we've made linear so much easier to administer and brought linear up to a similar performance to circular, these factors cease to be so important in a linear vs circular decision. We've also added in some new statistics to help you tune your log, and now attempt to keep the workload in the primaries only.
Before v9.0.2, if you used linear logging, you had to manage log extents yourself - deleting them when they were no longer needed for restart or media recovery, typically by using one of the supportpacs or by writing your own logger event listener. Also you had to record media images regularly yourself by calling rcdmqimg periodically.
But with v9.0.2, the queue manager can automatically manage log extents and automatically record media images, this makes linear logging almost as easy to administer as using circular logging. The performance of linear logging is much improved to the extent that circular logging now only performs slightly better. As it has always done, linear logging also provides media recovery which circular logging does not. By media recovery, I mean it allows you to recover objects and queues if they are damaged.
The automatic reuse of log extents is responsible for much of the performance benefit you get with automatic log management. You can recognise extents that are waiting to be reused (we call them reuse extents) because they are prefixed with the letter "R" instead of the usual "S". Switch on log management by passing -lla (for automatic) or -lln (for archive) to crtmqm instead of -ll. To change an existing linear logging queue manager to use automatic or archive log management, set LogM
Figure 1 above, shows the improvement in linear logging performance, between V9.0.1 and V9.0.2 (with automatic log management). A workload run that was limited by the rate the queue manager could write to the log files (on SAN storage). The overall round trip rate is shown on the y-axis, which increases as we add more applications putting and getting to the queues. For both linear logging tests, media images were recorded every 10 seconds with rcdmqimg. Whilst the V9.0.1 test will continue to create and format new linear log files continuously, in V9.0.2, existing log files are made available for re-use as the media recovery point is moved forward in the log. Performance is significantly improved in V9.0.2 as a result, coming close the that of circular logging (with an efficiently sized log). Running the same test with an over-sized circular log file set (V9.0.2 Circular Log(Large) in the figure above) shows a similar level of performance to linear logging, indicating that the number of files being used is the main difference between circular and linear logging now, rather than the additional formatting costs.
Automatic recording of media images
Once you have set IMGSCHED(AUTO), also set IMGINTVL and/or IMGLOGLN. IMGINTVL is how often the queue manager will take automatic media images, in minutes. IMGLOGLN is how much log data gets written before the queue manager takes the next automatic media image, in megabytes. Both IMGINTVL and IMGLOGLN are targets - the actual gap between images may be larger or smaller.
Setting IMGLOGLN is often a better idea than only setting IMGINTVL, as setting IMGLOGLN will stop your log getting too large no matter how heavy your workload is. Or you can set both IMGINTVL and IMGLOGLN together and then an image is recorded when either of them expire. In fact setting both can be quite a good idea, so images are taken more frequently during a heavy workload when IMGLOGLN expires, but are still taken occasionally during a light workload when IMGINTVL expires.
You can even record media images manually by calling rcdmqimg when you've enabled automatic recording of media images, if you know it's a really good time to do so, such as a very quiet time. When you do that the queue manager will reset its interval and log length before the next automatic media image is taken.
Beware of setting IMGLOGLN too small though. If you make it a similar sort of size to your largest queue, then the queue manager will spend all its time recording media images, because the media image itself is included in the log data being counted towards IMGLOGLN. Better to make IMGLOGLN a fraction of the size of your log filesystem - maybe a third or less.
There are other performance advantages to automatic recording of media images such as staggering of media images, partial media images, and recording media images early when the queue is empty - but I'll talk more about these later.
Keeping the workload in the primary extents only
Before v9.0.2, the queue manager scheduled checkpoints based on the number of primaries plus secondaries - so secondaries were often used for regular workload. In v9.0.2, the queue manager prefers to only use primaries for regular workload. Secondaries should only be used when there are long-running transactions.
This means that the next checkpoint is scheduled when a large fraction of the primaries are being used. If the next checkpoint is scheduled before the previous checkpoint has completed you will get AMQ7466 "There is a problem with the size of the logfile. The log for queue manager is too small to support the current data rate." So if your normal workload has previously been using secondary extents, you may well get this error message in your error logs when you migrate to v9.0.2 whether or not you use the new logger enhancements and whether you use linear or circular logging. If you do, increase the number of primary extents, making sure that the total size of all your extents won't exceed the size of your log filesystem, and also making sure that you're ok with increasing your restart time, since increasing primaries alone increases the maximum size of your active log.
More log attributes and better statistics
ARCHLOG is the oldest extent needing archiving (only set for archive log management). ARCHLOG is also passed to your logger event listener, to help you know where to start archiving extents. RECSZ, MEDIASZ, ARCHSZ and REUSESZ are the sizes in megabytes of all the extents needed for each of restart recovery, media recovery, needing archiving and available for reuse.
LOGUTIL and LOGINUSE both give the percentage of the primary files that are being used for the active log - so these statistics can help you better size your log. LOGUTIL and LOGINUSE are different though. LOGINUSE is a point in time statement and so will increase until a checkpoint is taken, when it will typically suddenly decrease as a bunch of extents become inactive. So a rolling average of LOGINUSE would help more than the sawtooth that sampling LOGINUSE would give you. And that's exactly what LOGUTIL is - a rolling average of the percentage of the primary files that are being used by the active log. So LOGUTIL shouldn't jump sharply on a checkpoint in the way LOGINUSE does. Of course LOGINUSE and LOGUTIL can both be over 100% - that just means you're using secondaries. And because the queue manager tries to keep the workload in the primaries, when LOGUTIL goes above 100% it means you have long-running transactions. Perhaps you're ok with that - but you might want to think about making those transactions shorter, or else increasing your primaries a bit. Best practice is to keep LOGUTIL below 100%.
When an object is damaged
When using linear logging, you can recover the queue by using rcrmqobj. But until you recover or delete the object, your log filesystem will start filling (just as before V9.0.2). This is because the queue manager will no longer be able to record media images of the damaged object. So the oldest extent needed for the media recovery of the damaged object (and all subsequent ones) will need to kept by the queue manager. If your workload continues, new log extents will be written to but the old ones can't be reused until a new image of the object is taken which can't happen until the damaged object is recovered or deleted - hence the problem.
How long you have until your log filesystem fills up completely depends on how big your log filesystem is and how much log data is being written to disk by your workload. If or when your log filesystem fills, the queue manager will start rolling back transactions to conserve log space. So your workload will get rolled back. While the queue manager should initially stay up, if the situation persists the queue manager may eventually fail.
You will know you have a damaged object because you will get FFDCs as soon as the queue manager realises. It would be prudent to investigate why the object got damaged in the first place, and then recover it quickly which should alleviate the space problems.
Performance advantages of automatic recording of media images
Figures 2&3 above show the difference in impact between manual, and automatic media image recording. A queue manager was configured with a deep queue (1000 x 1MB messages on queue) and a workload run that was limited by the rate the queue manager could write to the log files (round trips/sec on the y-axis). In Figure 2, media images are triggered manually, by running rcdmqimg roughly every 10 seconds (we wait 10 seconds after the completion of each copy). In figure 3, the same test is configured to run with automatic media image recording (with IMGLOGLN set to 400, to approximate the frequency of image recording in the manual test). As you can see in Figure 2, the test is severely impacted by the asynchronous nature of the manual image recording, whereas the automatic media image recording (Figure 3) ameliorates the impact by interleaving the image records with the normal log writes. Since this is a test running at the limits of the log disk bandwidth, the peak rate drops in Figure 3, but if we we had some headroom in terms of log write bandwidth, the media images recorded with the new automatic management might take place with little or no impact on the workload.
Another performance advantage of automatic recording of media images is that the queue manager may be able to record less data into the log. Establishing a point of recovery while not recording a full copy of the object in the log is known as a partial media image. When the queue manager records a partial media image, it does so on behalf of a recovery point which is a little time in the past. If many (or all) of the messages on the queue were put since the recovery point, then these messages do not have to be recorded in the media image, since recovering the queue would replay all log records since the recovery point and so would include all these recent puts. In a similar vein, if many or all of the messages that were on the queue at the recovery point have been got off the queue since, these messages don't have to be recorded in the partial media image because they aren't on the queue now. So, if the queue manager is lucky, the partial media image may contain no data at all and so be very quick to record, even though the queue was never empty. This optimization is most effective when most messages only rest on the queue for a short period of time. If you do the opposite and use MQ as a database, using queues as a place to store data long-term, then the queue manager has no alternative but to record full media images every time.
One of the best times to automatically record a media image is when the queue is completely empty, or almost empty, because then there is little user data to log so the media image written is very small. The queue manager looks out for such times and if a media image hasn't been written for a while, but IMGINTVL or IMGLOGLN hasn't expired yet, the queue manager may decide to record a media image anyway because now is a really good time. It is much easier for the queue manager to spot such times than it is likely to be for you, so this provides another performance boost to automatic media image recording.
More runmqsc logger commands
Also you can make individual queues recoverable or not by setting the new queue attribute IMGRCOVQ or letting it default to the IMGRCOVQ queue manager attribute. You can also make other objects recoverable or not by setting the new queue manager attribute IMGRCOVO. By default when you use linear logging, all objects are recoverable, so you can recover them and media images need to be recorded for them. But if you have objects or queues which contain messages that you don't really care about or could be regenerated or are regularly superceded, then make them not recoverable. This means that media images can't be recorded for them but they can't be recovered either.