We have had a couple of questions about which file system people should use for their distributed production queue manager.
One person said they found an old server which had never been used. They put a current Linux on it, and ran a queue manager which became the production queue manager.
Full marks for reuse - but it did not give the throughput they needed, so no marks for planning.
We also had a question asking if we could come up with a tool or document which could predict the throughput for a given file system and persistent message profile.
These questions are both connected. Starting from an existing file system and putting a queue manager on it is approaching the problem the wrong way.
You should approach the planning from an architectural and a business perspective. The Agile approach of "Fail Fast Fail Often" is not a good strategy for this.
From an architectural perspective
- Does this queue manager need to run on more than one box, e.g. are you using a standby queue manager?
- If so then you have to use external disks - not locally attached to the server.
- If your queue manager only ever runs on one box, you can use local or remote disks, as long as they meet the requirements.
- If your disks are on a network, then the network will add time to each I/O. You may need to tune the I/O, especially if large buffers (over 200KB) are being written.
- Do your disks have contention from other users?
- Does any other work affect the disks?
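Before committing to a file system, it can help to measure its raw forced-write latency. Below is a minimal Python sketch (not an MQ tool; the probe file location, buffer sizes, and sample count are all illustrative assumptions) that times write+fsync, which is a rough proxy for the forced log writes a queue manager does for persistent messages. It probes both a small page and a buffer over 200KB.

```python
import os
import statistics
import tempfile
import time

def probe_fsync_latency(path, size, samples=100):
    """Time write+fsync of `size` bytes, `samples` times.
    Returns (median_ms, p99_ms). A crude proxy for forced log writes."""
    buf = os.urandom(size)
    times = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(samples):
            t0 = time.perf_counter()
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, buf)
            os.fsync(fd)  # force the data to disk, as a log write must
            times.append((time.perf_counter() - t0) * 1000.0)
    finally:
        os.close(fd)
    times.sort()
    return statistics.median(times), times[max(0, int(len(times) * 0.99) - 1)]

if __name__ == "__main__":
    # Point this at a file on the file system you are evaluating;
    # the temp directory here is just so the sketch runs anywhere.
    target = os.path.join(tempfile.gettempdir(), "mq_latency_probe.dat")
    for size in (4096, 256 * 1024):  # a 4KB page and a >200KB buffer
        med, p99 = probe_fsync_latency(target, size)
        print(f"{size // 1024:>4} KB: median {med:.2f} ms, p99 {p99:.2f} ms")
    os.remove(target)
```

Run it during peak hours on the actual mount point, and compare small-buffer against large-buffer latency to see how the network and disk subsystem behave. For serious benchmarking, a purpose-built tool such as fio will give better control.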
From a business perspective
- Does the solution meet your availability needs? For example, you may need mirrored disks, which tend to be slower.
- There may be a requirement that the time in MQ has to meet Service Level Agreements - such as less than 10 ms in MQ. If the response time of your disks is 5 ms, you are unlikely to meet the SLA. Do you need Solid State Drives (SSDs), or will Hard Disk Drives (HDDs) do?
- Often you have little choice for where you place your MQ files as you have to follow your operational standards. You need to understand the impact of where you put your files.
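To make the SLA point above concrete, here is a back-of-envelope sketch. The figures for forced log writes per message and for the remaining MQ cost are illustrative assumptions, not MQ internals - the real cost depends on batching, configuration, and message profile.

```python
# Back-of-envelope check of the "10 ms in MQ" SLA example.
disk_response_ms = 5.0    # measured disk response time
log_writes_per_msg = 2    # assumption: roughly one forced write on put, one on get
other_mq_cost_ms = 0.5    # assumption: CPU and queueing overhead

time_in_mq_ms = disk_response_ms * log_writes_per_msg + other_mq_cost_ms
print(f"Estimated time in MQ: {time_in_mq_ms:.1f} ms")  # 10.5 ms, over the 10 ms SLA
```

Even with generous assumptions, 5 ms disks leave almost no headroom against a 10 ms SLA - which is exactly why you may need SSDs rather than HDDs.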
To find out if your environment can achieve the required throughput and meet the response time criteria, you need to run a workload similar to your peak workload.
One word of warning - do this during peak activity, not late at night when the system is not being used, as you want the disk load and the network load you will get when running in production.
I discussed this with Tony Sharkey (MQ on z/OS Performance), and he summarised it as:
- Know your requirements - e.g. response time (possibly throughput rate or logging rate), message sizes, whether HA is a requirement and so on.
- Know your systems - i.e. know what machines are available, their capacity, network, disk response times, etc., via architecture diagrams, monitoring software, benchmarks of your configuration, and managed processes (i.e. knowing that backups run at 8pm, etc.)