My colleague Tony Sharkey did some work to compare IMS V12 with the new IMS V13, and initially was disappointed at the low throughput he obtained.
The steps taken to tune IMS are listed below; eventually IMS V13 had a higher throughput than IMS V12.
Running with IMS V12, the IMS Bridge tests were driving the throughput to around 11,000 transactions per second. Running with IMS V13, the throughput dropped to 700 transactions a second!
The environment was 3 queue managers (QMs) in a queue sharing group (QSG), where each QM was on a separate LPAR.
QM 1 is located on the same LPAR as the IMS control region.
QMs 2 and 3 access IMS via XCF.
Batch jobs on each LPAR looped, putting a message to the IMS bridge queue and waiting for the reply.
The message processing regions (MPRs) were on the same LPAR as QM 1 and the IMS system.
Each LPAR was initially defined with 3 dedicated CPs, but LPAR 1 can use up to 32 dedicated CPs, LPAR 2 up to 10, and LPAR 3 is limited to 3. The CF has 4 dedicated processors.
Actions taken to improve performance
* Ensure PWFI (pseudo wait-for-input) is on
This was actually already on, but it does make a difference.
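For reference, PWFI is set on each message region. A minimal sketch, assuming the IBM-supplied DFSMPR procedure (the region name and message class are illustrative):

    //* PWFI=Y: the region waits for the next input message instead of
    //* going through termination and rescheduling between messages.
    //MPR1     EXEC DFSMPR,CLASS=001,PWFI=Y,SOUT=A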
* Check the IMS QBUF parameter
For some reason the QBUF setting changed between IMS V10 and IMS V13 in the default Hursley IMS installs.
IMS V10 had been tuned to 255 (by me).
IMS V12 did not specify QBUF, and fell back to the value defined on BUFFERS in the MSGQUEUE macro; in our case this was set to 255.
IMS V13's initial value was 5.
In the IMS trace report there is a field, "NUMBER OF WAITS BECAUSE NO BUFFER AVAILABLE", which gives a hint that you are short of buffers.
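For reference, a sketch of the two places the buffer count can come from; the queue dataset names and record lengths on MSGQUEUE are illustrative. An explicit value in the DFSPBxxx control region parameter member:

    QBUF=255,

or, if QBUF is not specified, the fallback from the BUFFERS keyword in the stage-1 system definition:

    MSGQUEUE DSETS=(QBLKS,SHMSG,LGMSG),RECLNG=(336,3360),BUFFERS=255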
* Check the IMS PSBW parameter
When reviewing the performance data, it was observed that not all of the available MPRs were processing workload.
This was because there wasn't enough storage set aside to run all of the MPRs concurrently.
By increasing PSBW from 24 to 200 (KB), we were able to use all 16 MPRs.
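Like QBUF, this is a one-line change in the DFSPBxxx control region parameters; a sketch, with the value in KB as above:

    PSBW=200,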
* Increase the size of the online log datasets
The default sizes of the DFSOLP* and DFSOLS* datasets meant that they were switching frequently.
By increasing them from 50 to 1500 cylinders each, we reduced the frequency of switching.
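The OLDS are ordinary sequential datasets, so resizing is just a reallocation. A sketch for one of them, with the dataset name and volume illustrative; the same change applies to each DFSOLPnn/DFSOLSnn pair, and all OLDS must share the same block size:

    //ALLOC    EXEC PGM=IEFBR14
    //DFSOLP00 DD DSN=IMSA.DFSOLP00,DISP=(NEW,CATLG),
    //            UNIT=3390,VOL=SER=IMS001,
    //            SPACE=(CYL,(1500),,CONTIG)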
* Increase the number of online log datasets
With the increased throughput, the IMS archive jobs were trying to process multiple logs in each step. It got to the point where no logs were available.
By increasing the number of logs from 5 to 20, we were no longer waiting for active logs.
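The extra logs also have to be defined to IMS. A sketch of the OLDSDEF statement in the DFSVSMxx member, assuming dual logging (the OLDS ids are illustrative); each new OLDS also needs its allocation, via a DFSMDA dynamic allocation member or DD statements in the control region JCL:

    OLDSDEF OLDS=(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19),MODE=DUAL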
* Increase the size of the IMSMON dataset
Strictly, this isn't a performance improvement, but the size originally allocated was far too small to capture one minute's worth of data.
We increased it to 270 cylinders, which currently seems large enough (when trace is enabled).
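The monitor output goes to the sequential dataset behind the IMSMON DD in the control region; a sketch of the larger allocation (the dataset name is illustrative):

    //ALLOC   EXEC PGM=IEFBR14
    //IMSMON  DD DSN=IMSA.IMSMON,DISP=(NEW,CATLG),
    //           UNIT=3390,SPACE=(CYL,(270),,CONTIG)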
* Enable zHPF
Given that IMS always logs data to disk, zHPF makes sense, and in some cases we saw a 20% improvement in throughput.
(Indeed it turned out that this was the difference between my "baseline" and "fast" tests!)
Even though zHPF was enabled across all 3 LPARs, the impact was greatest on the LPAR with the IMS region.
For some reason, making IMS archiving faster allowed the queue manager SRBs to get/send messages at a higher rate on QM 1 than on the other QMs.
When zHPF was disabled, even with sufficient CPUs, the distribution of workload was more even across the QMs.
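zHPF is a z/OS setting rather than an IMS one (and the DASD subsystem has to support it). It can be checked and enabled dynamically from the console, or set with ZHPF=YES in the IECIOSxx parmlib member:

    D IOS,ZHPF
    SETIOS ZHPF=YES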
All of these changes brought the IMS V13 bridge tests up to parity with IMS V12.
What was slowing it down?
At this point, Tony wanted to see what was limiting the throughput.
It turned out that, with sufficient TPIPEs in use, the issue was CPU on the LPAR that hosted the IMS region.
Increasing the CPUs to 6:3:3 across the LPARs saw the transaction rate increase to 19,000 transactions per second, but the system was still CPU constrained.
With a further increase to 9:3:3, the transaction rate peaked at 21,400 per second.
At this point, LPARs 2 and 3 were constrained. Additionally, the CF was running at 75% busy, which is above the recommended usage for a multi-way CF.
Potentially it could be driven harder by adding more CPUs: simply on LPARs 1 and 2, and with a system change on LPAR 3 and on the CF.