IBM Support

Db2 log striping recommendation

Product Documentation


Abstract

This document clarifies IBM’s position on the use of striping for Db2 for z/OS logs.

Content

History of IBM disk subsystems and log-striping recommendation

First, some history of log striping. DFSMS Striping was first introduced in 1993. Striping served two purposes. The first purpose was to avoid hot spots on individual disks. The second purpose was to increase sequential throughput for individual data streams.


The combination of RAID, faster disks, and Parallel Access Volumes largely eliminated the hotspot problems without the need to use striping. Also, channels were much slower in 1993 than they are today. In the intervening years, IBM delivered FICON Express 8 and 8S channels as well as High-performance FICON. This year IBM delivered FICON Express 16S channels. In May of 2015, IBM also delivered R7.5 for the DS8870 control unit that now supports 16 Gb per second host adapters, which complement FICON Express 16S channels.
Several years ago IBM also began supporting Solid State Disks (SSD), and in 2014 IBM shipped High Performance Flash Enclosures (HPFE) as well as service time improvements made possible by High-performance FICON (zHPF) support. If disks were a bottleneck for Db2 logs, SSD and HPFE could be used for the logs, but generally speaking the disks are not the bottleneck. The bottlenecks are more likely to be the z/OS channels, the host adapters in the storage control unit, and the switches in between z/OS and the control unit. FICON Express 8S reduced the benefits of log striping, and FICON Express 16S with 16 Gb host adapters reduces them further.

How striping works

Now let’s consider how striping works. When Db2 needs to write 128 log pages, if the log data set has two stripes, z/OS performs two parallel I/Os of 64 pages each. That is good because more data can be transferred in parallel. If the log data set has four stripes, z/OS will perform four parallel I/Os of 32 pages each, and maybe that is faster than two stripes in some cases.

However, now let’s consider what happens when Db2 needs to write only 4 log pages. If the log data set has two stripes, z/OS needs to perform two parallel I/Os of two pages each. That is bad because of higher I/O overhead compared to the amount of data processed. If the log data set has four stripes, z/OS performs four parallel I/Os of one page each. That is even worse.
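The two cases above can be sketched in a few lines of Python. This is only an illustration of the even split described in the text: the function name split_pages is hypothetical, and the sketch does not model how DFSMS actually places data on stripes.

```python
def split_pages(pages: int, stripes: int) -> list[int]:
    """Pages carried by each parallel I/O, assuming an even split (hypothetical helper)."""
    base, extra = divmod(pages, stripes)
    sizes = [base + (1 if i < extra else 0) for i in range(stripes)]
    # A stripe that receives no pages needs no I/O, so it is dropped.
    return [s for s in sizes if s > 0]


for pages in (128, 4):
    for stripes in (1, 2, 4):
        sizes = split_pages(pages, stripes)
        print(f"{pages} pages, {stripes} stripe(s): {len(sizes)} I/O(s) of sizes {sizes}")
```

The point to notice is that the number of I/Os grows with the number of stripes while the payload per I/O shrinks, so small writes pay the fixed per-I/O cost several times over.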

Some paper and pencil arithmetic is described here for illustration using a DS8800 control unit with FICON Express 8S channels. The minimum time to write a 4K log page is about 170 microseconds. The minimum time to write 128 pages is 2.5 milliseconds. Therefore, the data transfer time per page is 19 - 20 microseconds. In other words, of the 170 microseconds to write a single page, about 150 microseconds is overhead.

Now, suppose Db2 needs to write 4 pages. If the logs were not striped, it would take about 228 microseconds. If the log has 4 stripes, z/OS divides the 4 pages among the 4 stripes and starts 4 I/Os. The I/Os are executed in parallel and each one might take only 170 microseconds, but they are started serially. It might take a few microseconds to start each I/O and another few microseconds to process each I/O interrupt. Therefore, it might take less than 228 microseconds to execute all 4 I/Os, but it might also take longer because each I/O is exposed to the potential of some disk contention or some other type of contention.

Such contention often shows up in the form of disconnect time. In a non-PPRC environment, disconnect time for your log I/Os is a warning sign that striping is not helping you as much as you would like, and might be hurting you. However, in a PPRC configuration disconnect time is normal, which makes it difficult to judge whether disconnect time is hurting performance.
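The arithmetic above can be turned into a rough model. The sketch below is an assumption-laden illustration, not a measurement tool: it treats an I/O as a fixed 150 microsecond overhead plus 19.5 microseconds per page (the midpoint of the 19 - 20 range), and it uses a hypothetical 6 microseconds of serial start and interrupt cost per additional I/O, because the text says only "a few microseconds". The contention delays are likewise made-up inputs. This linear model reproduces the 170 and 228 microsecond figures but overestimates the 128-page write (about 2.65 ms against the quoted 2.5 ms).

```python
OVERHEAD_US = 150.0  # approximate fixed cost per I/O, from the text
TRANSFER_US = 19.5   # per-page transfer time, midpoint of the 19 - 20 us range


def io_time_us(pages: int) -> float:
    """Estimated time for one I/O carrying the given number of pages."""
    return OVERHEAD_US + TRANSFER_US * pages


def striped_elapsed_us(pages, stripes, serial_us=6.0, contention_us=None):
    """Estimated elapsed time until the slowest stripe I/O completes.

    serial_us is a hypothetical start-plus-interrupt cost per additional I/O.
    contention_us is an optional list with one extra delay per stripe.
    """
    base, extra = divmod(pages, stripes)
    sizes = [base + (1 if i < extra else 0) for i in range(stripes)]
    delays = list(contention_us) if contention_us is not None else [0.0] * stripes
    # I/O number i starts i * serial_us late; all I/Os then run in parallel.
    return max(
        i * serial_us + io_time_us(n) + delays[i]
        for i, n in enumerate(sizes)
        if n > 0
    )


print(striped_elapsed_us(4, 1))                                   # no striping
print(striped_elapsed_us(4, 4))                                   # 4 stripes, no contention
print(striped_elapsed_us(4, 4, contention_us=[0, 0, 100.0, 0]))   # one slow stripe
```

Under these assumptions, four stripes finish a 4-page write sooner than one stripe only as long as no stripe is delayed; a single delayed stripe is enough to make the striped write slower than the unstriped one, which is the exposure the text describes.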

Log I/Os are much less likely to have any disconnect time if you can dedicate one DS8000 extent pool to the active log data sets. Dedicating extent pools to the logs means that the logs cannot share extent pools with the user database. Is it impractical to dedicate so much space to the Db2 logs? Probably so. It’s even more impractical to dedicate one extent pool to each stripe. The more stripes you use, the more difficult it is to isolate the logs from the ill effects of the database. By the way, the database I/Os that hurt your log performance the most are not the synchronous read I/Os. The I/Os that hurt the most are the Db2 deferred writes and castout I/Os, which are asynchronous.

Another way to reduce disk contention is to use high-speed disks. It might be sufficient to use 15K RPM spinning disks, but SSD is even better, and High Performance Flash Enclosures (HPFE) are better still. Note that an HPFE device adapter can achieve about 3.8 times as many write I/Os as traditional SSD.

Besides the fact that the stripe I/Os must be started serially, 4 stripes require 4 I/O interrupts. Each I/O interrupt disrupts whatever task was trying to use that CPU. Starting in Db2 11, the log I/Os are started under an SRB that is zIIP eligible, but there is no guarantee that the I/O interrupts are processed by a zIIP. More I/O interrupts from Db2 logging could add CPU time in the system services address space. For this reason, you want to see a significant response time benefit from striping to help justify the CPU cost. Generally, investing in better hardware is a more cost-effective way to achieve better log performance than striping.

 

Log striping measurements

Three sets of measurements were done using a DS8800 with 10K or 15K RPM spinning disks.

First the measurements were done without any competing database I/Os at all. Db2 Commit response times were used to evaluate the performance of striping.

For the first measurement, the average number of log pages created per commit was 16.5. Using four stripes increased the commit response time by 6.3% compared to nonstriped logs. For the second measurement, the number of log pages created per commit was increased to 46.4. Four stripes reduced the commit response time by 23%. Next, the commits were removed completely so that Db2 would asynchronously write 128 pages per I/O. This is how we evaluate the Db2 logging bandwidth. Four stripes increased the logging bandwidth by 28.5%.

The second test case issued a commit after every insert, causing most of the commits to write only a single page. This test case was measured with and without striping, and with and without disk contention. The disk contention was caused by Db2 deferred writes, which were brought about by some Db2 update jobs writing to 1000 different Db2 tables and indexes. As shown in Table 1, the disk contention caused the commit response times for the insert job to increase by 20 to 30%. When there was no contention, using 4 log stripes increased the commit response time by 10.5%. When there was contention, using 4 log stripes increased the commit response time by 2.5%.

                 Without disk contention    With disk contention
Non-striped      0.381 ms                   0.484 ms
4 stripes        0.421 ms                   0.496 ms

Table 1: Db2 average update commit time (in milliseconds) per event using small log size
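The percentages quoted for Table 1 can be rechecked from the four table values. The following Python sketch does only that arithmetic; the dictionary layout and the helper name pct_increase are illustrative choices, not part of any IBM tool.

```python
# Commit times in milliseconds, copied from Table 1.
commit_ms = {
    ("non-striped", "no contention"): 0.381,
    ("non-striped", "contention"): 0.484,
    ("4 stripes", "no contention"): 0.421,
    ("4 stripes", "contention"): 0.496,
}


def pct_increase(before: float, after: float) -> float:
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


# Cost of striping at each contention level.
for level in ("no contention", "contention"):
    cost = pct_increase(commit_ms[("non-striped", level)], commit_ms[("4 stripes", level)])
    print(f"4 stripes vs non-striped, {level}: +{cost:.1f}%")

# Cost of contention for each log layout.
for layout in ("non-striped", "4 stripes"):
    cost = pct_increase(commit_ms[(layout, "no contention")], commit_ms[(layout, "contention")])
    print(f"contention vs no contention, {layout}: +{cost:.1f}%")
```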

The last measurement was done using an online transaction workload that simulates brokerage transactions. There were 400 concurrent threads executing approximately 6000 transactions per second. The transactions consisted of reads and updates. When the active logs were striped across 4 volumes on 8 different extent pools (different RAID ranks with 15K RPM DDMs), the log write I/O time increased from 1.139 ms to 1.230 ms, an 8% increase. The average log write was 2.2 pages per log I/O in this workload. Although we saw a minor increase in log write wait, there was no visible impact on overall transaction response time because most of the time in the transaction was spent reading the database (sync I/O wait) rather than in commit wait.

 

Log striping and Db2 workloads

As a general rule, OLTP workloads tend to write fewer log pages per I/O than batch workloads do; striping tends to help batch workloads while hurting OLTP workloads. Striping might have benefited batch workloads with FICON Express 4 channels, but the benefits were reduced by FICON Express 8S while the costs might have increased. Why might the costs have increased? Because while newer control units have tended to decrease the average I/O response times, the maximum I/O response time hasn’t necessarily gone down. The greater the variation in I/O response time, the more likely it is that striping will hurt OLTP performance. As mentioned above, disk disconnect time for your active log devices can be used as one of the indicators to evaluate the potential impact of striping.

 

Conclusions

In conclusion, as a result of these considerations IBM discourages customers from striping their Db2 logs with today’s disk subsystems. However, if you are already using Db2 active log striping and no performance issues are observed, no action is required. Our measurements did not show a difference significant enough to warrant immediate action.


We observed that log striping can be beneficial when you are writing a large number of log records per I/O and there is no disk constraint. This is typically seen in batch updates with less frequent commits. With a smaller number of log records per log I/O, or when there is a disk constraint, striping does not add any value and might increase commit response time.

IBM cannot guarantee that the use of striping will have either a positive or a negative impact. Many factors influence striping performance, such as the hardware configuration, the commit frequency, the log write size, and disk and channel utilization.

Original Publication Date

20 May 2015


Document Information

Modified date:
31 May 2022

UID

swg27045811