DB2 Backup performance: how can named pipes help us if compression is the bottleneck?

Technical Blog Post


Abstract

DB2 Backup performance: how can named pipes help us if compression is the bottleneck?

Body

We all know how important it is to have the data spread across multiple tablespaces in order to speed up the backup. DB2 utilizes IO parallelism during the backup by processing multiple tablespaces at the same time (BACKUP ... PARALLELISM <N>). For example, if you have 10 tablespaces and run with parallelism 5, DB2 will spawn 5 buffer manipulator EDUs (db2bm) that read 5 tablespaces in parallel (with the help of prefetchers). Once a given db2bm is done with a tablespace, it picks up the next one that has not been backed up yet. This works great if you have multiple tablespaces of similar size: the requested IO parallelism will likely be utilized for the whole backup process.
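For illustration, a backup with parallelism 5 could be requested like this (the target directory /backup is just a placeholder):
$ db2 backup db SAMPLE to /backup parallelism 5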
It is not that great, though, when there is some imbalance in the data distribution across the tablespaces, e.g.:
$ db2 "select substr(tbsp_name,1,16) tbsp_name, tbsp_total_size_kb from sysibmadm.mon_tbsp_utilization order by tbsp_total_size_kb desc"

TBSP_NAME        TBSP_TOTAL_SIZE_KB  
---------------- --------------------
TSDAT                        12484608
TSDAT2                         360448
SYSCATSPACE                    131072
SYSTOOLSPACE                    32768
TSIDX                           32768
TEMPSPACE1                          4

  6 record(s) selected.

- the vast majority of the data (12 GB) is located in a single tablespace named 'TSDAT'. DB2 can still utilize parallelism during the backup, but the overall backup time will be determined by the size of that one big tablespace. This is how it looks in the backup statistics:

2016-06-24-09.21.37.038582+120 E69858E1592           LEVEL: Info
PID     : 13430                TID : 139960497727232 PROC : db2sysc 0
INSTANCE: db2v105              NODE : 000            DB   : SAMPLE
APPHDL  : 0-31                 APPID: *LOCAL.db2v105.160624071934
AUTHID  : DB2V105              HOSTNAME: myhost1
EDUID   : 138                  EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 1082 bytes

Parallelism       = 5
Number of buffers = 5
Buffer size       = 16781312 (4097 4kB pages)
                                                                 
BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000     96.05      2.39      0.00     93.64          10       116
001     96.05     33.02     61.96      0.01         762     12191
002     96.05      0.79      0.02     95.20          12       179
003     96.05      0.29      0.00     95.74           2        16
004     96.05      0.04      0.00     95.99           1         0
---  --------  --------  --------  --------    --------  --------
TOT    480.27     36.55     61.99    380.61         787     12503

MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000    121.75     84.11     11.89      0.00         788     12579
---  --------  --------  --------  --------    --------  --------
TOT    121.75     84.11     11.89      0.00         788     12579


- only one db2bm (BM# 001) is doing the majority of the work (see the "Buffers" or "MBytes" columns); all the others are quickly done with their tablespaces and are waiting for 001 to finish. If you don't know how to interpret those numbers, you can find more details here:
https://www.ibm.com/developerworks/community/blogs/DB2PerfTips/entry/backup_statistics?lang=en
but overall we should look first at the "Total" column of the output: it tells us how long a given EDU was active. Note that all db2bms (buffer manipulators, the EDUs reading the tablespaces) and db2mcs (media controllers, the EDUs writing the backup image either to disk or to a vendor device) work in parallel, so we are not really concerned with the sum (the TOT row), but rather with the highest number in the "Total" column. In the example above it is 121.75 for MC 000. The output is in seconds, so the backup took around 2 minutes. For our media controller (MC 000), out of those 121 seconds, 84 were spent writing the backup image to disk ("I/O" column) and almost 12 waiting for the buffer manipulators (db2bms) to fill up the buffers with data from the tablespaces ("MsgQ").
On the reading side (buffer manipulators) the "busiest" EDU was BM 001, which was active for 96 seconds, out of which 33 [s] were spent doing the IO ("I/O" column, reading tablespaces) and 61 [s] waiting for free buffers to put the data in (basically, waiting for the MC EDU, which was busy writing the image to disk). All the other BMs spent most of the time in WaitQ, which means they were done with their work (no pending tablespaces to process, just waiting for the remaining BM 001 to finish).
There is really not much we can do in such a situation other than distribute the data across multiple tablespaces (e.g. using ADMIN_MOVE_TABLE).
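For example, moving a hypothetical table DB2V105.BIGTAB so that its data and LOBs land in TSDAT2 and its indexes in TSIDX could look like this (the table is copied online into the target tablespaces):
$ db2 "CALL SYSPROC.ADMIN_MOVE_TABLE('DB2V105','BIGTAB','TSDAT2','TSIDX','TSDAT2','','','','','','MOVE')"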

Such a flaw in tablespace design hurts backup performance even more if you decide to use compression. This is how the statistics look after adding the COMPRESS clause to the backup command:

2016-06-24-09.06.07.659272+120 E44785E1893           LEVEL: Info
PID     : 13430                TID : 139960585807616 PROC : db2sysc 0
INSTANCE: db2v105              NODE : 000            DB   : SAMPLE
APPHDL  : 0-23                 APPID: *LOCAL.db2v105.160624070108
AUTHID  : DB2V105              HOSTNAME: myhost1
EDUID   : 23                   EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 1383 bytes

Parallelism       = 5
Number of buffers = 10
Buffer size       = 11538432 (2817 4kB pages)
                                                                               Compr
BM#    Total      I/O      Compr     MsgQ      WaitQ      Buffers   kBytes    kBytes
---  --------  --------  --------  --------  --------    --------  --------  --------
000    304.25     32.36    271.01      0.23      0.00         675  12483584  12493775      8469
001    304.25      0.53      4.51      0.00    299.17          12    183296    183391        95
002    304.25      0.84      1.58      0.00    301.81           2    119744    119744         0
003    304.25      0.08      0.11      0.00    304.04           1     16512     16512         0
004    304.25      0.02      0.00      0.00    304.20           1       624       624         0
---  --------  --------  --------  --------  --------    --------  --------  --------
TOT   1521.27     33.86    277.22      0.23   1209.24         691  12803760  12814046

MC#    Total      I/O                MsgQ      WaitQ      Buffers   kBytes
---  --------  --------            --------  --------    --------  --------
000    307.56     15.83              288.39      0.00         692   7774952
---  --------  --------            --------  --------    --------  --------
TOT    307.56     15.83              288.39      0.00         692   7774952


- we get one more column here on the BM side: "Compr", the time spent compressing backup buffers (sizes are now reported in kBytes as well). Now our media controller (MC 000) has less data to write ("kBytes" column: 7.7 GB, compared to 12 GB before), so it spends less time doing the IO (writing the backup image). Our bottleneck here is the compression: the backup takes over twice as much time as before, because BM 000 spends 271 seconds compressing. DB2 will not be able to fully utilize CPU resources in such a situation, since all the data from this big tablespace is compressed by a single db2bm EDU. For example, if you look at the stacks of the db2sysc process during the backup, you will see only one thread (EDU) busy in compression, e.g.:
   Thread 3 (Thread 0x7f4b153fe700 (LWP 13841)):
#0  0x00007f4b26e6f1c9 in Compress
#1  0x00007f4b2fe82838 in CompressWrapper
#2  0x00007f4b308351d3 in sqloInvokeInterruptibleFunction
#3  0x00007f4b2fe82967 in CompressCall
#4  0x00007f4b2fe735ef in sqlubManageCompressedObject
#5  0x00007f4b2fe717c7 in sqlubFinishObject
#6  0x00007f4b2fe6c62f in sqlubReadDMS
#7  0x00007f4b2fe6a925 in sqlubBMProcessData
#8  0x00007f4b2fe69f4d in sqlubbuf
#9  0x00007f4b2fbc2878 in sqleSubCoordProcessRequest
#10 0x00007f4b2d808b94 in sqeAgent
#11 0x00007f4b2f022227 in sqzEDUObj
#12 0x00007f4b2e7da8c3 in sqloEDUEntry
#13 0x00007f4b35bc3dc5 in start_thread
#14 0x00007f4b2942dced in clone

(gstack output on Linux)
and as a result, you'll see only one CPU busy on the OS level as well, e.g. in the "r" column (run queue) of vmstat.
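If you want to check this on your own system (assuming Linux, gstack from gdb installed and a single db2sysc process), you can dump all thread stacks and count how many threads sit in the compression wrapper:
$ gstack $(pgrep -o db2sysc) > /tmp/db2sysc.stacks
$ grep -c CompressWrapper /tmp/db2sysc.stacks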

If we are really concerned with backup performance and we cannot afford to store an uncompressed backup on disk (and, e.g., compress it later using a multi-threaded tool), we can utilize Unix pipes:
https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0006202.html
to do it "in flight" and move the compression to the "media controller" side of the backup process.

For example, we can create 2 named pipes:
$ mkfifo $HOME/backup_pipe0
$ mkfifo $HOME/backup_pipe1

and via those pipes compress the data using e.g. gzip (or any other available tool that can read from a pipe):
$ gzip < $HOME/backup_pipe0 > backup_part_0.gz
$ gzip < $HOME/backup_pipe1 > backup_part_1.gz

(from 2 separate shells)

At this point two gzip processes are ready to read data from the pipes; all we need to do is start the backup:
$ db2 backup db SAMPLE to $HOME/backup_pipe0, $HOME/backup_pipe1

- by specifying 2 outputs, we request DB2 to use two media controller EDUs. Looking at the stacks the same way as we did before, we'll see our BM EDU reading data from a tablespace (and no longer doing the compression):

Thread 6 (Thread 0x7f4b16bfe700 (LWP 17000)):
#0  0x00007f4b35bcaa93 in pread64
#1  0x00007f4b2e7d12eb in sqloReadBlocks
#2  0x00007f4b3171e97c in sqlbReadBlocks
#3  0x00007f4b2b954d12 in sqlbDMSReadContainerData
#4  0x00007f4b2b953aa0 in sqlbDMSMapAndRead
#5  0x00007f4b2b952bc2 in sqlbDMSDirectRead
#6  0x00007f4b2b98a3aa in sqlbDirectRead
#7  0x00007f4b2f6f57de in sqlbDirectReadBlock
#8  0x00007f4b2fe6bec8 in sqlubReadDMS
#9  0x00007f4b2fe6a925 in sqlubBMProcessData
#10 0x00007f4b2fe69f4d in sqlubbuf
#11 0x00007f4b2fbc2878 in sqleSubCoordProcessRequest
#12 0x00007f4b2d808b94 in sqeAgent
#13 0x00007f4b2f022227 in sqzEDUObj
#14 0x00007f4b2e7da8c3 in sqloEDUEntry
#15 0x00007f4b35bc3dc5 in start_thread
#16 0x00007f4b2942dced in clone

but we'll also see 2 media controllers writing the data to our pipes:
Thread 5 (Thread 0x7f4b147fe700 (LWP 17001)):
#0  0x00007f4b35bca1cd in write
#1  0x00007f4b2e75d724 in sqloWriteNamedPipe
#2  0x00007f4b2fe272d7 in sqluWriteToFileDevice
#3  0x00007f4b2fe1f73f in sqluMCWriteToDevice
#4  0x00007f4b2fe24700 in sqluMCWriteBackupBufToDev
#5  0x00007f4b2fe254be in sqluMCProcessBufferAddrMsg
#6  0x00007f4b2fe236a0 in sqluMCContinueBackup
#7  0x00007f4b2fe22a72 in sqluMCProcessBackupStates
#8  0x00007f4b2fe221ad in sqluMCStartBackupMediaController
#9  0x00007f4b2e7da8c3 in sqloEDUEntry
#10 0x00007f4b35bc3dc5 in start_thread
#11 0x00007f4b2942dced in clone
Thread 4 (Thread 0x7f4b167fe700 (LWP 17002)):
#0  0x00007f4b35bca1cd in write
#1  0x00007f4b2e75d724 in sqloWriteNamedPipe
#2  0x00007f4b2fe272d7 in sqluWriteToFileDevice
#3  0x00007f4b2fe1f73f in sqluMCWriteToDevice
#4  0x00007f4b2fe24700 in sqluMCWriteBackupBufToDev
#5  0x00007f4b2fe254be in sqluMCProcessBufferAddrMsg
#6  0x00007f4b2fe236a0 in sqluMCContinueBackup
#7  0x00007f4b2fe22a72 in sqluMCProcessBackupStates
#8  0x00007f4b2fe221ad in sqluMCStartBackupMediaController
#9  0x00007f4b2e7da8c3 in sqloEDUEntry
#10 0x00007f4b35bc3dc5 in start_thread
#11 0x00007f4b2942dced in clone

In parallel, our gzip processes will be reading this data from the pipes, compressing it and storing it on disk.
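You can also confirm on the OS level that the CPU cycles moved out of db2sysc, e.g. by checking the gzip processes themselves:
$ ps -C gzip -o pid,pcpu,etime,cmd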

Now the backup statistics look as follows:
2016-06-24-09.18.25.310015+120 E62550E1658           LEVEL: Info
PID     : 13430                TID : 139960447395584 PROC : db2sysc 0
INSTANCE: db2v105              NODE : 000            DB   : SAMPLE
APPHDL  : 0-28                 APPID: *LOCAL.db2v105.160624071337
AUTHID  : DB2V105              HOSTNAME: myhost1
EDUID   : 84                   EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 1148 bytes

Parallelism       = 5
Number of buffers = 5
Buffer size       = 16781312 (4097 4kB pages)
                                                                 
BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000    286.60     33.04    251.35      1.35         764     12191
001    285.26      0.05      0.24    284.96           2        16
002    285.26      0.50      4.49    280.25          12       179
003    285.26      0.04      0.00    285.20           1         0
004    285.26      0.53      2.98    281.73           8       116
---  --------  --------  --------  --------    --------  --------
TOT   1427.66     34.18    259.08   1133.51         787     12503

MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000    286.60    286.46      0.01      0.00         392      6257
001    286.70    286.46      0.00      0.10         397      6321
---  --------  --------  --------  --------    --------  --------
TOT    573.31    572.93      0.01      0.10         789     12579

There is some gain: the backup now takes 286 [s] (compared to 307 [s] with COMPRESS) and the bottleneck is the MC side, as the media controller EDUs need 286 [s] to write the data to the pipes. Does that mean that pipes are slow? Not really; it is slow because the gzip processes reading from those pipes are busy doing the actual compression, and the time required for that is accounted as IO in the DB2 stats. Another upside is that the backup compressed by gzip is smaller than the one compressed by the built-in compression library (5.5 GB vs 7.7 GB).
I've only used 2 pipes here; on my test machine I have 8 CPUs available, so perhaps we could try 8 pipes and 8 gzips working in parallel? For gzip you can also control the compression level:
$ man gzip
[...]
The default compression level is -6 (that is, biased towards high compression at expense of speed).
[...]

so we can try something faster and less CPU-demanding, e.g. level -3:
$ mkfifo $HOME/backup_pipe0
$ mkfifo $HOME/backup_pipe1
$ mkfifo $HOME/backup_pipe2
$ mkfifo $HOME/backup_pipe3
$ mkfifo $HOME/backup_pipe4
$ mkfifo $HOME/backup_pipe5
$ mkfifo $HOME/backup_pipe6
$ mkfifo $HOME/backup_pipe7
$ gzip -3 < $HOME/backup_pipe0 > backup_part_0.8.gz
$ gzip -3 < $HOME/backup_pipe1 > backup_part_1.8.gz
$ gzip -3 < $HOME/backup_pipe2 > backup_part_2.8.gz
$ gzip -3 < $HOME/backup_pipe3 > backup_part_3.8.gz
$ gzip -3 < $HOME/backup_pipe4 > backup_part_4.8.gz
$ gzip -3 < $HOME/backup_pipe5 > backup_part_5.8.gz
$ gzip -3 < $HOME/backup_pipe6 > backup_part_6.8.gz
$ gzip -3 < $HOME/backup_pipe7 > backup_part_7.8.gz

(from 8 separate shells)
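Instead of 8 separate shells, the same setup can also be done from a single shell by backgrounding each gzip (a sketch, assuming the pipes do not exist yet):
$ for i in 0 1 2 3 4 5 6 7; do
>   mkfifo $HOME/backup_pipe$i
>   gzip -3 < $HOME/backup_pipe$i > backup_part_$i.8.gz &
> done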

and run the backup:
$ db2 backup db SAMPLE to $HOME/backup_pipe0, $HOME/backup_pipe1, $HOME/backup_pipe2, $HOME/backup_pipe3, $HOME/backup_pipe4, $HOME/backup_pipe5, $HOME/backup_pipe6, $HOME/backup_pipe7

In vmstat we'll see at least 8 processes in the run queue and over 90% of the CPUs utilized:
$  vmstat 1 10
procs ------- ----io----    -system-    ------cpu-----
 r  b   swpd       bi    bo    in    cs us sy id wa st
11  0      0   216064   356 11012 21843 92  8  0  0  0
 8  0      0   183776    28 10419 18096 93  7  0  0  0
10  1      0   170720    44 10374 20390 92  7  0  0  0

and the results are much better:

2016-06-24-09.39.47.477753+120 E86255E2055           LEVEL: Info
PID     : 13430                TID : 139960489338624 PROC : db2sysc 0
INSTANCE: db2v105              NODE : 000            DB   : SAMPLE
APPHDL  : 0-41                 APPID: *LOCAL.db2v105.160624073831
AUTHID  : DB2V105              HOSTNAME: myhost1
EDUID   : 166                  EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 1545 bytes

Parallelism       = 5
Number of buffers = 16
Buffer size       = 10489856 (2561 4kB pages)
                                                                 
BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000     73.30      1.83      0.23     71.22          14       116
001     72.80     33.49     38.51      0.01        1220     12191
002     72.80      0.72      0.22     71.84          18       179
003     72.80      0.04      0.00     72.75           1         0
004     72.80      0.31      0.00     72.48           2        16
---  --------  --------  --------  --------    --------  --------
TOT    364.53     36.40     38.97    288.33        1255     12503

MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000     73.16     72.44      0.68      0.00         169      1680
001     73.27     72.42      0.77      0.05         157      1560
002     73.27     72.38      0.80      0.05         156      1550
003     73.17     72.36      0.73      0.05         164      1630
004     73.30     72.54      0.67      0.05         161      1600
005     73.16     72.32      0.75      0.05         161      1600
006     73.13     72.36      0.68      0.05         158      1570
007     75.18     74.35      0.75      0.05         137      1350
---  --------  --------  --------  --------    --------  --------
TOT    587.68    581.19      5.88      0.36        1263     12544


Now the backup takes just 75 [s], which is actually a bit faster than the one without compression! (We could speed up the uncompressed backup by using multiple MCs too.) The backup is also still smaller than the one taken with COMPRESS (~6 GB).
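To verify the size of the resulting image, just sum up the parts, e.g.:
$ du -ch backup_part_*.8.gz | tail -1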

What are the downsides?
1. It adds complexity to the backup process

2. You can no longer easily use db2ckbkp to verify the consistency of the backup

3. Sometimes you don't really want the backup to utilize so much CPU, so keeping compression within DB2 is fine (have you heard of the hardware-accelerated backup compression available in DB2 11.1 running on Power? http://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.ha.doc/doc/c0062003.html)

but it might be worth testing if you happen to have a database with a similar tablespace design.
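For completeness: to restore from such an image we reverse the flow, i.e. gunzip processes write into the pipes and RESTORE reads from them. A sketch for the 2-pipe variant from the beginning of this post (assuming restore from multiple pipes mirrors the backup; see the named pipes documentation linked above):
$ mkfifo $HOME/backup_pipe0 $HOME/backup_pipe1
$ gunzip < backup_part_0.gz > $HOME/backup_pipe0 &
$ gunzip < backup_part_1.gz > $HOME/backup_pipe1 &
$ db2 restore db SAMPLE from $HOME/backup_pipe0, $HOME/backup_pipe1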

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm11140082