A quick overview of what to look at when an Informix dbspace backup takes longer than usual, or longer than seems appropriate for the amount of data being backed up.
Your level 0 database server backup is taking longer than usual, or you think it shouldn't be taking so long given the amount of data it ultimately backs up.
The cause can lie at any of several stages of the process involved in a level 0 backup.
Roughly, these stages are:
- storage where the database server's data resides (disks, SAN, ...), including the OS I/O subsystem
- the database server, which sends this data to the backup client process
- backup client process (ontape, onbar)
- backup target storage (tape, disk, file system or, in case of onbar, the storage manager)
From the database server's perspective, the backup process is largely the same whether you use ontape or onbar, so the choice really only makes a difference at the backup destination: directly to tape, disk or pipe, versus to a storage manager of your choice (which can have its own set of performance problems). One further difference is that an onbar parallel backup runs multiple backup sessions, one for each space being backed up, at the same time.
The most common cause of this kind of problem is slow disk I/O and I/O contention on the database server side.
A database server backup is disk I/O intensive in the sense that large amounts of data (not currently in cache) have to be read from disk in large sequential chunks. This usually only becomes a problem if the backup's disk I/O has to compete with a lot of other disk I/O, or if disk I/O is slow to begin with.
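To judge whether a backup takes "longer than seems appropriate", a rough lower bound is the volume of data backed up divided by the sequential read throughput the storage can sustain. The following is a minimal Python sketch of that arithmetic, assuming you know or have measured that throughput; the function names and the example figures are hypothetical and not part of Informix. It ignores contention, the backup client and the target device, so a real backup will take at least this long.

```python
def expected_backup_minutes(data_gb: float, read_mb_per_s: float) -> float:
    """Rough lower bound: minutes needed to read data_gb sequentially
    at read_mb_per_s (assumption: 1 GB = 1024 MB)."""
    if read_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_gb * 1024 / read_mb_per_s / 60


def slowdown_factor(actual_minutes: float, data_gb: float,
                    read_mb_per_s: float) -> float:
    """How many times longer the real backup took than the I/O lower bound."""
    return actual_minutes / expected_backup_minutes(data_gb, read_mb_per_s)


if __name__ == "__main__":
    # Hypothetical figures: 300 GB backed up, storage sustaining 200 MB/s.
    estimate = expected_backup_minutes(300, 200)
    print(f"lower bound: {estimate:.1f} min")  # 300*1024/200/60 = 25.6 min
    # A backup that actually took 128 minutes is 5x slower than the bound.
    print(f"slowdown: {slowdown_factor(128, 300, 200):.1f}x")
```

A large slowdown factor does not say where the time goes; it only suggests that the diagnosis below is worth doing.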
Backing up an Informix database server using the ontape or onbar utility.
Diagnosing The Problem
The task at hand is to find out where in this process the waiting starts. The first thing to determine is usually whether the main cause of the slowness lies toward one end of this data flow or the other.
In other words, is the problem more on the backup client and data destination side, or on the database server and data origin side? That is, is the backup client waiting for the database server to provide the data, or is the database server waiting for the backup client to process it?
The command to answer this question is 'onstat -g stq', which provides insight into the usage of the stream queue buffers used for data exchange between the database server and the backup client (ontape or onbar). For each backup session it shows three lines of output:
- Stream Queue: number of buffers (and their memory addresses) for a given backup session
- Full Queue: buffers that are currently full, i.e. waiting to be consumed (by the backup front end, in the case of a backup)
- Empty Queue: buffers waiting to be filled (by the database server, in the case of a backup)
When you monitor this output repeatedly over a period of time, e.g. using 'onstat -g stq -r 1', you will typically see either consistently more empty buffers, pointing to a problem on the database server side, or, conversely, consistently more full buffers, pointing to a problem in moving the data on to the backup destination.
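The interpretation of the stream queue counts can be written down as a small decision rule. This is a minimal Python sketch, assuming you have already extracted, for one backup session, the number of full and empty buffers from each repeated 'onstat -g stq' sample; parsing is left out because it depends on the exact output layout. The function name and the 75% threshold are hypothetical choices, not Informix defaults.

```python
from typing import Iterable, Tuple


def stq_tendency(samples: Iterable[Tuple[int, int]], share: float = 0.75) -> str:
    """Classify where a backup session is waiting.

    samples: (full_buffers, empty_buffers) per sample, e.g. one per second
    from 'onstat -g stq -r 1'. Returns 'server' if empty buffers dominate in
    at least `share` of the samples (the server is slow to fill them),
    'client' if full buffers dominate (the backup client or destination is
    slow to drain them), otherwise 'unclear'.
    """
    samples = list(samples)
    if not samples:
        return "unclear"
    n = len(samples)
    more_empty = sum(1 for full, empty in samples if empty > full)
    more_full = sum(1 for full, empty in samples if full > empty)
    if more_empty / n >= share:
        return "server"
    if more_full / n >= share:
        return "client"
    return "unclear"


if __name__ == "__main__":
    # Hypothetical samples for one session.
    print(stq_tendency([(0, 8), (1, 7), (0, 8), (2, 6)]))  # mostly empty: server
    print(stq_tendency([(8, 0), (7, 1), (8, 0), (4, 4)]))  # mostly full: client
```

An 'unclear' result simply means no tendency is visible yet; sampling over a longer period is usually the next step.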
Once a clear tendency has been established, you can investigate further, either on the database server side (and below), or on the ontape/onbar, target device or storage manager side (most of which is likely outside Informix control).
Here's what to look at on the database server side:
- backup threads' states and stacks
Use 'onstat -g ath' and 'onstat -g stk <thread_id>' for this.
You might see only the backup threads (arcbackup*) suffering, for whatever reason, or you might see them competing with other database server activity for some resource.
- in case of signs of I/O waits: 'onstat -g iof' or, where available (version 12.10 onwards), 'onstat -g ioh' and the sysadmin:mon_iohistory table,
showing I/O and performance for individual chunk devices, as totals and averages,
or as histogram information over the last 60 minutes, with the history collected over days by a dedicated monitoring task.
- If there are general signs of disk I/O problems, i.e. not only backup related ones, these should be solved first, independently of the backup performance problem. Again, the reasons can be numerous: from poor query plans or other causes of undesirably high amounts of disk I/O, to a slow disk subsystem or poor storage configuration (from the Informix level down to the disk device setup), or a combination of these.
- If other problems show up, continue the investigation accordingly.
- other onstat commands of interest during an archive:
'onstat -g arc', showing the progress of the archive
'onstat -D', showing which chunks are seeing how much disk I/O
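Because the per-chunk page counts reported by 'onstat -D' are cumulative, comparing two snapshots taken some seconds apart during the archive shows which chunks are busiest at that moment. This is a minimal Python sketch, assuming the page read and page write counts per chunk have already been extracted into dictionaries keyed by chunk path; the parsing step, the dictionary layout, the function name and the example paths are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: chunk path -> (cumulative pages read, pages written)
Counts = Dict[str, Tuple[int, int]]


def busiest_chunks(before: Counts, after: Counts, interval_s: float,
                   top: int = 5) -> List[Tuple[str, float]]:
    """Return (chunk, pages per second) for the chunks with the highest
    combined read+write rate between two snapshots of cumulative counters."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    rates = []
    for chunk, (rd_after, wr_after) in after.items():
        # A chunk missing from the first snapshot is treated as starting at 0.
        rd_before, wr_before = before.get(chunk, (0, 0))
        pages = (rd_after - rd_before) + (wr_after - wr_before)
        rates.append((chunk, pages / interval_s))
    # Highest rate first; ties broken by chunk name for stable output.
    rates.sort(key=lambda item: (-item[1], item[0]))
    return rates[:top]


if __name__ == "__main__":
    # Hypothetical snapshots taken 10 seconds apart.
    first = {"/dev/chk1": (1000, 50), "/dev/chk2": (500, 10)}
    second = {"/dev/chk1": (1600, 70), "/dev/chk2": (520, 10)}
    print(busiest_chunks(first, second, 10.0))
    # [('/dev/chk1', 62.0), ('/dev/chk2', 2.0)]
```

Chunks that stay at the top while the backup runs, especially ones also serving heavy regular workload, are the first candidates for the contention described above.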
Should the problem appear to be more on the backup client side, and you are using onbar, the BAR_PERFORMANCE and BAR_DEBUG configuration parameters can be of help (set BAR_DEBUG no higher than 5 to begin with).
With ontape, you are mostly left with OS diagnostics and ontape process stacks.
16 June 2018