Backup and restore best practices

Deployment options: Netezza Performance Server for Cloud Pak for Data System Netezza Performance Server for Cloud Pak for DataIBM Netezza Performance Server for Cloud Pak for Data as a Service

Learn about backup and restore best practices, such as improving nzrestore performance during incremental restore.

While you are applying a high volume of deleted rows during an incremental restore, you might encounter problems related to slow nzrestore performance or with a full SPU swap partition.

When you are running an incremental restore and it is taking a long time to delete records as a part of the restore operation, you might notice the following messages:

The message in the pg.log file from the restore operation is similar to the following message:

2017-05-24 16:40:36.380386 CES [10548]  DEBUG:  QUERY: delete from ADMIN.T1 where (rowid) in (select rid from external '/nz/tmp/bnr10546F1sQFj' (rid bigint not null, dsid smallint not null) using (logdir '/dev/null'));

2017-05-24 16:40:36.394734 CES [10548]  DEBUG:  spawnLdrReader: pid: 18805 load opt: 0 format = 0

The nzrestore command fails and displays this error message:

Error: NZ-01103/01104: DB SQL 'delete from ADMIN.T1 where (rowid) in (select rid from external '/nz/tmp/bnr10546F1sQFj' (rid bigint not null, dsid smallint not null) using (logdir '/dev/null'));' failed - DB error - ERROR:  SPU swap partition : Disk temporary work space is full.

Resolving the problem

A new restore logic is introduced with 11.2.2.4. The enableSpliDelete option is enabled by default.

To disable the restore logic, run the following command:

nzrestore -disableSplitDelete

This feature splits the single delete statement into several delete operations in the same transaction to avoid a disk-based hash join process (DHJ) and memory and performance problems.

To use this feature as a workaround to this problem, ensure that you fulfill the following conditions:

The number of rows in the base table is greater than default value or exceeds the value that is specified with nzrestore -baseTableRowCount.
Note: On Netezza Performance Server release 11.2.1.12 and higher, default base table row count (-baseTableRowCount) is 1 billion whereas in all earlier Netezza Performance Server releases the base table row count is 10 billion.
The number of delete records is beyond the DHJ threshold or the size of the .del file exceeds nzrestore -delFileSize.

Note: The base table is not a version of a row-secure table.

nzrestore examples

nzrestore -enableSplitDelete examples:

enableSplitDelete is set by default. If the size of the .del file is greater than 15 GB and the base table row count is greater than 1 billion (default value), run:
```
nzrestore -delFileSize 15
```
enableSplitDelete is set by default. If the size of the .del file is greater than 15 GB and the base table row count is greater than 100 million, run:
```
nzrestore -delFileSize 15 -baseTableRowCount 100
```
To disable enableSplitDelete with all default settings, run:
```
nzrestore -disableSplitDelete
```

Finding the `.del` file that is associated with a particular table

The following error appears in restoresvr.log during nzrestore.

Error: NZ-01103/01104: DB SQL 'delete from ADMIN.T1 where (rowid) in (select rid from external '/nz/tmp/bnr10546F1sQFj' (rid bigint not null, dsid smallint not null) using (logdir '/dev/null'));' failed - DB error - ERROR: SPU swap partition : Disk temporary work space is full.

In this example, the error is generated while you are deleting records from table T1.

Search for Reading schema from in restoresvr.log to find out the correct schema.xml file.

Search for the table (T1) in the corresponding schema.xml file and note the object id (old) for the table (T1).

$ grep -Ei 'TABLE.*T1' /nzscratch/backup/Netezza/7836028-H2/SAN/20170526144618/2/DIFF/md/schema.xml
<TABLE ver="2" name="T1" schema="ADMIN" owner="ADMIN" oidhi="0" oid="886046" delimited="f" sdelim="f" odelim="f" rowsecurity="f" origoidhi="0" origoid="886046">

Go to the data directory that is associated with restore.
Find the .del file for the object id (oid) (oid 886046).
In this example, the .del files are in the /nzscratch/backup/Netezza/7836028-H2/SAN/20170526144618/2/DIFF/data directory.
```
$ ls -l /nzscratch/sanjitc/crm118216/backup/Netezza/7836028-H2/SAN/20170526144618/2/DIFF/data/886046.del.*

-rw------- 1 nz nz 215274049844 May 26 10:47 886046.del.1.1
```

Determining the DHJ threshold

With the default registry settings, the DHJ threshold is around 12,511,836 number of rows. In other cases, you can identify the threshold in the following way.

'number of deleted rows' x 'number of dataslices on the system'  < '50% of spuPlanWorkBlocks'

spuPlanWorkBlocks specifies the system registry. Its default value is 256 MB.

Alternative approaches

If you are using version older than 11.2.2.4, or looking for an alternative solution, you can try one of the following solutions:

Take a full backup instead of a differential backup when the rows that need to be deleted exceed the DHJ threshold.
Run the nz_update_backup_status command to force a full table backup on the next differential backup run. Follow the instructions that are described in Forcing a full table backup with the nz_update_backup_status command.