IBM Support

Backup and Restore Performance Improvements with Large Numbers of Tables, Views, and Procedures

News


Abstract

IBM Pure Data for Analytics 7.2.1.6-P2 includes enhancements that improve performance when the backed up and restored database has many tables or other objects that rarely change. Both incremental backup and incremental restore include optimizations that eliminate unnecessary processing overhead for tables and other objects (specifically, views and procedures) that were not modified. Also, you can designate "staging" or other temporary-use tables as not needing backup, and both full and incremental backups would eliminate unnecessary processing overhead for these tables.

Content

Tracking Changes to Tables and Views

To support the incremental backup and restore optimizations for tables and views (including materialized views) that were not modified, the CREATE DATABASE and ALTER DATABASE commands now accept an extra TRACK CHANGES option:

CREATE DATABASE <name> [TRACK CHANGES [ON | OFF]]
ALTER DATABASE <name> [TRACK CHANGES [ON | OFF]]

By default, a database is created with TRACK CHANGES OFF, which means that the incremental backup and restore optimizations for tables and other objects are not in effect for the specific database. This behavior matches the existing behavior of backup and restore for versions 7.2.1.5-P2 and earlier. TRACK CHANGES OFF is also the default setting on each database after an upgrade to release 7.2.1.6-P2 or later. TRACK CHANGES must be ON in a database to enable the optimizations for tables and views in that database.

With TRACK CHANGES ON, incremental backup of the specific database does not attempt to retrieve inserted and or deleted rows for a specific table in the following situations.

- When it is known that no rows were inserted ince the last backup.

- When it is known that no rows deletedm since the last backup.

- When it is known that no rows were updated since the last backup.

The "last" backup is the initial full backup for a cumulative backup and the preceding backup, full or incremental, for a differential backup. Insertions and deletions (an update operation counts as an insertion and a deletion) are tracked separately for this optimization. No empty (zero-row) "inserted rows" and or "deleted rows" files are generated by incremental backup for incremental restore to process. The cost savings on both backup and restore can be significant when a large fraction of the tables in the database is unchanged between backups.

Incremental Backup and Restore of Views

With TRACK CHANGES ON, incremental backup of the specific database does not back up a view (or materialized view) definition that was not changed. Incremental restore takes no action for such a view because the existing definition in the target database is up to date.

Incremental Restore of Procedures

An extra optimization in nzrestore now drops and re-creates only those stored procedures that were changed in the source database since the last backup.

Avoiding Backup and Restore of Selected Tables

To avoid backing up and restoring a specific table, the new BACKUP option on CREATE TABLE and ALTER TABLE can be used:

CREATE TABLE ... [BACKUP [DATA] [ON | OFF]]
ALTER TABLE ... [BACKUP [DATA] [ON | OFF]]

By default (and on upgrade), a nontemporary user table is created with BACKUP ON, which matches the existing backup and restore behavior. BACKUP ON and BACKUP DATA ON are semantically equivalent. The other BACKUP choices are:

· With BACKUP DATA OFF, the data for the specific table is not backed up and restored. This statement applies to both full and incremental backups. The table definition is backed up, and the restore creates a table with the specific definition but no data.

CAUTION: Queries against such a table in the target database cannot be expected to give meaningful results, since the data from the source table was not restored.

· With BACKUP OFF, the table definition, and the table data are not backed up.

CAUTION: nzrestore displays nonfatal errors when you are attempting to create dependent objects. For example, any views or synonyms dependent on a BACKUP OFF table is not created. You can ignore these errors.

Back up and Restore Visible Changes

· Incremental backup (differential or cumulative) does not report "Backing up <name>" for a table that was not modified since the last backup, when TRACK CHANGES is ON for the database.

· Back up (full or incremental) does not report "Backing up <name>" for a table that has BACKUP OFF or BACKUP DATA OFF.

· When you are backing up a BACKUP DATA OFF table, the backup reports a warning: “Forcing empty backup of table <name> as it is excluded from backup.” The restore updates the definition and data from earlier restore in the data is not overwritten.

· For a table with BACKUP OFF, restore of views or synonyms that reference the table fails (with a nonfatal error message).

· When a table is Altered from BACKUP [DATA] OFF to BACKUP [DATA] ON, the next incremental backup issues the following warning message. "Forcing full backup of table '<name>', which was groomed, truncated, or added to back up since last backup." This expanded warning message also appears when a table was groomed or truncated between backups.

· When a database is Altered from TRACK CHANGES OFF to TRACK CHANGES ON, the next incremental backup queries all BACKUP ON tables for inserted and deleted rows (same as the existing incremental backup).

· The new TRACK CHANGES and BACKUP settings, for databases and tables is backed up and restored.

· The backup and restore changes are compatible with both previous and later versions. If an older release restores from a newer backup, some nonfatal errors might appear during the restore. The errors do not prevent the restore transaction from committing. "Restore of increment <number> from backupset <backupset> to database '<db_name>' committed."

System View Changes

· A new column 'dbtrackchanges' (value 0 when OFF, 1 when ON) is added to the view _v_database.

· A new column 'dontbackup' (0 = BACKUP [DATA] ON, 1 = BACKUP DATA OFF, 2 = BACKUP OFF) is added to _v_table, _v_table_storage_stat, _v_table_only_storage_stat, _v_obj_relation, _v_object_data, and _v_objects.

Nzsql Changes

· \d <table> shows ‘Backup Data Off’ or ‘Backup Off’ when that is the case. The latter is always the case for a temporary table. No extra text is displayed when the table has BACKUP [DATA] ON (the default).

· \l+ (list databases with extra detail) shows an extra TRACKCHANGES column. The value displayed is blank if TRACK CHANGES is off (the default), or ON if on.

Netezza Replication Visible Changes

· Replication does not treat tables with BACKUP DATA OFF or BACKUP OFF specially; these tables are always backed up (definition and data) by nzreplbackup and restored by nzreplrestore. (Do not use nzbackup or nzrestore directly to initialize or recover a replication node.)

· The new SQL syntax is replicated when you use it with a replicated database on a master node. (TRACK CHANGES on CREATE/ALTER DATABASE, and BACKUP on CREATE/ALTER TABLE).

CAUTION: Do not use the new SQL syntax (TRACK CHANGES on CREATE/ALTER DATABASE, and BACKUP on CREATE/ALTER TABLE) on replicated databases unless master and subordinate or subordinates are both (all) on NPS 7.2.1.6-P2 or later. If a subordinate node is on any version before 7.2.1.6-P2 and you use any of these constructs on the master, the subordinate suspends. Master and subordinate releases that are different to 7.2.1.6-P2 are recommended against.

· \l+ (list all databases with extra detail) always shows REPLICATION_SET (as well as TRACKCHANGES), whether or not any databases are replicated. The special replicated "database" '_GLOBAL', which is not a real database but, which appears in views such as _v_replicated_database and _v_replication_state, does not appear in the output of \l+ (as before).

Performance

Back up and Restore Performance Improvement

The following measurements were observed on single-rack Netezza "Mako" and "Striper" systems with about 147GB of data in about 5000 tables distributed across 240 data slices. Elapsed times are in seconds, averaged across three runs. The elapsed times for the initial full backup, and for the initial restore of the full backup, are not shown. The tables that were modified between the full and the differential backup have a significant impact on the performance benefit. That benefit comes from avoiding queries to back up and restore nonexistent inserted and deleted rows for tables that did not change between backups. For larger absolute amounts of data changed between backups (ore than the tested 10% of 147GB), the relative performance speedup might be less than shown. The backing up and restoring of the data that did change might dominate the shorter queries (that back up and restore no rows) that are avoided. 

Mako

Differential Backup
Restore of one Increment


Elapsed time with TRACK CHANGES OFF
Elapsed time with TRACK CHANGES ON
Speedup
Elapsed time with TRACK CHANGES OFF on source
Elapsed time with TRACK CHANGES ON on source
Speedup

~10% of data changed across 10% of tables
446
96
4.65x
1108
171
6.48x

~10% of data changed across 25% of tables
435
149
2.92x
1089
409
2.66x

1 row changed in each table*
463
348
1.33x
1378
1053
1.31x
Striper

Differential Backup
Restore of one Increment


Elapsed time with TRACK CHANGES OFF
Elapsed time with TRACK CHANGES ON
Speedup
Elapsed time with TRACK CHANGES OFF on source
Elapsed time with TRACK CHANGES ON on source
Speedup

~10% of data changed across 10% of tables
474
112
4.23x
1084
172
6.30x

~10% of data changed across 25% of tables
466
134
3.48x
1153
395
2.92x

1 row changed in each table*
478
362
1.32x
1671
1135
1.39x

*For the "1 row changed in each table" test, roughly one-third each of the changes were Inserts, Deletes, and Updates. A table that had a row or rows updated, requires two queries to retrieve inserted and deleted rows during incremental backup and two queries during restore. This behavior is similar to previous releases or with TRACK CHANGES OFF, so these tables do not see any benefit on backup or on restore.

A table that had only inserted or only deleted rows between backups requires the following.

- One query during incremental back up to retrieve the inserted or one or more deleted rows. 

- Only one query during restore to insert or delete one or more rows, which results in the modest speedup in both backup and restore in this case.

 

Update and Delete Tracking Overhead

When a table in a TRACK CHANGES ON database has BACKUP [DATA] ON, Update, and Delete statements incur some extra host I/O to record tracking information in the system catalog. No difference in the performance of Update and Delete statements (or Merge statements that Update or Delete rows) was noted as a result of this change. Insert statements do not incur this overhead, as the tracking in this case is combined with some existing I/O that is already being performed.

Caveats and limitations

· Because BACKUP OFF does not (back up and) restore a table's definition (nor the table's data), dependent views and synonyms are not be restored.

· For Alter Table Add/Drop Column and Groom Table Versions, the tracking of insertions and deletions is not yet as precise as it could be. The first incremental backup after such an operation might unnecessarily query for inserted and or deleted rows. And if no rows were inserted or deleted, get back no rows in response. Correctness is not affected. If rows are inserted and or deleted after such an operation, incremental backup retrieves the inserted and or deleted rows.

· The new SQL syntax (TRACK CHANGES on CREATE/ALTER DATABASE, and BACKUP on CREATE/ALTER TABLE) is not be used on replicated databases unless master and one or more subordinate are both (all) on NPS 7.2.1.6-P2 or later. If a subordinate node is on any version before 7.2.1.6-P2, use of these constructs on the master causes the subordinate to suspend. You are recommended against using master and subordinate releases different to 7.2.1.6-P2.

[{"Product":{"code":"SSULQD","label":"IBM PureData System"},"Business Unit":{"code":"BU053","label":"Cloud \u0026 Data Platform"},"Component":"--","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"1.0.0","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Product Synonym

NPS Netezza Performance Server PureData for Analytics PDA

Document Information

More support for:
IBM PureData System

Software version:
1.0.0

Document number:
569623

Modified date:
29 September 2022

UID

swg22015546