Comparison of snapshot-based backups and backups from a live system
Backing up large file systems can take many hours or even days. When using the IBM Storage Scale command mmbackup, time is needed for the following steps:
- Scanning the system to identify the objects that need to be backed up or expired.
- Expiring all objects removed from the system.
- Backing up all new or changed objects.
If the backup runs on the live file system while it is active, the objects selected for the backup job can be backed up at different points in time. This can cause problems when temporary or transient files that were present at scan time have been removed by the time the backup command tries to send them to the IBM Storage Protect server. The attempt to back up a removed file fails, and the need to back up this object is still recorded in the shadow database.
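The timing problem described above can be illustrated with a toy model. The following is a minimal sketch, not a representation of how mmbackup works internally: the file names and the helper functions `plan_backup` and `send_files` are hypothetical, and a plain dictionary stands in for the file system.

```python
def plan_backup(file_system):
    """Scan step: record the names that exist at scan time."""
    return sorted(file_system)


def send_files(candidates, source):
    """Backup step: send each candidate that still exists in `source`.

    Returns (sent, failed); a candidate fails when it is no longer present.
    """
    sent, failed = [], []
    for name in candidates:
        if name in source:
            sent.append(name)
        else:
            failed.append(name)
    return sent, failed


# Hypothetical file system contents at scan time.
live = {"a.dat": b"1", "tmp.lock": b"2", "b.dat": b"3"}
candidates = plan_backup(live)

# A frozen, read-only style copy taken at the same point in time.
snapshot = dict(live)

# A transient file disappears between the scan and the send step.
del live["tmp.lock"]

live_sent, live_failed = send_files(candidates, live)
snap_sent, snap_failed = send_files(candidates, snapshot)

print("live failures:", live_failed)
print("snapshot failures:", snap_failed)
```

In this model the send step against the live view fails for the file that vanished after the scan, while the same candidate list sent from the point-in-time copy completes.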
Instead of backing up from a live file system, an alternative is to use snapshot-based backups. Using a snapshot adds the actions of creating or reusing a snapshot and removing it when the overall backup process completes. However, because a snapshot is a read-only, point-in-time view of the file system, it can be used for backing up the file system for as long as is necessary to complete the backup job. The advantages of this approach are as follows:
- Transient or temporary files are backed up, provided they existed at the time the snapshot was taken.
- Protection against backup failures caused by server-side faults, for example when the database or a storage pool of the IBM Storage Protect server becomes full, or when the server crashes. In this case, the backup can be retried for the point in time at which the snapshot was taken, with no loss of function or backup.
- Retention of a backup within the system. Snapshots can be kept for a period of time, providing an online backup copy of all files. This can protect against accidental deletions or modifications, and can be used to retrieve an earlier version of a file.
- A means to fulfill a data protection policy even if the backup activity to IBM Storage Protect exceeds the nominal time window allotted. The snapshot can be kept for several days until the backup is complete, and multiple snapshots can be kept until the backup completes for each of them.
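The create, back up, and remove sequence, together with the retry advantage, can be sketched as follows. This is an illustrative outline under stated assumptions rather than a recommended procedure: the device name `gpfs0`, the snapshot name `backup_snap`, and the `run` hook are hypothetical, and treating any non-zero exit code as a reason to keep the snapshot is a policy choice made for this sketch. The option usage (`-t incremental`, `-S`) should be checked against the command reference for the installed release.

```python
import shlex


def run_snapshot_backup(device, snapshot, run):
    """Create a snapshot, back up from it, and delete it only on success.

    `run` is a caller-supplied callable that takes an argument list and
    returns an exit code. It is a hypothetical hook so that the sketch can
    be exercised without a cluster.
    """
    create = ["mmcrsnapshot", device, snapshot]
    backup = ["mmbackup", device, "-t", "incremental", "-S", snapshot]
    delete = ["mmdelsnapshot", device, snapshot]

    if run(create) != 0:
        return "snapshot-failed"
    if run(backup) != 0:
        # Keep the snapshot so the same point in time can be retried.
        return "retry-from-snapshot"
    run(delete)
    return "done"


# Dry run: record the command lines instead of executing them.
issued = []


def record(cmd):
    issued.append(shlex.join(cmd))
    return 0


status = run_snapshot_backup("gpfs0", "backup_snap", record)
print(status)
for line in issued:
    print(line)
```

To run the commands for real, `run` could be replaced by a thin wrapper around `subprocess.run(cmd).returncode`. The dry-run recorder is used here so that nothing is executed.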
IBM Storage Scale provides the capability to create snapshots for a complete file system, known as global snapshots, and for independent filesets, known as fileset snapshots.
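As a rough illustration of the two snapshot scopes, the sketch below builds the argument list for creating either kind of snapshot. It assumes the `-j` option of mmcrsnapshot for naming an independent fileset; the device, snapshot, and fileset names are hypothetical, and the helper `create_snapshot_command` is not part of any IBM tooling.

```python
def create_snapshot_command(device, snapshot, fileset=None):
    """Build an mmcrsnapshot argument list.

    Without `fileset`, the command targets the whole file system (a global
    snapshot). With `fileset`, it targets one independent fileset.
    """
    cmd = ["mmcrsnapshot", device, snapshot]
    if fileset is not None:
        cmd += ["-j", fileset]
    return cmd


global_cmd = create_snapshot_command("gpfs0", "snap1")
fileset_cmd = create_snapshot_command("gpfs0", "snap1", fileset="projects")

print(" ".join(global_cmd))
print(" ".join(fileset_cmd))
```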
While snapshot-based backups provide several advantages, the following considerations apply when using the snapshot capability:
- Snapshots can consume space. A snapshot used only for backup is usually removed shortly after the backup operation finishes, but long-lived snapshots retain their copy of the data blocks, which are taken from the file system's pool of free blocks. As snapshots age, they consume more data blocks because of the changes made in the read/write view of the file system.
- Snapshot deletion can take time, depending on the number of changes that need to be handled while removing the snapshot. In general, the older a snapshot is, the more work is required to delete it.
- Special consideration for use of IBM Storage Protect for Space Management:
  - When a migrated file stub that is already part of a snapshot is removed, a recall is initiated to keep the snapshot consistent. This is required because removal of the stub invalidates the offline references to the stored data. The recall fills blocks on disk and assigns them to the snapshot view. Once the stub is removed from the file system and a reconcile process removes the file from the Space Management pool on the IBM Storage Protect server, the snapshot copy is the only remaining reference to the file data.
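The space consideration in the list above can be pictured with a simplified copy-on-write model. This is only a toy sketch of why an aging snapshot holds more blocks, and it makes no claim about how IBM Storage Scale implements snapshots internally. The class `ToyFileSystem` and its methods are hypothetical.

```python
class ToyFileSystem:
    """Toy block store with a single snapshot that preserves old blocks."""

    def __init__(self, blocks):
        self.active = dict(blocks)  # block id -> contents in the active view
        self.preserved = None       # old contents held for the snapshot

    def create_snapshot(self):
        # A fresh snapshot holds no blocks of its own yet.
        self.preserved = {}

    def write(self, block_id, data):
        snap = self.preserved
        # Preserve the original contents the first time a block that existed
        # at snapshot time is overwritten.
        if snap is not None and block_id in self.active and block_id not in snap:
            snap[block_id] = self.active[block_id]
        self.active[block_id] = data

    def snapshot_blocks(self):
        return 0 if self.preserved is None else len(self.preserved)


fs = ToyFileSystem({0: "a", 1: "b", 2: "c", 3: "d"})
fs.create_snapshot()

usage = [fs.snapshot_blocks()]
for block_id in (0, 1, 1, 2):
    fs.write(block_id, "new")
    usage.append(fs.snapshot_blocks())

print(usage)  # [0, 1, 2, 2, 3]
```

In this model the snapshot starts out holding no blocks, and each first-time change to a block in the active view adds one preserved block to it. Rewriting a block that has already been preserved adds nothing further.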