How to identify backup and migration candidates

When you use IBM Storage Protect to back up IBM Storage Scale file systems, you can choose from several methods to identify backup and migration candidates.

Limitations when using IBM Storage Protect Backup-Archive client to identify backup candidates

Using the IBM Storage Protect Backup-Archive client to traverse IBM Storage Scale file systems to identify backup candidates does not scale well. The IBM Storage Scale mmapplypolicy engine is preferable, because scanning the file system with the policy engine identifies backup candidates much faster than traversing the directory tree.

Therefore, to back up larger file systems, use the IBM Storage Scale mmbackup command instead of running IBM Storage Protect Backup-Archive client commands such as dsmc expire, dsmc selective, or dsmc incremental directly. Using the mmbackup command also provides the following benefits:

  • Backup activities run in parallel: multiple IBM Storage Scale cluster nodes send backup data to the IBM Storage Protect server at the same time.
  • mmbackup creates a local shadow of the IBM Storage Protect database in the file system and uses it, together with the policy engine, to identify candidate files for backup. Because the IBM Storage Protect server does not need to be queried for this information, calculating the backup candidate list takes less time.
    • mmbackup and its use of the policy engine can select candidates faster than the dsmc progressive incremental operation, which is bounded by a walk of the file system using the POSIX directory and file status reading functions.
    • Using dsmc selective with lists generated by mmbackup is also faster than using dsmc incremental, even with similar lists generated by mmbackup.
    Note: Use mmbackup for scheduled backups of an IBM Storage Scale file system, because mmbackup does not actively query the IBM Storage Protect server to calculate backup candidates. However, events such as file space deletion or file deletion performed on the IBM Storage Protect server are not recognized until you trigger a synchronization between the mmbackup shadow database and the IBM Storage Protect database.
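The following shell sketch shows how a scheduled job, such as a cron entry, might call mmbackup for a policy-driven incremental backup spread across several cluster nodes. The function name run_scheduled_backup, the default file system /gpfs/fs1, the node names node1 and node2, and the DRY_RUN switch are hypothetical. The -t incremental and -N options are standard mmbackup options, but check the mmbackup command reference for your release before relying on this sketch.

```shell
#!/bin/sh
# Minimal sketch: run a policy-driven incremental backup with mmbackup,
# for example from a cron job. The function name, the default file system,
# the default node names, and the DRY_RUN switch are hypothetical.
run_scheduled_backup() {
  fs="${1:-/gpfs/fs1}"        # file system device or mount point (hypothetical)
  nodes="${2:-node1,node2}"   # cluster nodes that send data in parallel (hypothetical)
  # -t incremental : back up only files that changed since the last backup
  # -N             : distribute the backup work across the listed nodes
  # The full path is used because cron jobs often run with a minimal PATH.
  set -- /usr/lpp/mmfs/bin/mmbackup "$fs" -t incremental -N "$nodes"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"    # print the command instead of running it
  else
    "$@"         # run mmbackup and return its exit status
  fi
}
```

For example, DRY_RUN=1 run_scheduled_backup /gpfs/fs1 node1,node2 prints the mmbackup command line without starting a backup, which is a convenient way to check the arguments before adding the job to a schedule.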

The following table contains a detailed comparison of mmbackup and IBM Storage Protect Backup-Archive client backup commands:

Table 1. Comparison of mmbackup and IBM Storage Protect Backup-Archive client backup commands
Capability | IBM Storage Scale policy-driven backup (mmbackup) | IBM Storage Protect progressive incremental backup (dsmc incremental)
Detects changes in files and sends a new copy of the file to the server. | Yes | Yes
Detects changes in metadata and updates the file metadata on the server or sends a new copy of the file to the server (for ACL/EA changes). | Yes | Yes
Detects directory move, copy, or rename functions, and sends a new copy of the file to the server. | Yes | Yes
Detects local file deletion and expires the file on the server. | Yes | Yes
Detects IBM Storage Protect file space deletion or node/policy changes, and sends a new copy of the file to the server. | No* | Yes
Detects file deletion from the IBM Storage Protect server and sends a new copy of the file to the server. | No* | Yes
Detects additions of new exclude rules and expires the file on the server. | Yes | Yes
Detects policy changes made to include rules and rebinds the file to the new storage pool. | No** | Yes
Detects copy mode and copy frequency configuration changes. | No* | Yes
Detects migration state changes (IBM Storage Protect for Space Management) and updates the server object. | Yes | Yes
Detects that a file wasn't processed successfully during a backup operation and attempts again at the next backup. | Yes | Yes
Supports IBM Storage Protect Virtual Mount Points to divide a file system into smaller segments to reduce database size and contention. | No | Yes
* The mmbackup command queries the IBM Storage Protect server only once at the time of the first backup. Changes that are performed on the IBM Storage Protect server by using the IBM Storage Protect administrative client cannot be detected by mmbackup processing. You must rebuild the mmbackup shadow database if the IBM Storage Protect server file space changes.
** Changes to IBM Storage Protect include rules that bind files to management classes cannot be detected by mmbackup processing. Therefore, mmbackup processing does not rebind a file when a change to the include rules assigns it a different management class.

If you use IBM Storage Protect Backup-Archive client backup commands on file systems that are otherwise backed up with mmbackup, the shadow database maintained by mmbackup loses its synchronization with the IBM Storage Protect inventory. In such cases, you must resynchronize the shadow database with the IBM Storage Protect server so that mmbackup learns of the backup activity performed with the dsmc command. For large file systems with a high number of backed-up items, resynchronization can be very time-consuming. To avoid this situation, use only the mmbackup command to back up these file systems.

If you have used dsmc selective or dsmc incremental since you started using mmbackup and need to manually synchronize the mmbackup shadow database with the IBM Storage Protect server, use one of the following options:

  • Use mmbackup --rebuild if you need to synchronize only.
  • Use mmbackup -q if you need to synchronize and then back up the corresponding file system.
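A minimal shell sketch of choosing between the two options follows. The function name resync_shadow_db, its mode keywords rebuild-only and rebuild-and-backup, the /gpfs/fs1 path used in the example, and the DRY_RUN switch are hypothetical; only mmbackup and its --rebuild and -q options come from the text above.

```shell
#!/bin/sh
# Hypothetical wrapper: resynchronize the mmbackup shadow database with the
# IBM Storage Protect server, optionally followed by a backup.
# Set DRY_RUN=1 to print the command instead of running it.
resync_shadow_db() {
  fs="$1"      # file system device or mount point, e.g. /gpfs/fs1 (hypothetical)
  mode="$2"    # "rebuild-only" or "rebuild-and-backup" (hypothetical keywords)
  case "$mode" in
    rebuild-only)       set -- mmbackup "$fs" --rebuild ;;  # synchronize only
    rebuild-and-backup) set -- mmbackup "$fs" -q ;;         # synchronize, then back up
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, after an ad hoc dsmc selective run you might call resync_shadow_db /gpfs/fs1 rebuild-only, and before the next regular backup window resync_shadow_db /gpfs/fs1 rebuild-and-backup.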

Using the IBM Storage Protect for Space Management client to identify migration candidates

Using IBM Storage Protect for Space Management clients to traverse an IBM Storage Scale file system to identify migration candidates does not scale well. The IBM Storage Protect automigration daemons consume space in the file system as well as CPU resources. Unlike the IBM Storage Scale mmapplypolicy command, they do not have access to the internal structures of the file system, so they cannot scale. Complete the following steps instead:

  1. Before you install the IBM Storage Protect for Space Management client, set the following environment variable to prevent the automigration daemons from starting during the installation:

    export HSMINSTALLMODE=SCOUTFREE

    As a best practice, place this setting in the profile file of the root user.

  2. Add the following option to your IBM Storage Protect client configuration file, dsm.opt, to prevent the automigration daemons from starting after every system reboot:

    HSMDISABLEAUTOMIGDAEMONS YES

  3. Add the following option to your IBM Storage Protect client configuration file, dsm.opt, to ensure that the object ID is added to the inode, so that file-list-based reconciliation (two-way orphan check) can be used:

    HSMEXTOBJIDATTR YES

  4. While the automigration daemons are disabled, changes such as the removal of migrated files are not automatically propagated to the IBM Storage Protect server. For housekeeping purposes, you must run IBM Storage Protect reconciliation either manually or on a schedule. For more information, see Reconciling by using a GPFS policy in the IBM Storage Protect for Space Management documentation.
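Steps 1 to 3 can be applied idempotently with a small shell sketch like the following. The function names add_line_once and prepare_hsm_client are hypothetical, and the profile and dsm.opt paths are assumptions that you pass in, because their locations depend on the platform, the installation directory, and the root user's shell.

```shell
#!/bin/sh
# Minimal sketch of steps 1-3. Function names are hypothetical; the file
# paths are supplied by the caller because their locations vary.

# Append a line to a file only if that exact line is not already present.
add_line_once() {
  file="$1"; line="$2"
  touch "$file" || return 1
  if ! grep -qxF -- "$line" "$file"; then
    # Add a newline first if the file does not already end with one.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}

prepare_hsm_client() {
  profile="$1"   # root user's profile file (path is an assumption)
  dsmopt="$2"    # IBM Storage Protect client options file dsm.opt (path is an assumption)
  # Step 1: keep the automigration daemons from starting during installation.
  add_line_once "$profile" "export HSMINSTALLMODE=SCOUTFREE" || return 1
  # Step 2: keep the automigration daemons from starting after a reboot.
  add_line_once "$dsmopt" "HSMDISABLEAUTOMIGDAEMONS YES" || return 1
  # Step 3: store the object ID in the inode for the two-way orphan check.
  add_line_once "$dsmopt" "HSMEXTOBJIDATTR YES"
}
```

A call might look like prepare_hsm_client /root/.profile /opt/tivoli/tsm/client/ba/bin/dsm.opt, assuming those are the paths on your system. The profile is read only at login, so also run export HSMINSTALLMODE=SCOUTFREE in the shell that performs the installation. The sketch matches exact lines only; an existing conflicting entry, such as HSMDISABLEAUTOMIGDAEMONS NO, must be corrected by hand.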