NDMP backup example scenario
The following example scenario describes how to create NDMP backup groups using a Data Management Application (DMA) and assign an expiration interval to the backup groups to facilitate management of file system backup snapshots. Configuration considerations and the prefetch function are also described.
Starting a backup, configuring backup policies, and most NDMP backup monitoring occur on the DMA. Before creating backup policies using the DMA software, ensure that the DMA has been configured to work with the NDMP configuration on the Storwize V7000 Unified system.
The backup group name is associated with a set of directories and a backup group expiration such that all of the directories in the backup group of that name share the same GPFS snapshot. You can create multiple separate backup group names for the same set of directories.
- Where the backed up data is stored. If physical tape drives are used, the number of simultaneous backup streams is normally limited to the number of physical tape drives available, unless the DMA software can multiplex data streams onto a single tape drive.
- NDMP must be configured and activated on the Storwize V7000 Unified system.
- NDMP data servers running on the Storwize V7000 Unified file modules must be registered with the DMA software using the NDMP user name and password of the Storwize V7000 Unified system NDMP configuration. There are usually two methods for registering NDMP clients on the DMA:
- Registration of public IP addresses corresponding to the file modules in an NDMP file module group. Using this method, you explicitly specify which IP addresses, and therefore which file modules, are used for a backup policy. This method allows more control because the backup load is distributed manually. Note: If data must be restored, this normally requires restoring the data from the backup image corresponding to the client that backed up that data. In this case, the backup is registered by IP address, and you must determine which data is backed up from which client to restore the data from the correct location in the DMA.
- Registration of a DNS name. Using this method, the DNS chooses which backup requests are serviced by which file modules, based on round-robin assignment of the public IP addresses linked to the file modules in the NDMP file module group. One advantage of this method is that the DMA typically stores all backup data under the single DNS entry that was registered as the NDMP client. In this case, if data must be restored, there is no need to specify exactly which file module or IP address was used to back up that data.
- The health of the file modules in the NDMP file module group. Just before you start an NDMP backup or restore operation, you might want to ensure that none of the file modules in the NDMP file module group have failed over. If a file module that had failed over becomes healthy while the NDMP operation is in progress, some of the public IP addresses that are used for the NDMP operation might be moved to a different file module, which prematurely stops the NDMP operation.
- Performance implications when creating a backup on the Storwize V7000 Unified system. NDMP backups use file system level snapshots. If a new snapshot is created for every NDMP backup, it might significantly impact the speed of snapshot creation and other operations on that file system. Backup groups and backup group expirations can be configured to minimize performance impact. Backup groups allow parallel NDMP sessions to be connected through multiple file modules using the same file system level snapshot for all of the NDMP backups. A backup group expiration specifies how long the file system level snapshot for a given backup group is to be used. Because a snapshot represents the file system at a given point in time, a backup of the data can become "stale" as changes to the backed up data occur after the snapshot has been created. Therefore, you might want to create multiple backup groups for the same file system and specify different expiration values; for example, specify longer expiration values for a backup group that performs full backups and shorter expiration values for a backup group that performs incremental backups of the same file system.
- Performance implications when deleting an NDMP snapshot. NDMP uses GPFS file system level snapshots for both full and incremental backups. Depending on the size and complexity of the file system, automated deletion of NDMP snapshots might take more than one hour in some cases. Using backup groups, which allow multiple NDMP sessions to use the same snapshot, can reduce the overall number of snapshots that must be taken and later deleted. Delays due to file system level NDMP snapshot deletion can occur at the following times:
- When NDMP is activated, or reactivated after it has been deactivated, all NDMP related snapshots and temporary files are automatically deleted. The number of files that have changed since a snapshot has been taken can be used to estimate how long it might take to delete the snapshot. Allow sufficient time after activating NDMP to complete all snapshot deletion processes before a backup starts to ensure that the snapshot deletion process does not extend beyond the NDMP backup start time.
- At the end of an NDMP backup, NDMP determines whether there are any NDMP related expired snapshots that are not referenced by other NDMP sessions. If any exist, they are deleted. Also, the snapshot referenced by the backup that just ended is checked to determine whether it has expired and whether there are other NDMP sessions referencing it. If there are no sessions accessing that snapshot, and it has expired, it is also deleted.
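The end-of-backup cleanup rule described above (a snapshot is deleted only when it has expired and no NDMP session still references it) can be illustrated with a minimal Python sketch. The Snapshot class, its field names, and the snapshots_to_delete function are hypothetical names introduced for illustration only; they are not part of the Storwize V7000 Unified system or of any DMA. Treating a snapshot as expired when its age is equal to or greater than the expiration interval is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical record of one NDMP file system level snapshot."""
    name: str
    created_at: float         # creation time, in seconds
    expiration_secs: float    # backup group expiration interval, in seconds
    active_sessions: int = 0  # NDMP sessions that still reference the snapshot

    def is_expired(self, now: float) -> bool:
        # Assumption: a snapshot counts as expired once its age reaches the interval.
        return now - self.created_at >= self.expiration_secs


def snapshots_to_delete(snapshots, now):
    """Return the snapshots that are both expired and unreferenced."""
    return [s for s in snapshots if s.is_expired(now) and s.active_sessions == 0]
```

In this sketch, an expired snapshot that is still referenced by a running session is kept and becomes eligible for deletion only after its session count drops to zero.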
There are specific NDMP environment variables that are not the same as Linux environment variables; they are treated as parameters that are passed to the NDMP server from the DMA that is performing the NDMP backup. The two parameters that can be added are BACKUP_GROUP_NAME and BACKUP_GROUP_EXPIRATION.
SET BACKUP_GROUP_NAME = NDMP_EXISTING_snap1
A BACKUP_GROUP_NAME that begins with NDMP_EXISTING specifies to use a file system level snapshot that has a name corresponding to this environment variable value. The corresponding snapshot must be manually deleted and can only have a file system level scope; file set level snapshots of this type are not supported. The maximum length of the backup group name is 32 characters, including the 13 characters for NDMP_EXISTING if used.
SET BACKUP_GROUP_EXPIRATION = 1_DAY
The following ranges are acceptable values for this variable:
- 0_SEC - 60_SEC
- 0_MIN - 60_MIN
- 0_HOUR - 168_HOUR
- 0_DAY - 31_DAY
- 0_MONTH - 12_MONTH
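As an illustration of the limits listed above, the following Python sketch checks a backup group name and an expiration value before they are entered in a DMA policy. It is a sketch under stated assumptions, not the validation that the NDMP server performs: the function names are hypothetical, the value format is inferred from the examples (a number, an underscore, and an uppercase unit), and converting a month to 30 days is an assumption made only so that the result can be expressed in seconds.

```python
import re

# Upper bound for each unit, taken from the ranges listed above.
MAX_BY_UNIT = {"SEC": 60, "MIN": 60, "HOUR": 168, "DAY": 31, "MONTH": 12}

# Seconds per unit. Treating a month as 30 days is an assumption of this sketch.
SECONDS_PER_UNIT = {"SEC": 1, "MIN": 60, "HOUR": 3600,
                    "DAY": 86400, "MONTH": 30 * 86400}

MAX_NAME_LENGTH = 32               # includes the NDMP_EXISTING prefix, if used
EXISTING_PREFIX = "NDMP_EXISTING"  # 13 characters


def parse_expiration(value: str) -> int:
    """Validate a value such as 23_HOUR and return the interval in seconds."""
    match = re.fullmatch(r"(\d+)_(SEC|MIN|HOUR|DAY|MONTH)", value)
    if match is None:
        raise ValueError(f"unrecognized expiration value: {value!r}")
    count, unit = int(match.group(1)), match.group(2)
    if count > MAX_BY_UNIT[unit]:
        raise ValueError(f"{value!r} exceeds the maximum of {MAX_BY_UNIT[unit]}_{unit}")
    return count * SECONDS_PER_UNIT[unit]


def uses_existing_snapshot(name: str) -> bool:
    """Check the name length; return True if the name selects an existing snapshot."""
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"backup group name is longer than {MAX_NAME_LENGTH} characters")
    return name.startswith(EXISTING_PREFIX)
```

For example, parse_expiration("23_HOUR") yields 82800 seconds, and uses_existing_snapshot("NDMP_EXISTING_snap1") reports that the policy relies on a manually managed snapshot.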
Backup Policy: #1
SET BACKUP_GROUP_NAME = full_backup
SET BACKUP_GROUP_EXPIRATION = 23_HOUR
/ibm/gpfs0/shared/home/ndmp1
/ibm/gpfs0/shared/home/ndmp2
/ibm/gpfs0/shared/home/ndmp3
/ibm/gpfs0/shared/home/ndmp4
/ibm/gpfs0/shared/home/ndmp5
Backup Policy: #2
SET BACKUP_GROUP_NAME = full_backup
SET BACKUP_GROUP_EXPIRATION = 23_HOUR
/ibm/gpfs0/shared/home/ndmp6
/ibm/gpfs0/shared/home/ndmp7
/ibm/gpfs0/shared/home/ndmp8
/ibm/gpfs0/shared/home/ndmp9
/ibm/gpfs0/shared/home/ndmp10
Backup Policy: #3
SET BACKUP_GROUP_NAME = incr_backup
SET BACKUP_GROUP_EXPIRATION = 5_HOUR
/ibm/gpfs0/shared/home/ndmp1
/ibm/gpfs0/shared/home/ndmp2
/ibm/gpfs0/shared/home/ndmp3
/ibm/gpfs0/shared/home/ndmp4
/ibm/gpfs0/shared/home/ndmp5
Backup Policy: #4
SET BACKUP_GROUP_NAME = incr_backup
SET BACKUP_GROUP_EXPIRATION = 5_HOUR
/ibm/gpfs0/shared/home/ndmp6
/ibm/gpfs0/shared/home/ndmp7
/ibm/gpfs0/shared/home/ndmp8
/ibm/gpfs0/shared/home/ndmp9
/ibm/gpfs0/shared/home/ndmp10
- At midnight, backup policy #1 is started on file module 1 and creates a snapshot.
- One hour later, at 1 AM, backup policy #2 is started on file module 2. Because that backup policy has the same backup group name as policy #1 (full_backup) and the snapshot that was created at midnight has not expired (it is only one hour old and expiration is set for 23 hours), another snapshot is NOT created and the existing snapshot for the group full_backup is used. This staggers the start of backups but uses the same point in time snapshot for both sets of directories so that they are consistent.
- At 6 AM, backup policy #3 is started on file module 1. Because there is no snapshot created yet with the group name of incr_backup, a new snapshot is created and an incremental backup is performed using this snapshot.
- At 7 AM, backup policy #4 is started on file module 2. Since there is a snapshot with the group name incr_backup and that snapshot has not expired (it is only one hour old and expiration is set for five hours), that existing snapshot is used.
- At noon, backup policy #3 is started on file module 1. There is a snapshot with the group name incr_backup, but it is six hours old and the expiration value is set to five hours. Therefore, the old snapshot is removed and a new snapshot is created. This newly created snapshot is the snapshot that is used for this incremental backup.
- At 1 PM, backup policy #4 is started on file module 2. As in the previous step, there is a snapshot with the group name incr_backup. The snapshot is only one hour old, so that snapshot is used for this incremental backup.
- At 6 PM, a process starts similar to the previous two steps, and backups #3 and #4 are started again, staggered an hour apart using a new snapshot.
- At midnight, the full backup policy #1 is started on file module 1. There is an old snapshot for the group full_backup, but that snapshot is 24 hours old and its expiration is 23 hours, so it is removed. There is also a snapshot for incr_backup, but because that snapshot is six hours old and its expiration is five hours, it has expired and is also removed. Because there is no unexpired snapshot for the group full_backup, a new snapshot is created and the full backup proceeds on file module 1 with the newly created snapshot.
- At 1 AM, backup policy #2 is started on file module 2, and the entire scenario described above continues on the same schedule.
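The schedule above can be replayed with a short Python sketch that decides, for each backup start, whether the backup group's snapshot is reused or a new one is created. This is a simplified model for illustration only: the simulate function and the EVENTS and EXPIRATIONS tables are hypothetical, times are whole hours counted from the first midnight, and the removal of expired snapshots that belong to other backup groups is not modeled.

```python
# Expiration interval of each backup group, in hours (from the four policies above).
EXPIRATIONS = {"full_backup": 23, "incr_backup": 5}

# (start hour, policy number, backup group), following the schedule above.
EVENTS = [
    (0, 1, "full_backup"), (1, 2, "full_backup"),
    (6, 3, "incr_backup"), (7, 4, "incr_backup"),
    (12, 3, "incr_backup"), (13, 4, "incr_backup"),
    (18, 3, "incr_backup"), (19, 4, "incr_backup"),
    (24, 1, "full_backup"), (25, 2, "full_backup"),
]


def simulate(events, expirations):
    """Return (hour, policy, action) for each event; action is 'create' or 'reuse'."""
    created_at = {}  # backup group -> hour at which its current snapshot was created
    log = []
    for hour, policy, group in events:
        created = created_at.get(group)
        if created is not None and hour - created < expirations[group]:
            action = "reuse"           # an unexpired snapshot exists for this group
        else:
            created_at[group] = hour   # no snapshot yet, or the old one has expired
            action = "create"
        log.append((hour, policy, action))
    return log


if __name__ == "__main__":
    for hour, policy, action in simulate(EVENTS, EXPIRATIONS):
        print(f"hour {hour:2d}: policy #{policy} -> {action} snapshot")
```

In this model, every backup that starts a staggered pair (policies #1 and #3) creates a snapshot, and the backup that follows one hour later (policies #2 and #4) reuses it, which matches the sequence described in the steps.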
Tips
- If a snapshot expires and a backup that is using that snapshot is still in process, that snapshot is not removed until there are no more NDMP processes referencing that snapshot. This prevents a snapshot from expiring and being removed before a currently running backup using that snapshot completes.
- If no expiration value is specified, the default of 1 hour is used for the backup group expiration interval.
- If no backup group name is specified for a backup policy, a snapshot with no indicated backup group is used. Other snapshots that have backup group names associated with them will not be used by the "policy without a name" because the backup group names are not identical.
- To ensure that changes to data are contained in a backup, create snapshots as often as needed. For example, if a snapshot expiration value is set to 1 hour and another backup is started using the same backup group within that hour, none of the data that has changed since the snapshot was created for the backup group will be included in the backup.
- To force every NDMP session to create a new snapshot, set the backup group expiration value to 0_SEC. If multiple NDMP backup sessions are running simultaneously, Storwize V7000 Unified system performance might be reduced.
- If the NDMP function is activated for an NDMP file module group, or if it is reactivated after either being manually deactivated or as a result of changing an NDMP configuration value for that file module group, all NDMP-related snapshots are removed. Deactivating and reactivating the NDMP function for an NDMP file module group can be used to delete all previous snapshots for that file module group.
NDMP Prefetch
Using NDMP prefetch, NDMP backup of small files takes significantly less time to complete. The prefetch algorithm predicts the NDMP backup sequence and reads files into the file module's cache so that the backups have a high cache hit rate for small files. You can specify prefetch for as few as one to as many as ten NDMP backup sessions per file module. You can also specify between 50 and 180 threads to be split among all NDMP prefetch sessions on a file module. This allows you to tune how much prefetching should be performed based on the RAM configuration and processing power of the file module. NDMP prefetch is particularly useful when tape backups are performed, a small number of simultaneous NDMP backup sessions run on each file module, and the data being backed up contains a significant proportion of files with a file size lower than 1 MB. Files larger than 1 MB are not prefetched, so if the majority of files being backed up are larger than 1 MB, the benefit from using the NDMP prefetch feature is relatively small.
- By default, prefetch is not enabled for NDMP. Prefetch must be activated; changing any NDMP configuration parameter, including activating NDMP prefetch, causes any NDMP backup session currently in process to end prematurely.
- When NDMP prefetch is enabled on a file module, the default number of NDMP backup streams that can simultaneously use prefetch on the file module is four.
- The default number of prefetch threads that can run simultaneously on a file module is 100.
- Prefetch reduces backup time if used for a limited number of NDMP streams for files that are smaller than 1 MB in size.
- The performance improvement using prefetch is more noticeable during full backups than during incremental backups.
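The prefetch limits and defaults described above can be summarized in a small Python sketch. The PrefetchConfig class and the is_prefetch_candidate function are hypothetical names used for illustration and do not correspond to an actual configuration interface. Two further assumptions are made: the prefetch threads are divided evenly among the prefetch sessions (the text says only that they are split among them), and 1 MB is taken to be 1024 * 1024 bytes, with a file of exactly that size treated as not prefetched.

```python
from dataclasses import dataclass

ONE_MB = 1024 * 1024  # assumption: 1 MB is interpreted as 1024 * 1024 bytes


@dataclass
class PrefetchConfig:
    """Hypothetical container for the per-file-module prefetch settings."""
    sessions: int = 4   # default number of streams that can use prefetch
    threads: int = 100  # default number of prefetch threads

    def validate(self) -> None:
        if not 1 <= self.sessions <= 10:
            raise ValueError("prefetch sessions must be between 1 and 10")
        if not 50 <= self.threads <= 180:
            raise ValueError("prefetch threads must be between 50 and 180")

    def threads_per_session(self) -> int:
        """Threads available to each session, assuming an even split."""
        self.validate()
        return self.threads // self.sessions


def is_prefetch_candidate(size_bytes: int) -> bool:
    """Only files smaller than 1 MB are considered for prefetch in this sketch."""
    return size_bytes < ONE_MB
```

With the default values, this model gives each of the four prefetch sessions 25 threads. Raising the session count without raising the thread count leaves fewer threads for each session, which is one way to reason about tuning against the RAM and processing power of the file module.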