Scheduling data deduplication and node replication processes
Data deduplication and node replication are optional functions
that can be used with IBM
Spectrum Protect. They provide
added benefits but also require additional resources and consideration
for the daily schedule.
About this task
Depending on your environment, using data deduplication
and node replication can change the tasks that are required for the
daily schedule. If you are using node replication to create the backup
copy of your data, then storage pool backups are not needed. Likewise,
you do not need to migrate your data to tape storage pools for the
creation of offsite backup media.
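The effect of node replication on the daily task list can be sketched in a few lines of Python. This is a minimal illustration, not product behavior. The task labels and the function name `daily_server_tasks` are hypothetical. The sketch also assumes that, without node replication, storage pool backup and migration to tape are the tasks that create the backup copy and the offsite media.

```python
def daily_server_tasks(uses_node_replication):
    """Return the server-side tasks for one day (unordered sketch)."""
    # Tasks that the procedure in this topic runs in either case.
    tasks = ["database backup", "inventory expiration", "reclamation"]
    if uses_node_replication:
        # Replication creates the backup copy of the data, so storage
        # pool backup and migration to tape for offsite media drop out.
        tasks.append("node replication")
    else:
        # Assumption: these two tasks provide the backup copy instead.
        tasks.append("storage pool backup")
        tasks.append("migration to tape for offsite media")
    return tasks


if __name__ == "__main__":
    print(daily_server_tasks(uses_node_replication=True))
```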
The following image illustrates
how to schedule data deduplication and node replication processes
to achieve the best performance. Tasks that overlap in the image can
be run at the same time.
Restriction: The number of duplicate identification processes
that can run at the same time depends on the processor
capability of the IBM
Spectrum Protect server
and the I/O capability of the storage pool disk.
Figure 1. Daily schedule when data deduplication
and node replication are used
The following steps include commands
to implement the schedule that is shown in the image. For this example,
tape is not used in the environment.
Procedure
1. Perform an incremental backup of all clients on the network
   to a deduplicated file storage pool by using the incremental client
   command or another supported method for client backup.
2. You can run the following tasks in parallel:
   a. Perform server-side duplicate identification by running
      the IDENTIFY DUPLICATES command. If you are not
      using client-side data deduplication, this step processes data that
      was not already deduplicated on your clients.
   b. Create a disaster recovery (DR) copy of the IBM
      Spectrum Protect database
      by running the BACKUP DB command. In addition,
      run the BACKUP VOLHISTORY and BACKUP DEVCONFIG commands
      to create DR copies of the volume history and device configuration
      files.
3. Perform node replication to create a secondary copy of
   the client data on another IBM
   Spectrum Protect server by
   using the REPLICATE NODE command.
   By performing node replication after duplicate identification
   processing, you can take advantage of data reduction during
   replication.
4. Remove objects that exceed their allowed retention by using
   the EXPIRE INVENTORY command.
5. Reclaim unused space from storage pool volumes that are
   released through data deduplication and inventory expiration by using
   the RECLAIM STGPOOL command.
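The ordering of the procedure above can be sketched as a small Python model. It is an illustration under stated assumptions, not a definitive schedule. It reads the procedure as five sequential phases, with duplicate identification and the DR copies as the only tasks that run in parallel. The names `DEDUPPOOL`, `DBBACK`, and `ALLNODES` are hypothetical placeholders for a storage pool, a device class, and a node group. The commands are shown with minimal parameters, so check the command reference for the options that your environment requires.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names for illustration only; substitute your own.
DEDUP_POOL = "DEDUPPOOL"   # deduplicated file storage pool
DB_DEVCLASS = "DBBACK"     # device class that receives database backups
REPL_NODES = "ALLNODES"    # node or node group to replicate


@dataclass
class Phase:
    """One step of the daily schedule.

    Commands inside a track run one after another. Separate tracks
    in the same phase may run at the same time.
    """
    name: str
    tracks: List[List[str]] = field(default_factory=list)


def daily_phases() -> List[Phase]:
    """Return the phases in the order given by the procedure."""
    return [
        # Runs on each client, not on the server.
        Phase("client backup", [["dsmc incremental"]]),
        Phase("duplicate identification and DR copies", [
            [f"IDENTIFY DUPLICATES {DEDUP_POOL}"],
            [f"BACKUP DB DEVCLASS={DB_DEVCLASS} TYPE=FULL",
             "BACKUP VOLHISTORY",
             "BACKUP DEVCONFIG"],
        ]),
        # Placed after duplicate identification so that replication
        # can benefit from data reduction.
        Phase("node replication", [[f"REPLICATE NODE {REPL_NODES}"]]),
        Phase("inventory expiration", [["EXPIRE INVENTORY"]]),
        Phase("reclamation", [[f"RECLAIM STGPOOL {DEDUP_POOL}"]]),
    ]


def render(phases: List[Phase]) -> str:
    """Format the phases as a numbered plan, one line per track."""
    lines = []
    for number, phase in enumerate(phases, start=1):
        mode = "parallel tracks" if len(phase.tracks) > 1 else "single track"
        lines.append(f"{number}. {phase.name} ({mode})")
        for track in phase.tracks:
            lines.append("   - " + " ; then ".join(track))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(daily_phases()))
```

Modeling each phase as a list of tracks keeps the one ordering constraint that the procedure states explicitly: `REPLICATE NODE` starts only after `IDENTIFY DUPLICATES` has finished.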