Tuning server-side data deduplication
Tune settings and configuration for different operations so that server-side data deduplication performs efficiently.
Procedure
Tip: The following steps do not apply to container storage pools.
- Control processor resources by setting the number of duplicate identification processes that you want to use. Do not exceed the number of processor cores available on your IBM Spectrum® Protect server when you set the NUMPROCESS value. Define a duration limit for the IDENTIFY DUPLICATES command; otherwise, the processes that the command starts run indefinitely.
- Determine the threshold for reclamation of a deduplicated storage pool. A deduplicated storage pool is typically reclaimed to a threshold that is less than the default of 60 to allow more of the identified duplicate extents to be removed. Experiment with this value to find a threshold at which reclamation can be completed within the available time.
- Determine how many reclamation processes to run.
Tip: A reclamation setting of more than 25 and less than 40 is sufficient.
- Schedule data deduplication processing based on how you create a second copy of your data. If you are backing up your storage pool, do not overlap client backup and duplicate identification. Complete the storage pool backup before the duplicate identification process starts. If the storage pool backup is not complete, the copy process takes longer because it requires the deduplicated data to be reassembled before the backup. You can overlap duplicate identification and client backup operations in the following scenarios:
- You are not backing up your storage pool.
- You are using node replication to create a secondary copy of your data.
- To prevent deadlocks in the IBM Spectrum Protect server, you might need to modify the Db2® LOCKLIST parameter before you deduplicate a large amount of data.
When the amount of concurrent data movement activity is high, deadlocks can occur in the server. If more than 500 GB of data is moved concurrently, adjust the Db2 LOCKLIST parameter as follows:
Table 1. Tuning Db2 LOCKLIST parameter values
Amount of data    LOCKLIST parameter value
500 GB            122000
1 TB              244000
5 TB              1220000
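
The three rows in Table 1 share one ratio: 122000 / 500 = 244 per GB, if 1 TB is counted as 1000 GB. The following Python sketch uses that ratio to estimate a LOCKLIST value for amounts that fall between or beyond the listed rows. The linear scaling is an assumption drawn from the table, not a documented formula, and the function name suggested_locklist is hypothetical. Treat the result as a starting point and verify it against your own environment.

```python
import math
from typing import Optional

# Ratio observed in Table 1 (assumption: 1 TB = 1000 GB):
#   500 GB -> 122000, 1 TB -> 244000, 5 TB -> 1220000, all 244 per GB.
LOCKLIST_PER_GB = 244

# Table 1 starts at 500 GB; below that, this sketch suggests no change.
MIN_CONCURRENT_GB = 500


def suggested_locklist(concurrent_gb: float) -> Optional[int]:
    """Estimate a Db2 LOCKLIST value for a concurrent data movement size.

    Returns None when the amount is below the 500 GB starting point of
    Table 1. Scaling linearly between and beyond the table rows is an
    assumption, not documented behavior.
    """
    if concurrent_gb < MIN_CONCURRENT_GB:
        return None
    # Round up so that the estimate never falls below the table ratio.
    return math.ceil(concurrent_gb * LOCKLIST_PER_GB)


if __name__ == "__main__":
    for gb in (500, 1000, 2500, 5000):
        print(f"{gb} GB -> LOCKLIST {suggested_locklist(gb)}")
```

For the three amounts that appear in Table 1, the sketch returns the listed values. Any other result is an extrapolation.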