IBM Tivoli Storage Manager, Version 7.1

Server-side data deduplication tuning

Tune settings and configuration for different operations to ensure that the performance of server-side data deduplication is efficient.

Procedure

  1. Control processor resources by setting the number of duplicate identification processes that you want to use. When you set the NUMPROCESS value, do not exceed the number of processor cores that are available on your Tivoli® Storage Manager server. Also define a duration limit for the IDENTIFY DUPLICATES command; otherwise, the processes that the command starts run indefinitely.
  2. Determine the threshold for reclamation of a deduplicated storage pool. A deduplicated storage pool is typically reclaimed at a threshold that is lower than the default of 60 so that more of the identified duplicate extents can be removed. Experiment with this value to find a threshold at which reclamation can be completed within the available time.
    Tip: A reclamation setting of 40 or less is usually sufficient.
  3. Determine how many reclamation processes to run.
  4. Schedule data deduplication processing based on how you create a second copy of your data. If you back up your storage pool, do not overlap client backup and duplicate identification. Complete the storage pool backup before duplicate identification runs; otherwise, the copy process takes longer because the deduplicated data must be reassembled before it is backed up. See Scheduling data deduplication and node replication processes for details about the best practice sequence of daily events.

    You can overlap duplicate identification and client backup operations if you are not backing up your storage pool or if you are using node replication to create a secondary copy of your data. Running these operations together can reduce the time that is needed to finish processing, but might increase the time that is required for client backup.
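The guidance in step 1 can be sketched as a small Python helper that builds an IDENTIFY DUPLICATES command string with NUMPROCESS capped at the number of processor cores and a mandatory DURATION value. This is a minimal sketch, not an IBM tool: the function name `identify_duplicates_command` and the pool name `DEDUPPOOL` are hypothetical, and the default core count assumes the script runs on the server itself. Note that `os.cpu_count()` reports logical CPUs, which can exceed physical cores.

```python
import os
from typing import Optional


def identify_duplicates_command(pool_name: str, numprocess: int,
                                duration_minutes: int,
                                cores: Optional[int] = None) -> str:
    """Build an IDENTIFY DUPLICATES command string (hypothetical helper)."""
    if cores is None:
        # Assumption: this script runs on the server itself. os.cpu_count()
        # returns logical CPUs, which can exceed physical cores.
        cores = os.cpu_count() or 1
    if duration_minutes <= 0:
        # Without a duration limit, the processes run indefinitely.
        raise ValueError("DURATION must be a positive number of minutes")
    # Never request more processes than there are processor cores.
    numprocess = max(1, min(numprocess, cores))
    return (f"IDENTIFY DUPLICATES {pool_name} "
            f"NUMPROCESS={numprocess} DURATION={duration_minutes}")


print(identify_duplicates_command("DEDUPPOOL", numprocess=8,
                                  duration_minutes=240, cores=4))
# → IDENTIFY DUPLICATES DEDUPPOOL NUMPROCESS=4 DURATION=240
```

Refusing a missing or zero duration, rather than silently omitting DURATION, reflects the point that processes without a duration limit keep running.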

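To illustrate why a lower reclamation threshold (step 2) removes more duplicate extents, the following sketch selects the volumes that qualify for reclamation at a given threshold. It assumes that a volume becomes eligible when its percentage of reclaimable space reaches the threshold; the function name and the volume data are hypothetical and stand in for what the server tracks internally.

```python
from typing import Dict, List


def volumes_to_reclaim(reclaimable_pct: Dict[str, float],
                       threshold: int) -> List[str]:
    """Return volumes at or above the threshold, most reclaimable first."""
    # Assumption: a volume qualifies once its reclaimable space reaches the threshold.
    eligible = [vol for vol, pct in reclaimable_pct.items() if pct >= threshold]
    return sorted(eligible, key=lambda vol: reclaimable_pct[vol], reverse=True)


# Hypothetical volumes and their percentages of reclaimable space.
volumes = {"VOL001": 72.0, "VOL002": 45.5, "VOL003": 38.0, "VOL004": 15.0}
print(volumes_to_reclaim(volumes, 60))  # → ['VOL001']
print(volumes_to_reclaim(volumes, 40))  # → ['VOL001', 'VOL002']
```

Lowering the threshold from the default of 60 to 40 brings one more volume into reclamation in this example, at the cost of more reclamation work. On the server, the threshold is set with the RECLAIM parameter of the storage pool.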

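The two scheduling rules in step 4 can be expressed as a schedule check. This sketch encodes only those rules: when you back up your storage pool, client backup must not overlap duplicate identification, and the storage pool backup must end before duplicate identification starts. The function names and times are hypothetical; the full best practice sequence is described in the scheduling topic that step 4 references.

```python
from datetime import datetime
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of a scheduled operation


def overlaps(a: Window, b: Window) -> bool:
    # Windows overlap when each one starts before the other one ends.
    return a[0] < b[1] and b[0] < a[1]


def schedule_problems(client_backup: Window, stgpool_backup: Window,
                      identify: Window, backing_up_pool: bool) -> List[str]:
    """Check a daily schedule against the rules in step 4 (hypothetical helper)."""
    if not backing_up_pool:
        # With node replication or no storage pool backup, overlap is allowed.
        return []
    problems = []
    if overlaps(client_backup, identify):
        problems.append("client backup overlaps duplicate identification")
    if stgpool_backup[1] > identify[0]:
        problems.append("storage pool backup ends after duplicate identification starts")
    return problems


day1, day2 = datetime(2024, 1, 1), datetime(2024, 1, 2)
client = (day1.replace(hour=20), day2.replace(hour=2))
stgpool = (day2.replace(hour=2), day2.replace(hour=5))
print(schedule_problems(client, stgpool,
                        (day2.replace(hour=5), day2.replace(hour=9)), True))
# → []
print(schedule_problems(client, stgpool,
                        (day2.replace(hour=1), day2.replace(hour=5)), True))
```

The second call schedules duplicate identification at 01:00, so it reports both an overlap with client backup and a storage pool backup that has not yet finished.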
