Tune the settings for duplicate identification, reclamation, and
scheduling so that server-side data deduplication performs
efficiently.
Procedure
- Control processor resources by setting the number of duplicate
identification processes that you want to use. Do not exceed
the number of processor cores available on your Tivoli® Storage
Manager server when
you set the NUMPROCESS value. Define a duration
limit for the IDENTIFY DUPLICATES command. Otherwise, processes
that are running after the command is issued run indefinitely.
- Determine the threshold for reclamation of a deduplicated
storage pool. A deduplicated storage pool is typically
reclaimed at a threshold that is lower than the default of 60 to allow
more of the identified duplicate extents to be removed. Experiment
with this value to find a threshold at which reclamation can be completed
within the available time.
Tip: A reclamation setting of 40 or less is usually sufficient.
- Determine how many reclamation processes to run.
- Schedule data deduplication processing based on how you
are creating a second copy of your data. If you are backing
up your storage pool, do not overlap client backup and duplicate identification.
Complete the storage pool backup before duplicate identification
starts. Otherwise, the copy process takes longer because the deduplicated
data must be reassembled before it is backed up. See Scheduling data
deduplication and node replication processes for details about the
best-practice sequence of daily events.
You can overlap duplicate identification
and client backup operations if you are not backing up your storage
pool or if you are using node replication to create a secondary copy
of your data. Running these operations together can reduce the time
that is needed to finish processing, but might increase the time that
is required for client backup.
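
Example: The processor and reclamation settings in this procedure can be sketched as a small helper that assembles the corresponding administrative commands. This is an illustration only, not part of the product. The function names and the pool name DEDUPPOOL are hypothetical, and the DURATION, RECLAIM, and RECLAIMPROCESS parameter spellings are assumed from the server command reference, so verify them for your server level. The core count is passed in explicitly because it must describe the Tivoli Storage Manager server, which might not be the system where the script runs.

```python
def build_identify_command(pool, requested_processes, duration_minutes, server_cores):
    """Build an IDENTIFY DUPLICATES command with NUMPROCESS capped at the core count.

    All names here are hypothetical helpers; only the command itself is from the product.
    """
    if requested_processes < 1 or server_cores < 1:
        raise ValueError("process and core counts must be at least 1")
    if duration_minutes < 1:
        # A duration limit is always required here so that the
        # identification processes do not run indefinitely.
        raise ValueError("duration_minutes must be at least 1")
    # Do not exceed the number of processor cores on the server.
    numprocess = min(requested_processes, server_cores)
    return (
        f"IDENTIFY DUPLICATES {pool} "
        f"NUMPROCESS={numprocess} DURATION={duration_minutes}"
    )


def build_reclaim_command(pool, threshold=40, processes=1):
    """Build an UPDATE STGPOOL command that sets the reclamation threshold.

    The default of 40 follows the tip in this topic; the parameter
    spellings RECLAIM and RECLAIMPROCESS are assumptions to verify.
    """
    if not 1 <= threshold <= 100:
        raise ValueError("threshold must be between 1 and 100")
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return f"UPDATE STGPOOL {pool} RECLAIM={threshold} RECLAIMPROCESS={processes}"


if __name__ == "__main__":
    # Request 16 processes on an 8-core server with a 4-hour limit.
    print(build_identify_command("DEDUPPOOL", 16, 240, server_cores=8))
    print(build_reclaim_command("DEDUPPOOL", threshold=40, processes=2))
```

The helper only produces command text. Review the output and issue the commands through your usual administrative interface, and adjust the threshold and process counts after you observe how long reclamation takes in your environment.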