You can create a storage pool for data deduplication, or you can
update an existing storage pool. If you are implementing server-side
data deduplication, Tivoli® Storage Manager provides the option of
running duplicate-identification processes automatically or manually.
Before you begin
Before you set up a storage pool:
- Determine which client nodes have data that you want to deduplicate.
Decide whether you want to deduplicate data on a node-by-node basis,
on either the client or the server.
- Decide whether you want to define a new storage pool exclusively
for data deduplication or update an existing storage pool. If you
update a storage pool for data deduplication, Tivoli Storage Manager
deduplicates the data that is already stored; no additional backup,
archive, or migration is required. You can also define or update a
storage pool for data deduplication without deduplicating any data.
- Decide how you want to control duplicate-identification processes.
About this task
You can create a new storage pool for data deduplication or update
an existing one. You can store client-side deduplicated data and
server-side deduplicated data in the same storage pool.
Procedure
To set up a storage pool for data deduplication, complete
the following steps:
- If you are defining a new storage pool:
  - Use the DEFINE STGPOOL command and specify the DEDUPLICATE=YES
    parameter.
  - Define a new policy domain to direct eligible client-node data
    to the storage pool.
- If you are updating an existing storage pool:
  - Determine whether the storage pool contains data from one or
    more client nodes that you want to exclude from data
    deduplication. If it does:
    - Using the MOVE DATA command, move the data that belongs to the
      excluded nodes from the storage pool to be converted to another
      storage pool.
    - Direct data that belongs to the excluded nodes to the other
      storage pool. The easiest way to complete this task is to
      create another policy domain and designate the other storage
      pool as the destination storage pool.
  - Change the storage-pool definition with the UPDATE STGPOOL
    command. Specify the DEDUPLICATE and NUMPROCESSES parameters.
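The two paths in the procedure can be sketched as a small Python helper that assembles the administrative commands as strings. This is an illustration only: the function names, the pool names, the device class name, and the MAXSCRATCH value are hypothetical placeholders that do not come from this topic, and MAXSCRATCH is included on the assumption that the new pool is a sequential-access pool. The helper does not contact a server, and it deliberately leaves out the parameter that sets the number of duplicate-identification processes; check the command reference for your server level before you run anything.

```python
def define_dedup_pool(pool, device_class, max_scratch):
    """Build a DEFINE STGPOOL command for a new deduplication-enabled pool.

    pool, device_class, and max_scratch are hypothetical example values;
    MAXSCRATCH is an assumption about what a sequential-access pool needs.
    """
    return (f"DEFINE STGPOOL {pool} {device_class} "
            f"MAXSCRATCH={max_scratch} DEDUPLICATE=YES")


def update_pool_for_dedup(pool):
    """Build an UPDATE STGPOOL command that enables deduplication
    on an existing pool (process-count parameter intentionally omitted)."""
    return f"UPDATE STGPOOL {pool} DEDUPLICATE=YES"


if __name__ == "__main__":
    # Placeholder names only; substitute the names used in your environment.
    print(define_dedup_pool("DEDUPPOOL", "FILECLASS", 100))
    print(update_pool_for_dedup("EXISTINGPOOL"))
```

Generating the command text separately from running it lets you review each command before you paste it into an administrative session.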
Results
As data is stored in the pool, the duplicates are identified.
When the reclamation threshold for the storage pool is reached, reclamation
begins, and the space that is occupied by duplicate data is reclaimed.
In the storage pool definition, you can specify as many as 50
duplicate-identification processes to start automatically. However,
the number of duplicate-identification processes must not exceed the
number of processor cores available on the Tivoli Storage Manager
server.
If you do not specify any duplicate-identification processes in the
storage pool definition, you must control data deduplication manually.
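The limits described here (no more than 50 automatic processes, no more processes than processor cores, and manual control when none are specified) can be checked before you change a pool definition. The following Python sketch is one possible way to do that; the function names are hypothetical, and the default core count comes from the machine that runs the script, which is meaningful only if that machine is the server itself.

```python
import os

MAX_IDENTIFY_PROCESSES = 50  # upper limit stated in this topic


def check_identify_processes(requested, cores=None):
    """Return the requested process count if it is valid, else raise ValueError.

    cores defaults to os.cpu_count(), which reports logical CPUs on the
    local machine; pass the server's real core count when you know it.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    if requested < 0:
        raise ValueError("process count cannot be negative")
    if requested > MAX_IDENTIFY_PROCESSES:
        raise ValueError("at most 50 processes can start automatically")
    if requested > cores:
        raise ValueError(f"{requested} processes exceed {cores} processor cores")
    return requested


def needs_manual_control(requested):
    """True when no automatic processes are defined for the pool."""
    return requested == 0


if __name__ == "__main__":
    print(check_identify_processes(4, cores=8))
    print(needs_manual_control(0))
```

For example, requesting 9 processes on a server with 8 cores raises an error, while requesting 0 is accepted but signals that data deduplication must be controlled manually.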
Duplicate identification requires extra disk I/O and processor resources.
To mitigate the effects on server workload, you can manually increase
or decrease the number of duplicate-identification processes, along
with their duration.
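Manual control of this kind can be sketched as a helper that builds the command text for a given process count and duration. This topic does not name the command; the sketch assumes the IDENTIFY DUPLICATES administrative command with its NUMPROCESS and DURATION (minutes) parameters, and the function name and pool name are hypothetical. Confirm the syntax against the command reference for your server level.

```python
def identify_duplicates_command(pool, num_processes, duration_minutes=None):
    """Build an IDENTIFY DUPLICATES command string (assumed command name).

    num_processes is the number of duplicate-identification processes to
    run; duration_minutes, when given, limits how long they run.
    """
    if num_processes < 0:
        raise ValueError("process count cannot be negative")
    command = f"IDENTIFY DUPLICATES {pool} NUMPROCESS={num_processes}"
    if duration_minutes is not None:
        if duration_minutes <= 0:
            raise ValueError("duration must be a positive number of minutes")
        command += f" DURATION={duration_minutes}"
    return command


if __name__ == "__main__":
    # Hypothetical pool name: run two processes for one hour during a quiet period.
    print(identify_duplicates_command("DEDUPPOOL", 2, 60))
```

Running fewer processes, or running them for a shorter time, reduces the extra disk I/O and processor load while client workloads are active.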
Attention: By default, the Tivoli Storage Manager server requires
that you back up deduplication-enabled primary storage pools before
volumes in the storage pool are reclaimed and before duplicate data
is discarded.