Managing deduplication-enabled storage pools

You can create a storage pool for data deduplication or you can update an existing storage pool. If you are implementing server-side data deduplication, Tivoli® Storage Manager provides the option of running duplicate-identification processes automatically or manually.

Before you begin

Before you set up a storage pool:

Determine which client nodes have data that you want to deduplicate. Decide, node by node, whether that data is deduplicated on the client or on the server.
Decide whether you want to define a new storage pool exclusively for data deduplication or update an existing storage pool. If you update a storage pool for data deduplication, Tivoli Storage Manager deduplicates the data that is already stored. No additional backup, archive, or migration is required. You can also enable a storage pool for data deduplication without deduplicating the data in it.
Decide how you want to control duplicate-identification processes.
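
The node-by-node decision can be recorded in each node definition. The following is a minimal sketch, not a definitive procedure: NODE_A and NODE_B are hypothetical node names, and the DEDUPLICATION parameter values are assumed from the UPDATE NODE command reference, so verify them against your server level.

```
/* NODE_A and NODE_B are hypothetical node names.                        */
/* Assumed: DEDUPLICATION accepts CLIENTORSERVER and SERVERONLY.         */

/* Allow NODE_A to deduplicate data on the client or on the server */
update node NODE_A deduplication=clientorserver

/* Restrict NODE_B to server-side data deduplication */
update node NODE_B deduplication=serveronly
```

A node that is allowed to use client-side data deduplication typically also needs the deduplication option enabled in its client options file.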

About this task

You can enable data deduplication on a new storage pool or on an existing one. A single storage pool can hold both client-side deduplicated data and server-side deduplicated data.

Procedure

To set up a storage pool for data deduplication, complete the following steps:

If you are defining a new storage pool:
1. Use the DEFINE STGPOOL command and specify the DEDUPLICATE=YES parameter.
2. Define a new policy domain to direct eligible client-node data to the storage pool.
If you are updating an existing storage pool:
1. Determine whether the storage pool contains data from one or more client nodes that you want to exclude from data deduplication. If it does:
  1. Use the MOVE DATA command to move the data that belongs to the excluded nodes out of the storage pool that you are converting and into another storage pool.
  2. Direct data that belongs to the excluded nodes to the other storage pool. The easiest way to complete this task is to create another policy domain and designate the other storage pool as the destination storage pool.
2. Change the storage-pool definition with the UPDATE STGPOOL command. Specify the DEDUPLICATE parameter and, to start duplicate-identification processes automatically, the IDENTIFYPROCESS parameter.
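
The two paths in this procedure might look like the following sketch. All pool, device-class, domain, node, and volume names are hypothetical. FILECLASS is assumed to be an existing device class of type FILE, and the numeric values are placeholders. The sketch uses IDENTIFYPROCESS for the number of automatically started duplicate-identification processes, which is the parameter name assumed from the DEFINE STGPOOL and UPDATE STGPOOL command references; confirm it for your server level.

```
/* Path 1: define a new deduplication-enabled storage pool.               */
/* DEDUPPOOL and FILECLASS are hypothetical names.                        */
define stgpool DEDUPPOOL FILECLASS maxscratch=200 deduplicate=yes identifyprocess=2

/* Direct eligible client-node data to the pool with a new policy domain. */
/* DEDUPDOM, DEDUPSET, DEDUPCLASS, and NODE_A are hypothetical names.     */
define domain DEDUPDOM
define policyset DEDUPDOM DEDUPSET
define mgmtclass DEDUPDOM DEDUPSET DEDUPCLASS
define copygroup DEDUPDOM DEDUPSET DEDUPCLASS standard type=backup destination=DEDUPPOOL
assign defmgmtclass DEDUPDOM DEDUPSET DEDUPCLASS
activate policyset DEDUPDOM DEDUPSET
update node NODE_A domain=DEDUPDOM

/* Path 2: convert an existing storage pool.                              */
/* MOVE DATA works on a volume, so this assumes that the excluded nodes'  */
/* data is on the hypothetical volume VOL001.                             */
move data VOL001 stgpool=OTHERPOOL
update stgpool EXISTINGPOOL deduplicate=yes identifyprocess=2
```

In the second path, the excluded nodes still need a policy domain whose destination is OTHERPOOL, as described in the step that redirects their data.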

Results

As data is stored in the pool, the duplicates are identified. When the reclamation threshold for the storage pool is reached, reclamation begins, and the space that is occupied by duplicate data is reclaimed.
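
The reclamation threshold is part of the storage pool definition. As a hedged illustration, with DEDUPPOOL as a hypothetical pool name and 60 as a placeholder percentage, the threshold could be set and checked like this:

```
/* DEDUPPOOL is a hypothetical pool name; 60 is a placeholder value. */

/* Reclaim a volume when 60 percent of its space is reclaimable */
update stgpool DEDUPPOOL reclaim=60

/* Display the pool definition, including its deduplication settings */
query stgpool DEDUPPOOL format=detailed
```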

In the storage pool definition, you can specify as many as 50 duplicate-identification processes to start automatically. However, the number of duplicate-identification processes must not exceed the number of processor cores available on the Tivoli Storage Manager server. If you do not specify any duplicate-identification processes in the storage pool definition, you must control data deduplication manually. Duplicate identification requires extra disk I/O and processor resources. To mitigate the effects on server workload, you can manually increase or decrease the number of duplicate-identification processes, along with their duration.
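
Manual control might be sketched as follows. DEDUPPOOL is a hypothetical pool name, and the process counts and duration are placeholders that should stay within the number of processor cores on your server. The NUMPROCESS and DURATION parameters are assumed from the IDENTIFY DUPLICATES command reference.

```
/* DEDUPPOOL is a hypothetical pool name; all numbers are placeholders. */

/* Run 3 duplicate-identification processes for 60 minutes */
identify duplicates DEDUPPOOL numprocess=3 duration=60

/* Check the running processes */
query process

/* Change the number of processes that start automatically */
update stgpool DEDUPPOOL identifyprocess=4
```

Running the processes for a fixed duration during a quiet period is one way to keep the extra disk I/O and processor load away from backup windows.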

Attention: By default, the Tivoli Storage Manager server requires that you back up deduplication-enabled primary storage pools before volumes in the storage pool are reclaimed and before duplicate data is discarded.
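
To satisfy this requirement, back up the primary storage pool to a copy storage pool before reclamation runs. The following sketch uses the hypothetical names DEDUPPOOL and COPYPOOL and assumes that the requirement is controlled by the DEDUPREQUIRESBACKUP server option; verify the option name for your server level.

```
/* DEDUPPOOL and COPYPOOL are hypothetical pool names. */

/* Back up the deduplication-enabled primary pool to a copy storage pool */
backup stgpool DEDUPPOOL COPYPOOL

/* Display the server option that is assumed to enforce the requirement */
query option deduprequiresbackup
```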