Data deduplication options

Use inline data deduplication to deduplicate data as it is written to a container storage pool. Use postprocess data deduplication to eliminate duplicate data from sequential-access (FILE) storage pools after the data is stored.

Tip: Although data deduplication is available for sequential-access (FILE) storage pools, the use of directory-container and cloud-container storage pools is preferred because those storage pool types offer newer technologies for data storage. Data that is stored in cloud-container or directory-container storage pools uses either inline data deduplication or client-side data deduplication.

You must use directory-container or cloud-container storage pools for inline data deduplication. By using these storage pool types, you reduce the need for offline reorganization, which improves server performance and reduces the cost of storage hardware. Directory-container and cloud-container storage pools do not use device classes or volumes.
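
The following Python sketch is a toy model of inline data deduplication: duplicate data is detected at the moment it is written, so only new data reaches the container. It is not the server's implementation. The fixed 4-byte chunks, the SHA-256 digests, and the names InlineContainerPool, write, and read are hypothetical simplifications chosen for illustration.

```python
import hashlib

# Hypothetical chunk size; real extent boundaries are not described here.
CHUNK_SIZE = 4


class InlineContainerPool:
    """Toy container pool that deduplicates data while it is being written."""

    def __init__(self) -> None:
        self.extents: dict[str, bytes] = {}      # digest -> the single stored copy
        self.objects: dict[str, list[str]] = {}  # object name -> ordered digests

    def write(self, name: str, data: bytes) -> int:
        """Store an object and return the number of bytes physically written."""
        written = 0
        digests: list[str] = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.extents:
                # New data: it is stored as part of the same write operation.
                self.extents[digest] = chunk
                written += len(chunk)
            # Duplicate or not, the object keeps a reference to the extent.
            digests.append(digest)
        self.objects[name] = digests
        return written

    def read(self, name: str) -> bytes:
        """Rebuild an object from its referenced extents."""
        return b"".join(self.extents[d] for d in self.objects[name])
```

In this model, writing b"AAAABBBBAAAA" stores only 8 bytes because the third chunk repeats the first, and a later write of b"BBBBCCCC" stores only the new 4-byte chunk. No separate identification pass is needed afterward, which is the trade-off against higher processor usage at write time.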

In postprocess data deduplication, the server first identifies duplicate data and then removes it from the storage pool. Only one instance of the data is retained on storage media. Other instances of the same data are replaced with a pointer to the retained instance. After the duplicate data is removed, you can reclaim the space that it occupied in the storage pool.
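
To make the two phases concrete, here is a minimal Python sketch, assuming a toy pool of fixed-size chunks identified by SHA-256 digests. The names FilePool, identify_duplicates, and remove_duplicates are hypothetical and do not correspond to server commands. Data is first stored in full, duplicates are identified in a separate pass, and each duplicate is then replaced with a pointer to the retained instance.

```python
import hashlib
from typing import Union

# A stored chunk is either the data itself (bytes) or a pointer (int) to the
# index of the retained instance of identical data.
Chunk = Union[bytes, int]


class FilePool:
    """Toy FILE pool with postprocess deduplication (hypothetical model)."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def write(self, data: bytes, chunk_size: int = 4) -> None:
        # Ingest: every chunk is stored in full; no duplicate checking yet.
        for i in range(0, len(data), chunk_size):
            self.chunks.append(data[i:i + chunk_size])

    def identify_duplicates(self) -> dict[int, int]:
        """Phase 1: map each duplicate chunk's index to the retained copy's index."""
        first_seen: dict[str, int] = {}
        duplicates: dict[int, int] = {}
        for idx, chunk in enumerate(self.chunks):
            if not isinstance(chunk, bytes):
                continue  # already replaced by a pointer
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in first_seen:
                duplicates[idx] = first_seen[digest]
            else:
                first_seen[digest] = idx
        return duplicates

    def remove_duplicates(self, duplicates: dict[int, int]) -> int:
        """Phase 2: replace duplicates with pointers; return bytes no longer needed."""
        freed = 0
        for idx, keep in duplicates.items():
            chunk = self.chunks[idx]
            if isinstance(chunk, bytes):
                freed += len(chunk)
                self.chunks[idx] = keep
        return freed

    def stored_bytes(self) -> int:
        """Bytes still occupied by real data (pointers are treated as free)."""
        return sum(len(c) for c in self.chunks if isinstance(c, bytes))

    def read(self) -> bytes:
        # Follow each pointer back to the retained instance.
        return b"".join(c if isinstance(c, bytes) else self.chunks[c] for c in self.chunks)
```

In this model, identifying duplicates frees nothing by itself; the stored byte count drops only after the duplicates are replaced with pointers, which mirrors why space can be reclaimed only after duplicate data is removed.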

For more information about postprocess data deduplication, see Deduplicating data (V7.1.1).

In client-side data deduplication, the client deduplicates data before sending it, so only compressed, deduplicated data is sent to the server. Processing is distributed between the server and the client during backup operations.
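
A minimal Python sketch of that division of labor follows, assuming fixed-size chunks, SHA-256 digests, and zlib compression as stand-ins for whatever the product actually uses; Server, missing, store, and client_backup are hypothetical names. The client chunks, hashes, and compresses the data (the source of the higher client processor usage), while the server only reports which extents it lacks, so data that the server already holds is never resent.

```python
import hashlib
import zlib


class Server:
    """Toy server that keeps one compressed copy of each extent, keyed by digest."""

    def __init__(self) -> None:
        self.extents: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        # Server-side work: report which extents are not stored yet.
        return {d for d in digests if d not in self.extents}

    def store(self, payload: dict[str, bytes]) -> None:
        self.extents.update(payload)


def client_backup(server: Server, data: bytes, chunk_size: int = 4096) -> int:
    """Deduplicate and compress on the client; return extent bytes sent."""
    chunks: dict[str, bytes] = {}
    order: list[str] = []
    # Client-side work: split the data into chunks and hash each one.
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunks[digest] = chunk
        order.append(digest)
    needed = server.missing(order)
    # Only new extents are compressed and sent over the network.
    payload = {d: zlib.compress(chunks[d]) for d in needed}
    server.store(payload)
    return sum(len(p) for p in payload.values())
```

Backing up the same data a second time sends zero extent bytes, because the server already holds every extent. For simplicity, the sketch does not count the list of digests that the client sends with each query.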

Use the following table to compare data deduplication options.

Postprocess data deduplication
Restriction: You can use postprocess data deduplication only with sequential-access (FILE) storage pools.
Advantages:
  • After duplicate data is removed, you can reclaim space in the storage pool.
Disadvantages:
  • Longer processing times, because duplicate data must first be identified and then removed from the storage pool.

Inline data deduplication
Restriction: You can use inline data deduplication only with directory-container and cloud-container storage pools.
Advantages:
  • Deduplicates data as the data is written to a container storage pool.
  • Reduces the need for offline reorganization, which improves server performance.
  • Reduces the cost of storage hardware.
Disadvantages:
  • Higher processor usage by the server.

Client-side data deduplication
Advantages:
  • Processing is distributed between the server and the client during backup operations.
  • Only compressed, deduplicated data is sent to the server.
Disadvantages:
  • Higher processor usage by the client.
  • Longer elapsed time for client operations such as backup.