IBM Tivoli Storage Manager, Version 7.1

Client-side data deduplication

In client-side data deduplication, the backup-archive client and the server identify and remove duplicate data to save storage space on the server.

Benefits

Client-side data deduplication provides the following advantages:

Client-side data deduplication stores data directly in a deduplicated format. If storage pool backup is used to create secondary copies in a non-deduplicated storage pool, client extents are reassembled into contiguous files. (Extents are parts of a file that are created during the data-deduplication process.) Because of this reassembly, storage pool backup can take longer than it does for data that was not previously deduplicated.

Requirements

When you configure client-side data deduplication, the following requirements must be met:

Configuration options for client-side deduplication

To take advantage of client-side data deduplication, you can use the following options:
  • Exclude specific files on a client from data deduplication by using the exclude.dedup client option.
  • Enable a data deduplication cache, which reduces network traffic between the client and the server. The cache on the client can be enabled through the client options file.

    Specify a size and location for a client cache.

    Restriction: For applications that use the Tivoli Storage Manager API, do not use the data deduplication cache because backup failures might occur when the cache is out of sync with the Tivoli Storage Manager server. If multiple, concurrent Tivoli Storage Manager client sessions are configured, you must configure a separate cache for each session.
  • Enable both client-side data deduplication and compression to reduce the amount of data that is stored on the server. Each extent is compressed before it is sent to the server. However, you must balance the storage savings against the processing power that is required to compress client data. If you compress and deduplicate data on the client system, you typically use about twice as much processing power as you use for data deduplication alone.

    The server can process compressed data that has been deduplicated. In addition, backup-archive clients earlier than V6.2 can restore deduplicated, compressed data.
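The interaction between extents, data deduplication, and compression can be pictured with a minimal Python sketch. It is an illustration only and does not represent the product's algorithm: the fixed 4-byte extent size, the SHA-256 digests, the zlib compression, and the names split_extents, send_backup, restore, and server_store are all assumptions made for the example.

```python
import hashlib
import zlib

EXTENT_SIZE = 4  # hypothetical fixed extent size, kept tiny for the demo


def split_extents(data: bytes, size: int = EXTENT_SIZE) -> list[bytes]:
    """Split data into fixed-size extents (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def send_backup(data: bytes, server_store: dict[str, bytes]) -> tuple[list[str], int]:
    """Back up data, sending only extents the server does not already hold.

    Returns the ordered list of extent digests (the file "recipe") and the
    number of extents that actually had to be sent.
    """
    recipe = []
    sent = 0
    for extent in split_extents(data):
        digest = hashlib.sha256(extent).hexdigest()
        if digest not in server_store:
            # Each new extent is compressed before it is "sent" to the server.
            server_store[digest] = zlib.compress(extent)
            sent += 1
        recipe.append(digest)
    return recipe, sent


def restore(recipe: list[str], server_store: dict[str, bytes]) -> bytes:
    """Reassemble a file from its extents, decompressing each one."""
    return b"".join(zlib.decompress(server_store[d]) for d in recipe)


# A plain dict stands in for the server's deduplicated storage pool.
server_store: dict[str, bytes] = {}
recipe, sent = send_backup(b"AAAABBBBAAAACCCC", server_store)
```

In this sketch the extent AAAA occurs twice in the input but is stored only once, and backing up the same data a second time sends no extents at all.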

Client-side data deduplication and storage pools

If client-side data deduplication is enabled, the primary destination storage pool is full, and another storage pool is in the hierarchy, the server stops the transaction. Client-side data deduplication is then disabled, and the client retries the transaction with files that are not deduplicated.

If the backup operation is successful and if the next storage pool is enabled for data deduplication, the files are deduplicated by the server. If the next storage pool is not enabled for data deduplication, the files are not deduplicated.

To ensure that client-side data deduplication can complete processing, maintain sufficient free storage in your primary destination storage pool.
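The fallback behavior described in this section can be summarized as a small decision function. The following Python sketch is a simplified model under stated assumptions, not server code: the StoragePool class, its fields, and the store_backup function are hypothetical names, the retry against the next pool is assumed to succeed, and the case of a full primary pool with no next pool is not covered by this topic, so the sketch simply raises an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoragePool:
    """Hypothetical model of a storage pool in a hierarchy."""
    name: str
    free_bytes: int
    dedup_enabled: bool
    next_pool: Optional["StoragePool"] = None


def store_backup(size: int, primary: StoragePool) -> tuple[str, str]:
    """Return (pool name, where deduplication happens) for one transaction."""
    if primary.free_bytes >= size:
        # Normal case: the data arrives already deduplicated by the client.
        return primary.name, "client"

    if primary.next_pool is None:
        # Assumption: this case is outside the behavior described here.
        raise ValueError("primary pool is full and no next pool is defined")

    # The primary pool is full: the transaction is stopped and retried
    # with files that are not deduplicated by the client.
    nxt = primary.next_pool
    where = "server" if nxt.dedup_enabled else "none"
    return nxt.name, where


next_pool = StoragePool("NEXTPOOL", free_bytes=1000, dedup_enabled=True)
primary = StoragePool("PRIMARY", free_bytes=10, dedup_enabled=True,
                      next_pool=next_pool)
result = store_backup(50, primary)
```

With 10 bytes free in the primary pool, the 50-byte transaction in this sketch lands in the next pool and is deduplicated by the server, which is the situation that sufficient free space in the primary pool avoids.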

LAN-free access to storage pools that contain client-side deduplicated data

Only V6.2 and later storage agents can use LAN-free data movement to access storage pools that contain data that was deduplicated by clients. V6.1 or earlier storage agents complete operations on such storage pools over the LAN.

Table 1. Paths for data movement

Storage agent                  Storage pool contains   Storage pool contains a         Storage pool contains
                               only client-side        mixture of client-side and      only server-side
                               deduplicated data       server-side deduplicated data   deduplicated data
V6.1 or earlier storage agent  Over the LAN            Over the LAN                    LAN-free
V6.2 storage agent             LAN-free                LAN-free                        LAN-free

V6.2 backup-archive clients are compatible with V6.2 storage agents and provide LAN-free access to storage pools that contain client-side deduplicated data.
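The data movement paths in Table 1 reduce to a two-input lookup. The Python sketch below encodes the table for illustration; the function name data_path and its parameters are hypothetical, and versions are represented as (major, minor) tuples as an assumption of the example.

```python
def data_path(agent_version: tuple[int, int],
              pool_has_client_dedup_data: bool) -> str:
    """Return the data movement path for a storage agent and storage pool.

    agent_version is a (major, minor) tuple such as (6, 1).
    pool_has_client_dedup_data is True when the pool contains any
    client-side deduplicated data (alone or mixed with server-side data).
    """
    if agent_version >= (6, 2):
        # V6.2 and later storage agents can always use LAN-free movement.
        return "LAN-free"
    # Earlier agents are LAN-free only for purely server-side deduplicated data.
    return "Over the LAN" if pool_has_client_dedup_data else "LAN-free"


path_old_agent = data_path((6, 1), pool_has_client_dedup_data=True)
path_new_agent = data_path((6, 2), pool_has_client_dedup_data=True)
```

Treating the "only client-side" and "mixture" columns as one case works because both columns have the same values in every row of the table.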


