IBM Tivoli Storage Manager, Version 7.1

Client-side data deduplication

In client-side data deduplication, the backup-archive client and the server identify and remove duplicate data to save storage space on the server.

Benefits

Client-side data deduplication provides the following advantages:

It can reduce the amount of data that is sent over the local area network (LAN).
The extra processing power and time that are required to remove duplicate data on the server are eliminated.
Space savings occur immediately on the server because the client removed the duplicated data.
Extra reclamation processing is not required to remove the redundant data from the server.
It is no longer necessary to identify duplicates on the server.

Client-side data deduplication stores data directly in a deduplicated format. If storage pool backup is used to create secondary copies to a non-deduplicated storage pool, client extents are reassembled into contiguous files. (Extents are parts of a file that are created during the data-deduplication process.) This reassembly can cause storage pool backup processing to take longer than it does for data that was not previously deduplicated.
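The idea of extents and reassembly can be illustrated with a minimal sketch. It is not the algorithm that Tivoli Storage Manager uses: the fixed extent size, the SHA-256 digest, and every function name here are assumptions chosen only to keep the example short.

```python
import hashlib
from typing import Dict, List

EXTENT_SIZE = 4  # hypothetical fixed extent size, kept tiny for readability


def split_into_extents(data: bytes, size: int = EXTENT_SIZE) -> List[bytes]:
    """Cut the data into consecutive extents of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_deduplicated(data: bytes, extent_store: Dict[str, bytes]) -> List[str]:
    """Keep one copy of each distinct extent and return the ordered digests."""
    recipe = []
    for extent in split_into_extents(data):
        digest = hashlib.sha256(extent).hexdigest()
        extent_store.setdefault(digest, extent)  # a duplicate is not stored again
        recipe.append(digest)
    return recipe


def reassemble(recipe: List[str], extent_store: Dict[str, bytes]) -> bytes:
    """Rebuild the contiguous file by looking up each extent in order."""
    return b"".join(extent_store[digest] for digest in recipe)


store: Dict[str, bytes] = {}
original = b"ABCDABCDABCDXYZ"
recipe = store_deduplicated(original, store)
restored = reassemble(recipe, store)
```

In this toy run the input is cut into four extents, but only two distinct extents are stored. Reassembly has to look up every extent again, which is the kind of extra work that a copy to a non-deduplicated storage pool incurs.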

Requirements

When you configure client-side data deduplication, the following requirements must be met:

The client and server must be at version 6.2.0 or later.
Client-side deduplication cannot be used in combination with LAN-free backups.
The primary storage pool must be a sequential-access disk (FILE) storage pool that is enabled for data deduplication.
The value of the DEDUPLICATION option on the client must be set to yes. You can set the DEDUPLICATION option in the client options file, in the preference editor of the Tivoli® Storage Manager client GUI, or in the client option set on the Tivoli Storage Manager server.
Client-side data deduplication must be enabled on the server by using the DEDUPLICATION parameter on the REGISTER NODE or UPDATE NODE server command.
Files that are intended for deduplication must not be excluded.
Files that are intended for deduplication must not be encrypted. Encrypted files and files from encrypted file systems cannot be deduplicated.
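The configuration-level requirements in this list can be read as a checklist. The following sketch is one hypothetical way to express that checklist; the Setup fields and the function name are illustrative assumptions and do not correspond to any Tivoli Storage Manager API. The per-file requirements (files not excluded, files not encrypted) are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_VERSION = (6, 2, 0)  # minimum client and server level stated above


@dataclass
class Setup:
    """Hypothetical summary of one client/server configuration."""
    client_version: Tuple[int, int, int]
    server_version: Tuple[int, int, int]
    lan_free_backup: bool            # True if backups use LAN-free data movement
    pool_device_type: str            # device type of the primary storage pool
    pool_dedup_enabled: bool         # pool is enabled for data deduplication
    client_dedup_option: bool        # DEDUPLICATION option on the client is yes
    node_client_dedup_enabled: bool  # node allows client-side deduplication


def unmet_requirements(s: Setup) -> List[str]:
    """Return a message for each requirement that the setup does not meet."""
    problems = []
    if s.client_version < MIN_VERSION or s.server_version < MIN_VERSION:
        problems.append("client and server must be at V6.2.0 or later")
    if s.lan_free_backup:
        problems.append("cannot be combined with LAN-free backups")
    if s.pool_device_type != "FILE" or not s.pool_dedup_enabled:
        problems.append("primary pool must be a deduplication-enabled FILE pool")
    if not s.client_dedup_option:
        problems.append("DEDUPLICATION option on the client must be yes")
    if not s.node_client_dedup_enabled:
        problems.append("node must be enabled for client-side deduplication")
    return problems


good = Setup((7, 1, 0), (7, 1, 0), False, "FILE", True, True, True)
bad = Setup((7, 1, 0), (7, 1, 0), True, "DISK", True, True, True)
good_problems = unmet_requirements(good)
bad_problems = unmet_requirements(bad)
```

An empty list means that none of the checked requirements is violated; the second setup fails on the LAN-free and storage pool checks.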

Configuration options for client-side deduplication

You can use the following options to configure client-side data deduplication:

Exclude specific files on a client from data deduplication by using the exclude.dedup client option.
Enable a data deduplication cache, which reduces network traffic between the client and the server. The cache on the client can be enabled through the client options file.
Specify a size and location for a client cache.

Restriction: For applications that use the Tivoli Storage Manager API, do not use the data deduplication cache because backup failures might occur when the cache is out of sync with the Tivoli Storage Manager server. If multiple concurrent Tivoli Storage Manager client sessions are configured, you must configure a separate cache for each session.

Enable both client-side data deduplication and compression to reduce the amount of data that is stored on the server. Each extent is compressed before it is sent to the server. However, you must balance the storage savings against the processing power that is required to compress client data. If you compress and deduplicate data on the client system, you typically use about twice as much processing power as with data deduplication alone.
The server can process compressed data that has been deduplicated. In addition, backup-archive clients earlier than V6.2 can restore deduplicated, compressed data.

Client-side data deduplication and storage pools

If client-side data deduplication is enabled, the primary destination storage pool is full, and another storage pool is in the hierarchy, the server stops the transaction. Client-side data deduplication is then disabled, and the client retries the transaction with files that are not deduplicated.

If the backup operation is successful and if the next storage pool is enabled for data deduplication, the files are deduplicated by the server. If the next storage pool is not enabled for data deduplication, the files are not deduplicated.

To ensure that client-side data deduplication can complete processing, maintain sufficient free storage in your primary destination storage pool.
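The fallback behavior described in this section can be summarized as a small decision function. This is a sketch under stated assumptions: the Pool class and the function name are hypothetical, the backup is assumed to succeed, and the case of a full primary pool with no next pool is not covered by this section, so the function reports it as such.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Hypothetical model of one storage pool in a hierarchy."""
    name: str
    dedup_enabled: bool
    full: bool = False
    next_pool: Optional["Pool"] = None


def deduplicated_by(primary: Pool) -> str:
    """Say who deduplicates the data after a successful backup."""
    if not primary.full:
        return "client"       # normal case: the client sends deduplicated data
    if primary.next_pool is None:
        return "not covered"  # this section does not describe the case
    # The server stopped the transaction and the client retried without
    # deduplication, so the outcome depends on the next pool in the hierarchy.
    return "server" if primary.next_pool.dedup_enabled else "nobody"


next_dedup = Pool("NEXTPOOL", dedup_enabled=True)
next_plain = Pool("NEXTPOOL", dedup_enabled=False)
case_free = deduplicated_by(Pool("PRIMARY", True, full=False, next_pool=next_dedup))
case_server = deduplicated_by(Pool("PRIMARY", True, full=True, next_pool=next_dedup))
case_none = deduplicated_by(Pool("PRIMARY", True, full=True, next_pool=next_plain))
```

The third case is the one to avoid: the data lands in the next pool without any deduplication, which is why sufficient free space in the primary pool matters.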

LAN-free access to storage pools that contain client-side deduplicated data

Only V6.2 and later storage agents can use LAN-free data movement to access storage pools that contain data that was deduplicated by clients. With V6.1 or earlier storage agents, operations on these storage pools are completed over the LAN.

Table 1. Paths for data movement
Storage agent version	Storage pool contains only client-side deduplicated data	Storage pool contains a mixture of client-side and server-side deduplicated data	Storage pool contains only server-side deduplicated data
V6.1 or earlier storage agent	Over the LAN	Over the LAN	LAN-free
V6.2 storage agent	LAN-free	LAN-free	LAN-free
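Table 1 reduces to a single rule, sketched below. The function and parameter names are illustrative assumptions, and storage agent versions are written as (version, release) tuples for simple comparison.

```python
from typing import Tuple


def data_path(storage_agent_version: Tuple[int, int],
              pool_has_client_side_dedup_data: bool) -> str:
    """Return the data movement path that Table 1 gives for this combination."""
    if not pool_has_client_side_dedup_data:
        # Only server-side deduplicated data: LAN-free for every agent level.
        return "LAN-free"
    # Any client-side deduplicated data requires a V6.2 or later storage agent.
    return "LAN-free" if storage_agent_version >= (6, 2) else "Over the LAN"


path_old_agent = data_path((6, 1), pool_has_client_side_dedup_data=True)
path_new_agent = data_path((6, 2), pool_has_client_side_dedup_data=True)
```

A pool that mixes client-side and server-side deduplicated data is treated like a pool that contains only client-side deduplicated data, because both columns of the table give the same path.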

V6.2 backup-archive clients are compatible with V6.2 storage agents and provide LAN-free access to storage pools that contain client-side deduplicated data.
