In client-side data deduplication, the backup-archive
client and the server identify and remove duplicate data to save storage
space on the server.
Benefits
Client-side data deduplication provides the following advantages:
- It can reduce the amount of data that is sent over the local area
network (LAN).
- The extra processing power and time that are required to remove duplicate
data on the server are eliminated.
- Space savings occur immediately on the server because the client
removed the duplicated data.
- Extra reclamation processing is not required to remove the redundant
data from the server.
- It is no longer necessary to identify duplicates on the server.
Client-side data deduplication stores data directly in a deduplicated
format. If storage pool backup is used to create secondary copies
to a non-deduplicated storage pool, client extents are reassembled
into contiguous files. (Extents are parts of a file that are created
during the data-deduplication process.) This reassembly can cause
storage pool backup processing to take longer than processing
data that was not previously deduplicated.
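The idea of extents can be illustrated with a minimal Python sketch. It is not the product's algorithm: the fixed 4-byte extent size, the use of SHA-256 digests, and the names `ToyServer` and `backup` are all assumptions chosen for readability. The sketch shows a client that sends only the extents the server does not already hold, and a server that reassembles a contiguous file from stored extents.

```python
import hashlib

EXTENT_SIZE = 4  # assumption: unrealistically small so the example is easy to follow


def split_into_extents(data: bytes, size: int = EXTENT_SIZE) -> list:
    """Split data into fixed-size extents (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ToyServer:
    """Hypothetical stand-in for the server: keeps each unique extent once."""

    def __init__(self) -> None:
        self.extents = {}  # digest -> extent bytes
        self.files = {}    # file name -> ordered list of digests

    def has_extent(self, digest: str) -> bool:
        return digest in self.extents

    def store_extent(self, digest: str, extent: bytes) -> None:
        self.extents[digest] = extent

    def reassemble(self, name: str) -> bytes:
        """Rebuild the contiguous file from its extents."""
        return b"".join(self.extents[d] for d in self.files[name])


def backup(server: ToyServer, name: str, data: bytes) -> int:
    """Back up data, sending only extents the server lacks. Returns bytes sent."""
    sent = 0
    recipe = []
    for extent in split_into_extents(data):
        digest = hashlib.sha256(extent).hexdigest()
        if not server.has_extent(digest):
            server.store_extent(digest, extent)
            sent += len(extent)
        recipe.append(digest)
    server.files[name] = recipe
    return sent


server = ToyServer()
first = backup(server, "a.txt", b"AAAABBBBCCCC")   # all three extents are new
second = backup(server, "b.txt", b"AAAABBBBDDDD")  # only the last extent is new
```

In this toy run the second backup sends a third of the data that the first one did, because two of its three extents are already on the server.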
Requirements
When you configure client-side data deduplication, the following requirements
must be met:
- The client and server must be at version 6.2.0 or later.
- Client-side deduplication cannot be used in combination
with LAN-free backups.
- The primary storage pool must be a sequential-access
disk (FILE) storage pool that is enabled for data deduplication.
- The value of the DEDUPLICATION option on the
client must be set to yes. You can set the DEDUPLICATION option
in the client options file, in the preference editor of the Tivoli® Storage
Manager client GUI,
or in the client option set on the Tivoli Storage
Manager server.
- Client-side data deduplication must be enabled on the server by
using the DEDUPLICATION parameter on the REGISTER
NODE or UPDATE NODE server command.
- Files that are intended for deduplication must not be excluded.
- Files that are intended for deduplication must not be encrypted.
Encrypted files and files from encrypted file systems cannot be deduplicated.
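The setup-level requirements above can be expressed as a simple checklist. The following Python sketch is only an illustration: the `DedupSetup` class and its field names are hypothetical and do not correspond to any product interface, and the per-file requirements (files not excluded, files not encrypted) are left out.

```python
from dataclasses import dataclass

MIN_VERSION = (6, 2, 0)  # minimum client and server level named in the requirements


@dataclass
class DedupSetup:
    """Hypothetical summary of one client/server pairing (field names are illustrative)."""
    client_version: tuple
    server_version: tuple
    uses_lan_free_backup: bool
    primary_pool_is_file_type: bool      # sequential-access disk (FILE) storage pool
    primary_pool_dedup_enabled: bool
    client_dedup_option_yes: bool        # DEDUPLICATION option set to yes on the client
    node_dedup_enabled_on_server: bool   # set with REGISTER NODE or UPDATE NODE


def unmet_requirements(setup: DedupSetup) -> list:
    """Return a description of every requirement that the setup does not meet."""
    problems = []
    if setup.client_version < MIN_VERSION or setup.server_version < MIN_VERSION:
        problems.append("client and server must be at V6.2.0 or later")
    if setup.uses_lan_free_backup:
        problems.append("client-side deduplication cannot be combined with LAN-free backups")
    if not (setup.primary_pool_is_file_type and setup.primary_pool_dedup_enabled):
        problems.append("primary pool must be a FILE pool enabled for data deduplication")
    if not setup.client_dedup_option_yes:
        problems.append("DEDUPLICATION option on the client must be set to yes")
    if not setup.node_dedup_enabled_on_server:
        problems.append("client-side deduplication must be enabled for the node on the server")
    return problems


good = DedupSetup((6, 2, 0), (6, 3, 0), False, True, True, True, True)
bad = DedupSetup((6, 1, 0), (6, 2, 0), True, True, True, True, True)
```

Here `unmet_requirements(good)` returns an empty list, while `unmet_requirements(bad)` reports the client level and the LAN-free conflict.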
Configuration options for client-side deduplication
The following options are available to help you take advantage of
client-side data deduplication:
- Exclude specific files on a client from data deduplication by
using the exclude.dedup client option.
- Enable a data deduplication cache, which reduces
network traffic between the client and the server. The cache on the
client can be enabled through the client options file.
Specify a
size and location for a client cache.
Restriction: For
applications that use the Tivoli Storage
Manager API, do
not use the data deduplication cache because backup failures might
occur when the cache is out of sync with the Tivoli Storage
Manager server.
If multiple, concurrent Tivoli Storage
Manager client sessions
are configured, you must configure a separate cache for each session.
- Enable both client-side data deduplication and compression to
reduce the amount of data that is stored on the server. Each extent
is compressed before it is sent to the server. However, you must balance
the storage savings against the processing power that is
required to compress client data. If you compress and
deduplicate data on the client system, you typically use about twice
as much processing power as with data deduplication alone.
The server
can process compressed data that has been deduplicated. In addition,
backup-archive clients earlier than V6.2 can restore deduplicated,
compressed data.
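How a client-side cache might reduce traffic between the client and the server can be sketched as follows. This is a simplified model under stated assumptions, not the product's implementation: the cache is taken to be a local set of extent digests already known to be on the server, and `CountingServer` and `backup` are hypothetical names.

```python
import hashlib


class CountingServer:
    """Hypothetical server that counts how many lookup queries it receives."""

    def __init__(self) -> None:
        self.known = set()
        self.queries = 0

    def has_extent(self, digest: str) -> bool:
        self.queries += 1
        return digest in self.known

    def store(self, digest: str) -> None:
        self.known.add(digest)


def backup(extents, server, cache=None) -> int:
    """Send extents to the server. Returns the number of extents sent.

    If a cache (a set of digests) is given, extents found in it are skipped
    without asking the server at all.
    """
    sent = 0
    for extent in extents:
        digest = hashlib.sha256(extent).hexdigest()
        if cache is not None and digest in cache:
            continue  # cache hit: no network round trip
        if not server.has_extent(digest):
            server.store(digest)
            sent += 1
        if cache is not None:
            cache.add(digest)
    return sent


extents = [b"a", b"b", b"a", b"b"]

no_cache_server = CountingServer()
sent_without_cache = backup(extents, no_cache_server)

cached_server = CountingServer()
sent_with_cache = backup(extents, cached_server, cache=set())
```

Both runs send the same two unique extents, but the cached run issues half as many server queries. The sketch also hints at why an out-of-sync cache is a problem: if the cache lists a digest that the server no longer holds, the extent is skipped even though it should have been sent.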
Client-side data deduplication and storage pools
If client-side data deduplication is enabled, the primary destination
storage pool is full, and another storage pool is in the hierarchy,
the server stops the transaction. Client-side data deduplication is
disabled, and the client tries the transaction again with files that
are not deduplicated.
If the backup operation is successful
and if the next storage pool is enabled for data deduplication, the
files are deduplicated by the server. If the next storage pool is
not enabled for data deduplication, the files are not deduplicated.
To
ensure that client-side data deduplication can complete processing,
maintain sufficient free storage in your primary destination storage
pool.
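The fallback behavior described in this section can be summarized in a short Python sketch. It is a model of the described outcome only: the `Pool` class and `place_backup` function are hypothetical, "full" is simplified to "not enough free bytes for this backup", and the error raised when no usable next pool exists is an assumption, because the text does not cover that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Hypothetical storage pool description."""
    name: str
    free_bytes: int
    dedup_enabled: bool


def place_backup(size: int, primary: Pool, next_pool: Optional[Pool]) -> tuple:
    """Return (pool name, who deduplicates) for a backup of `size` bytes."""
    if primary.free_bytes >= size:
        # Normal case: the client deduplicates and stores in the primary pool.
        return (primary.name, "client")
    if next_pool is None or next_pool.free_bytes < size:
        # Assumption: not described in the text, so the sketch simply fails.
        raise RuntimeError("no storage pool can hold the backup")
    # The server stops the transaction, client-side deduplication is disabled,
    # and the client retries with files that are not deduplicated.
    if next_pool.dedup_enabled:
        return (next_pool.name, "server")
    return (next_pool.name, "none")


primary_ok = Pool("PRIMARY", free_bytes=1000, dedup_enabled=True)
primary_full = Pool("PRIMARY", free_bytes=0, dedup_enabled=True)
next_dedup = Pool("NEXT", free_bytes=1000, dedup_enabled=True)
next_plain = Pool("NEXT", free_bytes=1000, dedup_enabled=False)
```

With these pools, `place_backup(100, primary_full, next_plain)` yields `("NEXT", "none")`, which is the case that sufficient free space in the primary pool is meant to avoid.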
LAN-free access to storage pools that contain client-side deduplicated data
Only V6.2 and later storage agents can use
LAN-free data movement to access storage pools that contain data that
was deduplicated by clients. V6.1 or earlier storage agents can complete
operations on those storage pools only over the LAN.
Table 1. Paths for data movement

| Storage agent | Storage pool contains only client-side deduplicated data | Storage pool contains a mixture of client-side and server-side deduplicated data | Storage pool contains only server-side deduplicated data |
|---|---|---|---|
| V6.1 or earlier storage agent | Over the LAN | Over the LAN | LAN-free |
| V6.2 storage agent | LAN-free | LAN-free | LAN-free |

V6.2 backup-archive clients are compatible with V6.2 storage
agents and provide LAN-free access to storage pools that contain client-side
deduplicated data.
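The rule behind Table 1 fits in one small function. The Python sketch below is illustrative only: `data_path` is a hypothetical helper, and the storage agent version is assumed to be given as a `(major, minor)` tuple.

```python
def data_path(agent_version: tuple, pool_has_client_dedup_data: bool) -> str:
    """Return the data movement path from Table 1.

    agent_version is a (major, minor) tuple, for example (6, 1).
    pool_has_client_dedup_data is True for pools that contain only client-side
    deduplicated data or a mixture of client-side and server-side data.
    """
    if pool_has_client_dedup_data and agent_version < (6, 2):
        return "Over the LAN"
    return "LAN-free"


v61_client_only = data_path((6, 1), True)
v61_server_only = data_path((6, 1), False)
v62_client_only = data_path((6, 2), True)
```

A V6.1 or earlier storage agent falls back to the LAN only when the pool contains client-side deduplicated data. In every other combination in the table, data movement is LAN-free.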