Data deduplication is a method of reducing storage needs by eliminating redundant data.
Overview
Two types of data deduplication are available on Tivoli® Storage Manager: client-side data deduplication and server-side data deduplication.
Client-side data deduplication is a data deduplication technique that is used on the backup-archive client to remove redundant data during backup and archive processing, before the data is transferred to the Tivoli Storage Manager server. Using client-side data deduplication can reduce the amount of data that is sent over a local area network.
Server-side data deduplication is a data deduplication technique that is done by the server. The Tivoli Storage Manager administrator can specify the data deduplication location (client or server) to use with the DEDUP parameter on the REGISTER NODE or UPDATE NODE server command.
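For example, enabling client-side data deduplication for a node might look like the following sketch of the administrative command. The node name is a placeholder, and the parameter spelling and values (CLIENTORSERVER to permit deduplication on either side, SERVERONLY to restrict it to the server) are recalled from TSM 6.x references rather than taken from this topic:

```
update node MYNODE deduplication=clientorserver
```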
Enhancements
With client-side data deduplication, you can:
- Exclude specific files on a client from data deduplication.
- Enable a data deduplication cache that reduces network traffic between the client and the server. The cache contains extents that were sent to the server in previous incremental backup operations. Instead of querying the server for the existence of an extent, the client queries its cache. You can specify a size and location for the client cache. If an inconsistency between the server and the local cache is detected, the local cache is removed and repopulated.
Note: For applications that use the Tivoli Storage Manager API, the data deduplication cache must not be used, because of the potential for backup failures caused by the cache being out of sync with the Tivoli Storage Manager server. If multiple, concurrent Tivoli Storage Manager client sessions are configured, a separate cache must be configured for each session.
- Enable both client-side data deduplication and compression to reduce the amount of data that is stored by the server. Each extent is compressed before it is sent to the server. The trade-off is between storage savings and the processing power that is required to compress client data. In general, if you compress and deduplicate data on the client system, you are using approximately twice as much processing power as data deduplication alone.
The server can work with deduplicated, compressed data. In addition, backup-archive clients earlier than V6.2 can restore deduplicated, compressed data.
Client-side data deduplication uses the following process:
- The client creates extents. Extents are parts of files that are compared with other file extents to identify duplicates.
- The client and server work together to identify duplicate extents. The client sends non-duplicate extents to the server.
- Subsequent client data-deduplication operations create new extents. Some or all of those extents might match the extents that were created in previous data-deduplication operations and sent to the server. Matching extents are not sent to the server again.
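The process above can be sketched in Python. This is an illustrative model only, not the Tivoli Storage Manager implementation: the fixed-size extents, SHA-256 digests, and the in-memory `server_index` dictionary standing in for the server's deduplication index are all simplifying assumptions.

```python
import hashlib
import zlib

EXTENT_SIZE = 4096  # illustrative fixed size; real deduplication typically uses variable-size extents


def make_extents(data, size=EXTENT_SIZE):
    """Split data into extents that can be compared to find duplicates."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data, server_index):
    """Send only extents whose digests the server has not already stored.

    server_index maps extent digest -> compressed extent bytes; it stands in
    for the server's index of known extents. Returns the number of extents
    actually sent (non-duplicates).
    """
    sent = 0
    for extent in make_extents(data):
        digest = hashlib.sha256(extent).hexdigest()
        if digest not in server_index:                    # duplicate check
            server_index[digest] = zlib.compress(extent)  # compress, then "send"
            sent += 1
    return sent


# First backup sends the unique extents; repeating the same backup sends none,
# because every extent already matches one the server holds.
index = {}
first = backup(b"x" * 10000, index)   # two distinct extents (full-size and tail)
second = backup(b"x" * 10000, index)  # all extents match; nothing is sent
```

Storing the extents compressed in `server_index` mirrors the point above that each extent can be compressed before it is sent, trading client processing time for storage savings.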
Benefits
Client-side data deduplication provides several advantages:
- It can reduce the amount of data that is sent over the local area network (LAN).
- The processing power that is required to identify duplicate data is offloaded from the server to client nodes. Server-side data deduplication is always enabled for deduplication-enabled storage pools. However, files that are in the deduplication-enabled storage pools and that were deduplicated by the client do not require additional processing.
- The processing power that is required to remove duplicate data on the server is eliminated, so space savings on the server occur immediately.
Client-side data deduplication has a possible disadvantage: the server does not have whole copies of client files until you back up the primary storage pools that contain client extents to a non-deduplicated copy storage pool. (Extents are parts of a file that are created during the data-deduplication process.) During storage pool backup to a non-deduplicated storage pool, client extents are reassembled into contiguous files.
By default, primary sequential-access storage pools that are set up for data deduplication must be backed up to non-deduplicated copy storage pools before they can be reclaimed and before duplicate data can be removed. The default ensures that the server has copies of whole files at all times, in either a primary storage pool or a copy storage pool.
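As a sketch, such a storage pool backup might use the BACKUP STGPOOL administrative command; the pool names below are placeholders for a deduplicated primary pool and a non-deduplicated copy pool:

```
backup stgpool DEDUPPOOL COPYPOOL
```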
Important: For further data reduction, you can enable client-side data deduplication and compression together. Each extent is compressed before it is sent to the server. Compression saves space, but it increases the processing time on the client workstation.
The following options pertain to data deduplication:
- Deduplication
- Dedupcachepath
- Dedupcachesize
- Enablededupcache
- Exclude.dedup
- Include.dedup
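Taken together, these options might appear in a client options file as in the following sketch. The cache path, cache size, and include/exclude patterns are illustrative placeholders, not recommendations:

```
* Client options file stanza -- illustrative values only
DEDUPLICATION      YES
ENABLEDEDUPCACHE   YES
DEDUPCACHEPATH     /tsm/dedupcache
DEDUPCACHESIZE     256
EXCLUDE.DEDUP      /temp/.../*
INCLUDE.DEDUP      /data/.../*
```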