Tuning client-side data deduplication
The performance of client-side data deduplication can be affected by processor requirements and deduplication configuration.
About this task
Procedure
Action | Explanation |
---|---|
Ensure that the client system meets the minimum hardware requirements for client-side data deduplication. | Before you decide to use client-side data deduplication, verify that the client system has adequate resources available during the backup window to run the deduplication processing. The preferred minimum processor requirement is the equivalent of one 2.2 GHz processor core per backup process with client-side data deduplication. For example, a system with a single-socket, quad-core, 2.2 GHz processor that is used 75% or less during the backup window is a good candidate for client-side data deduplication. |
Use a combination of deduplication and compression to obtain significant data reduction. | Compressing data after it is deduplicated can yield greater data reduction than data deduplication alone. When data deduplication and compression are both enabled during a backup operation on the backup-archive client, the operations run in the preferred order: data deduplication first, followed by compression. |
Avoid running client compression in combination with server-side data deduplication. | Client compression combined with server-side data deduplication is typically slower and reduces data volume less than either of the preferred alternatives: server-side data deduplication alone, or client-side data deduplication combined with client-side compression. |
Increase the number of parallel sessions as an effective way to improve overall throughput when you are using client-side deduplication. This action applies to client systems that have sufficient processor resources, and when the client application is configured to perform parallel backups. | For example, when you use IBM Spectrum Protect for Virtual Environments, it might be possible to use up to 30 parallel VMware backup sessions before a 1 Gb network becomes saturated. Rather than immediately configuring numerous parallel sessions to improve throughput, increment the number of sessions gradually, and stop when you no longer see improvements in throughput. For information about optimizing parallel backups, see Optimizing parallel backups of virtual machines. |
Configure the client data deduplication cache with the enablededupcache option. | The client must query the server for each extent of data that is processed. You can reduce the processor usage that is associated with this query process by configuring the cache on the client. With the data deduplication cache, the client can identify previously discovered extents during a backup session without querying the IBM Spectrum Protect server. The following guidelines apply when you configure the client data deduplication cache:
Restriction:
|
Decide whether to use client-side data deduplication or server-side data deduplication. | Whether you choose to use client-side data deduplication depends on your system environment. In a network-constrained environment, you can run data deduplication on the client to improve the elapsed time for backup operations. If the environment is not network-constrained and you run data deduplication on the client, it can result in longer elapsed backup times. To evaluate whether to use client-side or server-side data deduplication, see the information in Table 2. |
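
The processor guideline and the advice to add parallel sessions gradually can be illustrated with a short sketch. This is not part of the product; it assumes a simple linear model in which idle processor capacity is the number of cores times the clock speed times the idle fraction. The function names, the `measure_throughput` callback, the 5% gain threshold, and the sample throughput figures are all hypothetical.

```python
import math
from typing import Callable

# Preferred minimum from the guideline: the equivalent of one
# 2.2 GHz core per backup process with client-side deduplication.
CORE_GHZ_PER_PROCESS = 2.2


def max_dedup_processes(cores: int, core_ghz: float, busy_fraction: float) -> int:
    """Estimate how many deduplicating backup processes the idle
    processor capacity can support (assumed linear model)."""
    idle_ghz = cores * core_ghz * (1.0 - busy_fraction)
    # The small epsilon guards against rounding just below a whole number.
    return math.floor(idle_ghz / CORE_GHZ_PER_PROCESS + 1e-9)


def ramp_up_sessions(measure_throughput: Callable[[int], float],
                     max_sessions: int,
                     min_gain: float = 0.05) -> int:
    """Add one session at a time and stop when the relative
    throughput gain falls below min_gain."""
    best_sessions = 1
    best_rate = measure_throughput(1)
    for sessions in range(2, max_sessions + 1):
        rate = measure_throughput(sessions)
        if rate < best_rate * (1.0 + min_gain):
            break  # no meaningful improvement: keep the previous setting
        best_sessions, best_rate = sessions, rate
    return best_sessions


# The example system from the table: four 2.2 GHz cores, 75% busy.
print(max_dedup_processes(cores=4, core_ghz=2.2, busy_fraction=0.75))

# Hypothetical measurements (MB/s) that level off after four sessions.
fake_rates = {1: 40.0, 2: 78.0, 3: 110.0, 4: 118.0, 5: 119.0, 6: 119.0}
print(ramp_up_sessions(lambda n: fake_rates[n], max_sessions=6))
```

In practice, `measure_throughput` would be replaced by a real measurement from a test backup at each session count, and the processor estimate serves only as a rough upper bound before that tuning starts.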
Use the following checklist to help you choose whether to implement client-side or server-side data deduplication.
Question | Response |
---|---|
Does the speed of your backup network result in long backup times? |
|
What is more important to your business: The amount of storage savings that you achieve through data reduction technologies, or how quickly backups complete? | Consider the trade-offs between having the fastest elapsed backup times and gaining the maximum amount of storage pool savings:
|
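
To make the network question in the checklist concrete, the following sketch compares estimated elapsed backup times under a deliberately simple bottleneck model. It assumes that client processing and network transfer overlap, so the slower of the two sets the elapsed time, and it ignores server-side processing time. The function name, the throughput figures, and the 50% reduction ratio are hypothetical values chosen for illustration, not measurements.

```python
from typing import Optional


def estimated_backup_hours(data_mb: float,
                           network_mb_s: float,
                           client_dedup_mb_s: Optional[float] = None,
                           reduction: float = 0.0) -> float:
    """Estimate elapsed backup time in hours.

    With client_dedup_mb_s=None, all data crosses the network
    (server-side deduplication). Otherwise the client reduces the
    data by `reduction` before sending, at the given processing rate.
    """
    if client_dedup_mb_s is None:
        return data_mb / network_mb_s / 3600.0
    client_seconds = data_mb / client_dedup_mb_s
    network_seconds = data_mb * (1.0 - reduction) / network_mb_s
    # Assumed pipelined model: the slower stage is the bottleneck.
    return max(client_seconds, network_seconds) / 3600.0


data_mb = 360_000.0  # hypothetical nightly backup volume

for label, network_mb_s in (("slow network", 50.0), ("fast network", 500.0)):
    server_side = estimated_backup_hours(data_mb, network_mb_s)
    client_side = estimated_backup_hours(data_mb, network_mb_s,
                                         client_dedup_mb_s=100.0,
                                         reduction=0.5)
    print(f"{label}: server-side {server_side:.2f} h, "
          f"client-side {client_side:.2f} h")
```

With these sample numbers, client-side data deduplication shortens the backup on the slow network and lengthens it on the fast one, which matches the guidance that the choice depends on whether the environment is network-constrained.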
What to do next
For more information about using IBM Spectrum Protect deduplication, see https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager/page/Container%20Pool%20Best%20Practices.