Planning for data reduction pools and deduplicated volumes

A deduplicated volume or volume copy can be created in a data reduction pool. When you implement deduplication, you must consider specific requirements in the storage environment.

Deduplication is a type of data reduction that eliminates duplicate copies of data. It can be combined with other capacity-saving methods, such as thin provisioning, but deduplicated volumes must be created in data reduction pools for added capacity savings. Deduplication of user data occurs within a data reduction pool and only between volumes or volume copies that are marked as deduplicated. Some models or software versions require specific hardware or software to use this function.
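As an illustrative sketch only, on systems where the `mkmdiskgrp` and `mkvolume` commands are available, a deduplicated volume in a data reduction pool might be created as follows. The pool name, volume name, and sizes are examples; verify the exact command syntax for your software version.

```
# Create a data reduction pool (example name drp0; extent size in MB)
mkmdiskgrp -name drp0 -datareduction yes -ext 1024

# Create a thin-provisioned, deduplicated volume in that pool
# (deduplication applies only to volumes marked as deduplicated)
mkvolume -name vol0 -pool drp0 -size 100 -unit gb -thin -deduplicated
```

Deduplication then occurs within the pool, between this volume and any other volume copies in the same pool that are also marked as deduplicated.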

Deduplication has the following software, hardware, update, and performance considerations:
  • Avoid Global Mirror with Change Volumes to or from a deduplicated volume.
  • You can use the Data Reduction Estimation Tool (DRET) to estimate how much capacity you might save if a standard volume that a host can access were converted to a deduplicated volume. The tool scans target workloads on all attached storage arrays, consolidates the results, and generates an estimate of potential data reduction savings for the entire system.

    For more information about DRET, see https://www.ibm.com/support/pages/node/6217841. For more information about Comprestimator, see https://www.ibm.com/support/pages/node/6209688.

  • To ensure that your intended use of deduplicated volumes has adequate performance for your application, refer to the Best Practice for Performance at IBM Redbooks.
Using data reduction pools and deduplication in IBM Storage Virtualize for Public Cloud implementations requires that hardware with specific memory specifications is provisioned from the supported cloud service provider. Table 1 details the hardware requirements for each supported implementation.
Table 1. Data reduction and deduplication requirements for supported IBM Storage Virtualize for Public Cloud implementations
| Supported cloud service provider | Machine type | CPU requirements | Memory requirements (GiB) | Support for data reduction pools | Support for the unmap action for hosts¹ | Support for the unmap action for back-end storage | Support for deduplicated volumes in data reduction pools | Support for compressed volumes in data reduction pools |
|---|---|---|---|---|---|---|---|---|
| IBM Cloud | IBM Cloud bare metal server | 12 | 64 | Yes | Yes | No | Yes | No |
| Amazon Web Services (AWS) | AWS EC2 c5.4xlarge | 16 | 32 | Yes | Yes | No | No² | Yes |
| Amazon Web Services (AWS) | AWS EC2 c5.9xlarge | 36 | 72 | Yes | Yes | No | Yes | Yes |
| Amazon Web Services (AWS) | AWS EC2 c5.18xlarge | 72 | 144 | Yes | Yes | No | Yes | Yes |
| Microsoft Azure | Standard_D16s_v3 | 16 | 64 | Yes | Yes | No | Yes | Yes |
| Microsoft Azure | Standard_D32s_v3 | 32 | 128 | Yes | Yes | No | Yes | Yes |
| Microsoft Azure | Standard_D64s_v3 | 64 | 256 | Yes | Yes | No | Yes | Yes |
¹ Support for the host SCSI unmap command is disabled by default. To enable hosts to use SCSI unmap commands, enter the following command:
chsystem -hostunmap on
² Although this machine type provides 32 GiB of memory, deduplication requires additional memory overhead to track duplicate data within the data reduction pool, so deduplicated volumes are not supported.
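As a usage sketch for footnote 1, the host unmap setting can be enabled and then checked from the CLI. The `chsystem -hostunmap on` command is taken from this topic; the verification step assumes the setting appears in `lssystem` output, and the exact attribute name can vary by software version.

```
# Enable SCSI unmap support for hosts (disabled by default)
chsystem -hostunmap on

# Check the current setting in the system properties
# (attribute name may vary by software version)
lssystem | grep -i unmap
```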