Windows: Planning for directory-container and cloud-container storage pools

Review how your directory-container and cloud-container storage pools are set up to ensure optimal performance.

Measured in terms of input/output operations per second (IOPS), are you using fast disk storage for the IBM Spectrum Protect™ database?

Use a high-performance disk for the database. Use solid-state drive technology for data deduplication processing.

Ensure that the database has a minimum capability of 3000 IOPS. For each TB of data that is backed up daily (before data deduplication), add 1000 IOPS to this minimum.

For example, an IBM Spectrum Protect server that is ingesting 3 TB of data per day would need 6000 IOPS for the database disks:
3000 IOPS minimum + (3 TB x 1000 IOPS) = 6000 IOPS
For recommendations about disk selection, see "Planning for server database disks".

For more information about IOPS, see the IBM Spectrum Protect Blueprints.

Do you have enough memory for the size of your database? For IBM Spectrum Protect servers that deduplicate data, use a minimum of 40 GB of system memory for a database size of 100 GB. If the retained capacity of backup data grows, the memory requirement might need to be higher.

Monitor memory usage regularly to determine whether more memory is required.

Use more system memory to improve caching of database pages. The following memory size guidelines are based on the daily amount of new data that you back up:
  • 128 GB of system memory for daily backups of data, where the database size is 1 - 2 TB
  • 192 GB of system memory for daily backups of data, where the database size is 2 - 4 TB
For more information, see Memory requirements.
Have you properly sized the storage capacity for the database active log and archive log?

Configure the server to have a minimum active log size of 128 GB by setting the ACTIVELOGSIZE server option to a value of 131072.

The suggested starting size for the archive log is 1 TB. The size of the archive log is limited by the size of the file system on which it is located, not by a server option. Ensure that the file system has at least 10% more disk space than the size of the archive log.

Use a directory for the database archive logs with an initial free capacity of at least 1 TB. Specify the directory by using the ARCHLOGDIRECTORY server option.

Define space for the archive failover log by using the ARCHFAILOVERLOGDIRECTORY server option.
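For example, the following server options file (dsmserv.opt) entries implement these settings. The directory paths are placeholders; substitute paths that are appropriate for your environment:
  ACTIVELOGSIZE 131072
  ARCHLOGDIRECTORY e:\server1\archlog
  ARCHFAILOVERLOGDIRECTORY f:\server1\archfaillog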

For more information about sizing for your system, see the IBM Spectrum Protect Blueprints.

Is compression enabled for the archive log and database backups? Enable the ARCHLOGCOMPRESS server option to save storage space.

This compression option is different from inline compression. Inline compression is enabled by default with IBM Spectrum Protect V7.1.5 and later.

Restriction: Do not use this option if the amount of backed up data exceeds 6 TB per day.
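For example, assuming that your daily ingest is below the 6 TB threshold, you can add the following entry to the server options file (dsmserv.opt):
  ARCHLOGCOMPRESS Yes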

For more information about compression for your system, see the IBM Spectrum Protect Blueprints.

Are the IBM Spectrum Protect database and logs on separate disk volumes (LUNs)?

Is the disk that is used for the database configured according to best practices for a transactional database?

The database must not share disk volumes with IBM Spectrum Protect database logs or storage pools, or with any other application or file system.

For more information about server database and recovery log configuration, see Server database and recovery log configuration and tuning.
Are you using a minimum of eight (2.2 GHz or equivalent) processor cores for each IBM Spectrum Protect server that you plan to use with data deduplication? If you are planning to use client-side data deduplication, verify that client systems have adequate resources available during a backup operation to complete data deduplication processing. Use at least the equivalent of one 2.2 GHz processor core per backup process that uses client-side data deduplication.
Did you allocate enough storage space for the database? For a rough estimate, plan for 100 GB of database storage for every 50 TB of data that is to be protected in deduplicated storage pools. Protected data is the amount of data before data deduplication, including all versions of the objects that are stored.
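For example, to protect 500 TB of source data, plan for approximately 1 TB of database storage:
  (500 TB / 50 TB) x 100 GB = 1000 GB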

As a best practice, define a new container storage pool exclusively for data deduplication. Data deduplication occurs at the storage-pool level, and all data within a storage pool, except encrypted data, is deduplicated.

 
Have you estimated storage pool capacity to configure enough space for the size of your environment? You can estimate capacity requirements for a deduplicated storage pool by using the following technique:
  1. Estimate the base size of the source data.
  2. Estimate the daily backup size by using an estimated change and growth rate.
  3. Determine retention requirements.
  4. Estimate the total amount of source data by factoring in the base size, daily backup size, and retention requirements.
  5. Apply the deduplication ratio factor.
  6. Apply the compression ratio factor.
  7. Round up the estimate to consider transient storage pool usage.
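The following walk-through illustrates the technique with assumed values (100 TB base size, 2 TB daily backup, 30-day retention, 4:1 data deduplication, 2:1 compression); actual reduction ratios vary by workload:
  1. Base size: 100 TB
  2. Daily backup size: 2 TB
  3. Retention: 30 days of daily backups
  4. Total source data: 100 TB + (30 x 2 TB) = 160 TB
  5. After 4:1 data deduplication: 160 TB / 4 = 40 TB
  6. After 2:1 compression: 40 TB / 2 = 20 TB
  7. Rounded up for transient storage pool usage: approximately 25 TB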

For an example of using this technique, see Effective planning and use of deduplication.

Have you distributed disk I/O over many disk devices and controllers? Use arrays that consist of as many disks as possible, which is sometimes referred to as wide striping. Ensure that you use one database directory per distinct array on the subsystem.

Set the DB2_PARALLEL_IO registry variable to enable parallel I/O for each table space used if the containers in the table space span multiple physical disks.
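For example, assuming arrays of eight physical disks per table space container, you might enable parallel I/O for all table spaces with the following DB2 command; adjust the number after the colon to match the number of disks in your arrays:
  db2set DB2_PARALLEL_IO=*:8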

When I/O bandwidth is available and the files are large, for example, 1 MB, the process of finding duplicates can occupy the resources of an entire processor. When files are smaller, other bottlenecks can occur.

Specify eight or more file systems for the deduplicated storage pool device class so that I/O is distributed across as many LUNs and physical devices as possible.
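For example, the following command sketches a FILE device class that spreads I/O across eight file systems. The device class name, capacity, and directory paths are placeholders:
  DEFINE DEVCLASS dedupfile DEVTYPE=FILE MOUNTLIMIT=500 MAXCAPACITY=50G DIRECTORY=f:\dedup1,g:\dedup2,h:\dedup3,i:\dedup4,j:\dedup5,k:\dedup6,l:\dedup7,m:\dedup8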

For guidelines about setting up storage pools, see "Planning for storage pools in DISK or FILE device classes".

For information about setting the DB2_PARALLEL_IO variable, see Recommended settings for IBM DB2 registry variables.

Have you scheduled daily operations based on your backup strategy?
As a best practice, schedule daily operations in the following order (a sample set of administrative schedules follows the list):
  1. Client backup
  2. Storage pool protection
  3. Node replication
  4. Database backup
  5. Expire inventory
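For example, administrative schedules of the following form can enforce this sequence after the client backup window. The schedule names, pool name, device class name, and start times are placeholders; client backups are scheduled separately by using client schedules:
  DEFINE SCHEDULE STGPROTECT TYPE=ADMINISTRATIVE CMD="PROTECT STGPOOL deduppool" ACTIVE=YES STARTTIME=20:00
  DEFINE SCHEDULE REPLNODES TYPE=ADMINISTRATIVE CMD="REPLICATE NODE *" ACTIVE=YES STARTTIME=22:00
  DEFINE SCHEDULE DBBACKUP TYPE=ADMINISTRATIVE CMD="BACKUP DB DEVCLASS=dbbkup TYPE=FULL" ACTIVE=YES STARTTIME=01:00
  DEFINE SCHEDULE EXPIREINV TYPE=ADMINISTRATIVE CMD="EXPIRE INVENTORY" ACTIVE=YES STARTTIME=03:00
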
Do you have enough storage to manage the DB2® lock list? If you deduplicate data that includes large files or large numbers of files concurrently, the process can result in insufficient storage space. When the lock list storage is insufficient, backup failures, data management process failures, or server outages can occur.

File sizes greater than 500 GB that are processed by data deduplication are most likely to deplete storage space. However, if many backup operations use client-side data deduplication, this problem can also occur with smaller-sized files.
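For example, a DB2 command of the following form increases the lock list size for the server database, where TSMDB1 is the default database name and the value, in 4 KB pages, is a placeholder; see the tuning topic that is referenced below for suitable values:
  db2 UPDATE DB CFG FOR TSMDB1 USING LOCKLIST 1200000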

For information about tuning the DB2 LOCKLIST parameter, see Tuning server-side data deduplication.
Is sufficient bandwidth available to transfer data to an IBM Spectrum Protect server? To transfer data to an IBM Spectrum Protect server, use client-side or server-side data deduplication and compression to reduce the bandwidth that is required.

Use a V7.1.5 or later server for inline compression, and a V7.1.6 or later client for enhanced compression processing.
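For example, the following client options file (dsm.opt) entries enable client-side data deduplication and compression; whether to enable both depends on client processor resources and network bandwidth:
  DEDUPLICATION Yes
  COMPRESSION Yes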

For more information, see the enablededup client option.
Have you determined how many storage pool directories to assign to each storage pool? Assign directories to a storage pool by using the DEFINE STGPOOLDIRECTORY command.

Create multiple storage pool directories and ensure that each directory is backed up to a separate disk volume (LUN).
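For example, the following command assigns two directories, each on a separate disk volume (LUN), to a directory-container storage pool. The pool name and paths are placeholders:
  DEFINE STGPOOLDIRECTORY deduppool g:\dirstg1,h:\dirstg2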

 
Did you allocate enough disk space in the cloud-container storage pool?
To prevent backup failures, ensure that the local directory has enough space. Use the following list as a guide for optimal disk space:
  • For serial-attached SCSI (SAS) and spinning disk, calculate the amount of new data that is expected after daily data reduction (compression and data deduplication). Allocate up to 100 percent of that amount, in terabytes, for disk space. For a worked example, see the calculation after this list.
  • Provide 3 TB for flash-based storage systems with fast network connections to on-premises, high-performance cloud systems.
  • Provide 5 TB for solid-state drive (SSD) systems with fast network connections to high-performance cloud systems.
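For example, under assumed values, if clients send 8 TB of new data each day and data reduction (compression plus data deduplication) averages 4:1, about 2 TB remains after reduction:
  8 TB / 4 = 2 TB
In that case, allocate approximately 2 TB of local disk space. The daily ingest amount and the reduction ratio are illustrative only; substitute measurements from your own environment.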
 
Did you select the appropriate type of local storage?
Ensure that data transfers from local storage to the cloud finish before the next backup cycle starts.
Tip: Data is removed from local storage soon after it moves to the cloud.
Use the following guidelines:
  • Use flash or SSD for large systems that have high-performing cloud systems. Ensure that you have a dedicated 10 Gb wide area network (WAN) link with a high-speed connection to the object storage. For example, use flash or SSD if you have a dedicated 10 Gb WAN link plus a high-speed connection to either an IBM® Cloud Object Storage location or to an Amazon Simple Storage Service (Amazon S3) data center.
  • Use larger capacity 15000 rpm SAS disks for these scenarios:
    • Medium-sized systems
    • Slower cloud connections, for example, 1 Gb
    • When you use IBM Cloud Object Storage as your service provider across several regions
  • For SAS or spinning disk, calculate the amount of new data that is expected after daily data reduction (compression and data deduplication). Allocate up to 100 percent of that amount for disk space, in terabytes.