Checklist for implementing data replication

A successful implementation of data replication relies on sufficient, dedicated hardware resources. Increased amounts of memory and processor cores are required. The database and its logs must be appropriately sized to ensure that transactions can complete. A dedicated network, with enough bandwidth to handle the amount of data you intend to replicate, is required.

Use the checklist to verify that hardware and your IBM Storage Protect configuration have characteristics that are key to good performance.

Question | Tasks, characteristics, options, or settings | More information
Are you using a high-performance disk for the IBM Storage Protect database? Ensure that the disks that are configured for the IBM Storage Protect database have a minimum capability of 3,000 I/O operations per second (IOPS). For each TB of data that is backed up daily (before data deduplication), add 1,000 IOPS to this minimum.
For example:
  • For daily ingest of data of 1 TB, the server needs 4000 IOPS.
  • For daily ingest of data of 10 TB, the server needs 13000 IOPS.
  • For daily ingest of data of 10 TB - 30 TB, the server needs 13000 IOPS - 33000 IOPS.
  • For daily ingest of data of 30 TB - 100 TB, the server needs 33000 IOPS - 103000 IOPS.
For example, for a daily ingest of 30 TB: 3000 IOPS minimum + 30000 IOPS (30 TB x 1000 IOPS) = 33000 IOPS.
Checklist for server database disks
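The IOPS guideline above can be sketched as a small calculation (Python, illustrative only; the constants come from the guideline in this checklist, not from a sizing tool):

```python
def required_iops(daily_ingest_tb: float) -> int:
    """Minimum database-disk IOPS per the guideline above:
    a 3,000 IOPS floor, plus 1,000 IOPS for each TB of data
    that is backed up daily (before data deduplication)."""
    BASE_IOPS = 3_000
    PER_TB_IOPS = 1_000
    return int(BASE_IOPS + PER_TB_IOPS * daily_ingest_tb)

print(required_iops(1))    # 4000
print(required_iops(30))   # 33000
print(required_iops(100))  # 103000
```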
Are you using enough processor cores and memory for replication operations and, optionally, data deduplication? If you are replicating data without deduplication, use a minimum of 6 processor cores and 64 GB of RAM for the source server and for each target replication server.

For any server that is configured for data replication and data deduplication, use a minimum of 10 processor cores and 128 GB of RAM.

 
Have you properly sized your disk space for the database, logs, and storage pools? To determine whether your database can handle the additional space requirements, you must first estimate how much more database space data replication uses.

For the active log, use a minimum size of 64 GB for data replication. If you are also using data deduplication, use the maximum allowed active log size, which is 128 GB.

Make the archive log space at least as large as the space that is defined for the active log. Also, specify a directory for the archive failover log in case it is needed.

Determining server database requirements for node replication (version 7.1.1)
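The log-sizing rules above can be summarized in a short sketch (illustrative; the function name and the returned dictionary are this sketch's own conventions, not IBM terminology):

```python
def log_sizes_gb(use_dedup: bool) -> dict:
    """Recovery log sizes implied by the checklist: an active log of
    at least 64 GB for replication, or the maximum allowed 128 GB when
    data deduplication is also in use; archive log space at least as
    large as the active log."""
    active_gb = 128 if use_dedup else 64
    return {"active_gb": active_gb, "archive_gb": active_gb}

print(log_sizes_gb(use_dedup=True))
print(log_sizes_gb(use_dedup=False))
```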
Is your network capable of handling the additional traffic for the amount of data that you intend to replicate between source and target replication servers? For data replication, network bandwidth must be greater than the planned maximum throughput.

You can estimate the required network bandwidth based on the amount of data that you are replicating.

Estimating network bandwidth for node replication (version 7.1.1)
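A rough, hedged sketch of such an estimate follows; the `efficiency` factor that pads for protocol overhead is an assumption of this sketch, not an IBM guideline:

```python
def required_bandwidth_mbps(replicated_gb: float,
                            window_hours: float,
                            efficiency: float = 0.7) -> float:
    """Approximate bandwidth needed to replicate `replicated_gb` of
    data within `window_hours`. `efficiency` is the assumed usable
    fraction of link bandwidth after protocol overhead."""
    megabits = replicated_gb * 8_000   # 1 GB = 8,000 megabits (decimal)
    seconds = window_hours * 3_600
    return megabits / seconds / efficiency

# e.g. 1 TB (1,000 GB) in an 8-hour window at 70% link efficiency:
print(round(required_bandwidth_mbps(1_000, 8)))  # 397 (Mbps)
```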
If your IBM Storage Protect server replicates nodes or protects storage pools to a remote server, did you determine whether Aspera® Fast Adaptive Secure Protocol (FASP®) technology can improve data throughput?
Restrictions:
  • Use Aspera FASP technology when your wide area network (WAN) shows signs of high packet loss, data transfer delays that are caused by network impairment, or both. If WAN performance meets your business needs, do not enable Aspera FASP technology.
  • To enable Aspera FASP technology for replication operations, the data must be stored in a directory-container storage pool.
  • Aspera FASP technology is available only on Linux® x86_64 operating systems.
  • Before you enable Aspera FASP technology, you must obtain the appropriate licenses. Both evaluation and full licenses are available.
See Determining whether Aspera FASP technology can optimize data transfer in your system environment.
Are you using replication storage rules to replicate data? If you implemented replication by using the REPLICATE NODE command, consider transitioning to replication storage rules and subrules, which can improve the performance of replication operations. Streamline and improve replication by using replication storage rules
Are you using data deduplication with data replication? If you are not already using data deduplication, consider enabling it. Data deduplication reduces the amount of data that is sent to the target replication server during replication operations, and therefore reduces the network bandwidth that those operations require. Measuring effects of data deduplication on node replication processing (version 7.1.1)
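A minimal sketch of the bandwidth effect; the duplicate-elimination fraction is site-specific, so the 60% figure in the example below is purely illustrative:

```python
def gb_sent_after_dedup(new_data_gb: float, dedup_fraction: float) -> float:
    """Data transferred to the target when duplicate extents are
    removed before replication. `dedup_fraction` is the fraction of
    new data eliminated as duplicates (an assumed, site-specific
    value, not an IBM figure)."""
    return new_data_gb * (1.0 - dedup_fraction)

# e.g. 10 TB of new data with 60% duplicate elimination leaves
# roughly 4 TB to send to the target replication server.
print(gb_sent_after_dedup(10_000, 0.6))
```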
Have you scheduled data replication at the optimum time in the daily schedule?

Before replication: Schedule client backup processing, and then data deduplication processing, before replication processing.

After replication: Schedule compression.

For more information, see the following topics:
Have you optimized the number of sessions that are used for sending data to the target replication server? You can improve replication performance by using the MAXSESSIONS parameter on replication storage rules or the REPLICATE NODE command to specify data sessions.

The number of sessions that are used for replication depends on the amount of data that you are replicating.

Managing the number of replication sessions (version 7.1.1)
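As a hedged illustration only (the node name and session count below are placeholders, not recommendations), the MAXSESSIONS parameter is specified on the REPLICATE NODE command like this:

```
replicate node NODE1 maxsessions=20
```

Choose a session count that matches the amount of data that you are replicating and the capacity of the network and the target replication server.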
Do you have enough mount points to avoid stalling replication processes and other server processes? Determine the number of logical and physical drives that can be dedicated to the replication process. For example, if a library has 10 tape drives and four of the drives are used for another task, six drives are available for replication operations.

Specify the number of mount points that you require, and ensure that enough drives are available to complete replication operations.

Typically, tape is not used for data replication, except for the initial replication.
Does the replication operation completely replicate all newly ingested data before the beginning of the next backup cycle? If the replication processes cannot finish before the start of the next backup cycle, consider the following actions:
  • Ensure that there are sufficient mount points and drives available for replication processes to complete.
  • Increase the number of data sessions that are used for replication operations.
  • Upgrade to faster hardware or increase network bandwidth for the source and target replication servers.
 
If you are using data deduplication with replication operations, do the processes that identify duplicates complete before replication starts, so that data deduplication is used to its full advantage? If the duplicate identification processes complete, or go into an idle state, before replication begins, all new data has been processed.