Replication queue manager

The replication queue manager component off-loads replication work from the NPS node. It provides storage for the replication log and backups, and transport of that data between replication queue managers.

Netezza Performance Server (NPS) node and replication queue manager communication

The NPS® node and the replication queue manager communicate by using two methods. The standard Network File System (NFS) is used to read data. Replication Services also uses the RQM File Transfer Sender, a custom write protocol with higher performance than NFS, to write all other data and to export replication catalog metadata. For NFS permissions to work properly, the user and group identifiers (UID and GID, which are numeric values) for the Netezza® administrator user (usually nz) must be the same on the NPS host and on the replication queue manager hosts. If they are not, the NPS node experiences permission errors when it writes data to the queue manager.
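
For example, a minimal check of this requirement, assuming the administrator user is nz, is to compare the output of the id command on the hosts involved:

    # Run on the NPS host and on each replication queue manager host
    id nz
    # The uid= and gid= values in the output must match across all hosts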

Replication queue manager network topology

All the replication queue managers in a replication set together define a distributed topology in which every replication node is connected to every other replication node. Within this topology, each replication node has a single canonical replication queue manager host name that must resolve to the IP address of its WAN link. When replication queue manager hosts connect to each other, they exchange their canonical names. These fully qualified host names allow the RQM to detect problems that are caused by network misconfiguration, such as duplicate connections or self-references, while still supporting multihomed servers.
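
As a sketch of verifying this requirement, check that the canonical host name resolves to the WAN IP address; the host name rqm1.example.com is a placeholder for your own canonical name:

    getent hosts rqm1.example.com
    # Compare the returned address with the IP address of the WAN link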

When replication queue managers connect, they also share with each other the fully qualified host names of all other log server hosts in the topology. As a result, adding an RQM instance to any log server host in the topology adds that host to all other log server hosts in the topology, which saves the effort of adding each log server host individually to every other log server host.

The default value for the canonical replication queue manager name is the Cloud Pak for Data System appliance fully qualified domain name that is configured for the replication queue manager.

It is highly recommended that you use the traceroute command to identify the path from one queue manager to another and confirm that the path connects through the WAN IP address. Also, to protect against unauthorized access, use firewalls to limit access to the replication queue manager.
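
For example, from one queue manager host, such a check might look like the following, where rqm2.example.com is a placeholder for the canonical name of the remote queue manager:

    traceroute rqm2.example.com
    # Confirm that the listed hops traverse the WAN link rather than an internal interface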

Priority of data transfer

The RQM software uses a small/large size heuristic to prioritize the order in which data is sent. Small amounts of data are given priority, so that control information is transmitted more quickly than large amounts of data. Large amounts of data are transmitted in blocks, one at a time, in the order in which they are logged (which can differ from the modification time).

The nuances of the RQM transfer algorithm can affect how replication occurs between nodes, depending on the nature of the workload and on any user-initiated operations outside of replication. For example, a replication workload that involves several large loads in a transaction is not completed on the replica NPS node until all of the load data is transferred to the replica's associated replication queue manager and then to the replica NPS host. In this example, the replication workload could be affected if a user initiates a backup by specifying the -pts transfer option (an option of the nzreplbackup command that causes the RQM software to be used to transfer the backup to the replica automatically). The option affects performance because it creates additional data to be prioritized along with the other large quantities of replication data.

In normal replication use, there is little need to interact directly with the replication queue manager from an operational perspective. If there is unexpected behavior during replication (for example, an unexpected delay or error message), replication status information might indicate that you should check the queue manager for details. You can also check the queue manager status and operation as part of troubleshooting connectivity between replication sites.

Log storage management

The data partition on the replication queue manager is critical to both replication and the smooth running of the NPS host. If the RQM data partition fills up on the primary host, replication stops, and no transactions that modify replicated or global data are allowed to run. If the RQM data partition fills up on a replica host, the replica falls increasingly behind the primary because new replicated transactions cannot be received without available storage. The primary replication queue manager continues to attempt to send transaction information until either space runs out on the primary or the information is sent successfully. Only log data that was successfully sent is deleted by the cleanup process.

The replication service constantly monitors free disk space on the RQM data partition and generates NPS events if the amount of disk space falls below a configurable percentage of the fully configured size. The replication service also provides a flexible, configurable utility for smart pruning of older transaction and replication-related data on the RQM data partition. The replication queue manager retains all logs and data until they are pruned. For more information, see the nzreplprunepts command.
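
As a quick manual check, you can also inspect free space on the RQM data partition with standard operating system tools; the mount point /rqmdata is a placeholder for the path that is configured in your environment:

    df -h /rqmdata
    # Review the Use% column to see how close the partition is to its configured limit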

You can use the RQM to transfer backup data; however, doing so consumes storage space. When you determine storage requirements, consider the additional storage that is needed to support this process over and above replication alone. Also, any outside-of-replication backup sets (backups that are not transferred by using the RQM software) are not managed by the replication pruning utility, so you must manually clean up previous backup data.

If the RQM data partition frequently nears its storage limit, consider running the pruning utility more often.
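
A minimal sketch of running the pruning utility on a schedule follows; the schedule is an example only, and any options that nzreplprunepts requires, as well as the full path to the command, depend on your installation and are documented in the command reference:

    # Example cron entry that runs the pruning utility daily at 02:00
    0 2 * * * nzreplprunepts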