Eventually consistent design, latency, and backlog

A replication set is consistent when SQL queries on one replication node of the group produce the same result on all replication nodes of the group. To keep databases on different nodes identical, IBM® Netezza® Replication Services uses the eventually consistent method.

With the eventually consistent method, there might be times when one node (the primary) has newer database content than the others. The amount of time by which the replica lags behind the primary is called latency, and the amount of data that must be replicated and applied by the replica is the backlog.

Latency and backlog depend on the specific configuration and workload. The specific causes of latency in IBM Netezza Replication Services are as follows:
  • The ability of primary nodes to perform SQL update transactions (which include, for example, INSERT, DELETE, or UPDATE) against replicated data in parallel is subject to the current transactions' isolation levels (which can be serializable or snapshot isolation). Replicas perform the replicated transactions serially in the order which they were committed on the primary. The time that is taken to complete a set of concurrent replicated transactions on the primary is equivalent to the time taken by the longest transaction. However, the time that is taken to complete these same transactions on a replica is the sum of the times to execute each transaction.
  • Load file data is replicated to the replica hosts after it is fully loaded on the primary. All the load file data is first completely captured in the primary replication queue manager log, and then the data is transmitted to the replica. Only when all data for a transaction is fully transferred does the replica begin to apply the data. The time to transfer the replication log is the total amount of data divided by the window-limited bandwidth of the replication queue manager connection and contributes directly to latency. For multiple sequential loads in the same transaction, IBM Netezza Replication Services does not wait for all loads to be completed before beginning transfer; the latency falls between the time to transmit the last large load and the time to transmit all the data.
The eventually consistent design has implications for applications that use IBM Netezza Replication Services for larger scale implementations. Some design considerations are as follows:
  • Queries that must run with zero latency must be issued on the current primary. The current primary can be identified from the replication set operational views.
  • Applications doing queries that must run with low latency check the replication set operational views to determine the current latency of replicas and then dispatch the queries to replicas whose latencies are within acceptable limits.
  • Queries that are not sensitive to latency can be dispatched to any NPS node in the replication set.
  • You can design queries to use row data, such as a date, time, batch run, or other identifying column value, to exclude results that are newer than a particular value.