Synchronization
IBM® Netezza® Replication Services uses two approaches to maintaining synchronization: SQL statement replication (by-SQL replication) and change value replication (by-value replication). The method that is used is based on whether the SQL transaction contains a deterministic value.
A SQL statement is deterministic in a replication set if it returns the identical result when executed on any replication node in the set. This definition is different from that of a statement on a single NPS node. For example, a function that returns an object ID is deterministic on a single NPS node but is non-deterministic in a replication environment, because different replication nodes might assign different IDs for the same object.
By-SQL replication
With by-SQL replication, when SQL transactions are executed on the primary side, the same by-SQL replication is transported to the replica side and executed there. To do this, the primary records the SQL Data Manipulation Language (DML) and Data Definition Language (DDL) statements that update its replicated databases and writes them in the replication log. The log also captures external table data that is used to modify tables. Netezza Replication Services then copies the log across the network to all replica nodes in the replication set. When the replica replays the log, it replays and applies the statements in a by-SQL fashion.
By-value replication
However, there are some instances in which a by-SQL replication contains a non-deterministic value (a SQL statement that can produce different results when run on different hosts). For example, due to network crossing time, time zone differences, or latency, a SQL statement that calls the now() function would return different results if run on the primary and then replayed on the replica. In such scenarios, capturing and replaying the SQL statement alone could not result in synchronized databases. To handle non-deterministic SQL operations, the primary processes the transaction by value instead of by SQL. That is, upon detecting non-deterministic values, the primary captures the result set of the SQL statements and sends it to the replica. When the replica replays the log, it applies the net effect (the DDL and row before and after image capture) of the SQL statement on the primary.
Current time and date are examples of non-deterministic functions that are supported through by-value replication. When current time and date functions are executed, the replication log captures the primary value and replicates by value to the replica. The effect of by-value replication is that the replicas have values identical to that of the primary, instead of evaluating the time and date when the replica ran the query.
Requirements for initial database synchronization in a replicated environment
- To make all global data identical across all replicated nodes, back up the data on the replicas and restore it on the primary. This ensures that no duplicate global objects exist and that the primary has a copy of every global object in its catalog. For more information, see Preparing the primary for new replicas.
- On the primary node, take a collection of backups for each replicated database, and use those backups to restore the database on all the replica nodes. This step ensures that every node has identical values for all replicated databases.
Starting from identical database contents, by-SQL replication keeps the replicas identical with the primary. For example, an update WHERE clause always selects rows on the replicas that are identical to the rows that are selected on the primary. In cases where nondeterministic results are permitted, such as for the DATE/TIME function, by-value replication keeps the databases synchronized.