Workload and Db2 database considerations
Because replication uses log-capture/transaction-replay technology on sources and targets that are independent Db2 systems, there are a few considerations for your workloads and databases.
Log-capture/transaction-replay technology detects changes that are made to a source database (asynchronously and with no impact to the source application), by capturing log records from the database recovery logs and reconstructing SQL statements for replaying transactions at the target.
The SQL replay requires your tables to have a unique constraint to enforce the uniqueness of each row when replication applies the row to target tables. Depending on the characteristics of your database and workloads, you might also need to carry out some administrative changes outside of replication.
The following sections describe replication-related constraints that might need to be addressed for your workloads and databases.
- Overview
- Unique constraints
- Referential-integrity constraints
- Tables with identity columns
- Database sequences
- Non-logged operations
- Incremental load and partition load
- Tables with triggers that update other tables
- Restrictions
Overview
With log-capture technology, only database operations for which there is some DBMS logging can be replicated. Replication can automatically replicate the following operations:
- DML: INSERT/UPDATE/DELETE
- DDL: See Supported DDL operations in Db2.
- TRUNCATE IMMEDIATE
Replication can detect the invocation of the Db2 LOAD utility on a source table that is replicated and re-initialize the replication subscription for that table. When the subscription is defined with initial load, the entire table is automatically re-loaded at the target side by using external table load for column-organized tables, or using a Db2 LOAD from cursor for row-organized tables. Changes are captured during the re-load and therefore the application can continue updating the source table.
Important: Subscription re-initialization is for the entire table, even if LOAD APPEND was performed on the source table. To replicate incremental LOAD instead of reloading the entire table, you must load the table using external table load (which is a logged operation) or inserts, instead of using the LOAD utility.
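As a sketch of the logged alternative to LOAD APPEND, an incremental load can be performed through an external table insert instead. The table name and file path below are illustrative, and external table support and its configuration depend on your Db2 version:

```sql
-- Hypothetical example: append rows to SALES through an external table.
-- Unlike the LOAD utility, this insert is logged, so replication captures
-- the new rows incrementally instead of reloading the entire table.
INSERT INTO SALES
  SELECT * FROM EXTERNAL '/data/sales_increment.csv'
  LIKE SALES
  USING (DELIMITER ',');
```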
For DDL changes, replication tolerates situations where the change was already made by the user at the target system; in that case, replication simply updates its metadata. You can also temporarily stop replication while you make DDL changes, keeping the target down-level for fallback in case any issues arise during the application upgrade, and then restart replication after validation is complete so that replication applies the same DDL changes to the target system.
In general, DDL changes that are needed by replication to replay transactions, such as new columns in a replicated table, are recognized and by default automatically replicated. Changes that are only needed by the application or specific to the database configuration, for example creating a view, or requesting database configuration changes, are not currently replicated. Such changes must be carried out for each database by the user. Because the databases are independent from each other and each database is fully active, they can run different applications and can be configured differently. This allows you to stage upgrades and keep one of the systems down-level for fallback.
Some DDL operations can be replicated without any user intervention and without any interruption to the replication process and application workloads. For LOAD utilities, replication gives the option to perform a full-refresh of the target table.
With log capture/transaction-replay technology, the SQL statements that are executed at the target are equivalent, but not necessarily identical to, those that are issued by an application against the source database. SQL statements at the target database reflect what was logged, not what the user issued at the source. This method has huge benefits because you can replicate what was successfully committed at the source, as opposed to what an application might have attempted to run. For instance, it allows you to replicate every action of a stored procedure, whereas replaying the same stored procedure at the target could give a different outcome. It also guarantees that values that are generated by Db2 at the source, such as timestamps, dates, or generated always columns, are identical on both the source and target systems.
Unique constraints
For both row-organized and column-organized tables, unique constraints should exist on the table:
- For row-organized tables, the unique constraint should be enforced and have an associated index.
- For column-organized tables, the unique constraint does not have to be enforced (an index is not required), but the unique constraint must be respected by the application. Db2 does not check for duplicate rows.
If a column-organized table does not have a unique constraint or any set of columns that can uniquely identify each row, the recommended solution is to add an identity column to the table with a unique constraint on this column. Db2 then generates a unique value for each row that is inserted into this table.
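The following sketch illustrates this recommendation, mirroring the multi-step identity-column pattern shown in the Restrictions section. The table and constraint names are hypothetical; for a column-organized table, the unique constraint can be informational (NOT ENFORCED):

```sql
-- Add an identity column to a column-organized table with no natural unique
-- key, populate it, then declare a not-enforced unique constraint on it.
ALTER TABLE SALES_FACTS ADD COLUMN ROW_ID INTEGER NOT NULL WITH DEFAULT 0;
ALTER TABLE SALES_FACTS ALTER COLUMN ROW_ID DROP DEFAULT;
ALTER TABLE SALES_FACTS ALTER COLUMN ROW_ID SET GENERATED ALWAYS AS IDENTITY;
UPDATE SALES_FACTS SET ROW_ID = DEFAULT;   -- assigns a generated value per row
REORG TABLE SALES_FACTS;
ALTER TABLE SALES_FACTS
  ADD CONSTRAINT SALES_FACTS_UK UNIQUE (ROW_ID) NOT ENFORCED;
```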
If you must add a unique constraint to an existing table that is already in use in a multi-partition database, deactivate and reactivate the database so that Db2 takes the constraint into consideration across all partitions. Otherwise, deleting a large number of rows from that column-organized table can suffer serious performance degradation.
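A minimal sketch of the deactivate/reactivate step from the Db2 command line (the database name is hypothetical, and existing connections must first be ended):

```sql
-- Db2 CLP commands: recycle the database so the new unique constraint is
-- recognized on all partitions.
DEACTIVATE DATABASE MYDB;
ACTIVATE DATABASE MYDB;
```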
The following query can be used to identify tables that do not have a unique index:
with unitab(tabname) as (
  select distinct tabname
  from syscat.indexes
  where tabschema in ('YOURSCHEMA')
    and tabname not like 'CD%'
    and uniquerule in ('P','U')
)
select
  substr(t.tabschema,1,12) as tabschema,
  char(t.tabname,30) as tabname,
  case
    when u.tabname is not null then 'No problem'
    else 'No unique index!!'
  end as status
from syscat.tables t
left join unitab u on t.tabname = u.tabname
where t.tabschema in ('YOURSCHEMA')
  and t.type = 'T'
  and t.tabname not like 'CD%'
order by 3, 1, 2 desc;
Referential-integrity constraints
Tables that are related by foreign key constraints must be replicated in the same replication set. You must activate replication for the parent table before you activate replication for any child tables.
Tip: For a faster restart of a system after a prolonged outage during which replication had to be stopped and a huge backlog of changes accumulated, drop RI constraints at the site that is being brought back online for faster resynchronization. Restore the RI constraints after replication has cleared the backlog of changes and before rerouting applications to that site.
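The drop-and-restore step in this tip can be sketched as follows. The table, column, and constraint names are hypothetical:

```sql
-- Drop the RI constraint at the site that is being brought back online so
-- that resynchronization is not slowed by constraint checking.
ALTER TABLE ORDERS DROP CONSTRAINT FK_ORDERS_CUSTOMER;

-- ...after replication has cleared the backlog, and before rerouting
-- applications to this site, restore the constraint:
ALTER TABLE ORDERS ADD CONSTRAINT FK_ORDERS_CUSTOMER
  FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (ID);
```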
Tables with identity columns
For identity columns, the range of values that are generated must not overlap between the source and the target databases. For example, assume the following table with an identity column called ID:
CREATE TABLE CUSTOMER (ID SMALLINT
GENERATED BY DEFAULT AS IDENTITY(START WITH 100 INCREMENT BY 1),
NAME VARCHAR(30));
Follow these recommendations:
- Use an odd-even scheme if you are replicating between two sites, or use different ranges at each site. For example, if you use ranges, alter the table at the target:
ALTER TABLE CUSTOMER ALTER COLUMN ID RESTART WITH 16000
In this example, values 1 to 15999 are used at site 1, and values 16000 and greater at site 2.
- When you use replication in an active/standby configuration (where read-write operations against the same data are allowed only on one site until failover to the standby), write a script that alters identity columns at the standby site to start from MAX(identity-column-name)+1 before failover of the application to the standby site. For example:
SELECT MAX(ID)+1 FROM CUSTOMER
ALTER TABLE CUSTOMER ALTER COLUMN ID RESTART WITH :val
If you cannot define identity columns to generate non-overlapping values across sites, a reset is required before an application site switch.
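The odd-even scheme from the first recommendation can be sketched as follows, using the CUSTOMER table from the example above. With interleaved increments, the two sites can never generate the same value:

```sql
-- Site 1 generates odd IDs:
CREATE TABLE CUSTOMER (
  ID INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 2),
  NAME VARCHAR(30));

-- Site 2 generates even IDs:
CREATE TABLE CUSTOMER (
  ID INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 2 INCREMENT BY 2),
  NAME VARCHAR(30));
```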
Database sequences
Sequences are database objects that you can use to generate values for insert or update into tables. For example: insert into t2(col1) values(next value for myseq).
For example, the source database might be a Db2 system in Montpellier, France and the target database a Db2 system in San Jose, California. Database independence has implications for any values that are automatically generated by the Db2 system, such as database sequences. Because there is no coordination for generating these values between the two independent database systems, the next value that is available for a sequence in Montpellier is not the same as the next value that is available in San Jose. If you update tables that use such sequences at each site and replicate in both directions, a policy is required for preventing value overlaps.
Follow these recommendations:
- Partition sequence generation between sites, using either an odd/even scheme or different value ranges, so that values do not overlap. A sequence that can generate overlapping values cannot be used in an active/active configuration without a high risk of conflicts.
- Active/standby configurations: In situations where the application fails over in case of a disaster or for maintenance, alter the sequence to max+1 before rerouting the application. For example:
ALTER SEQUENCE ORG_SEQ RESTART
SELECT NEXT VALUE FOR ORG_SEQ, ORG.* FROM ORG
Sequences might require a reset before an application site switch.
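The first recommendation, partitioning sequence generation with an odd/even scheme, can be sketched as follows (the sequence name is hypothetical):

```sql
-- Site 1 generates odd values:
CREATE SEQUENCE ORDER_SEQ START WITH 1 INCREMENT BY 2;

-- Site 2 generates even values:
CREATE SEQUENCE ORDER_SEQ START WITH 2 INCREMENT BY 2;
```

Because the two value streams are interleaved, values generated at one site can never collide with values generated at the other, which makes the sequence safe for an active/active configuration.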
Non-logged operations
Database operations that are not logged, such as loading data with the LOAD utility or changes to tables that specify NOT LOGGED INITIALLY, require special consideration. Replication can detect the following types of LOAD utilities and trigger an automatic reinitialization of the table:
- LOAD REPLACE
- LOAD INSERT
Tip: The capture program cannot detect third-party load utilities. If you want load operations to be handled automatically by replication, use IBM® utilities. Otherwise, stop replication for the table and restart it after you have reloaded the table.
Incremental load and partition load
LOAD INSERT cannot be replicated automatically. Only a full refresh can be performed automatically by replication. That is, even if a single partition is loaded, the entire table is reloaded when CAPTURE_LOAD=R and the replication set is defined to use an automatic load.
Running applications simultaneously that update the same data on both sites
In an active-active configuration, you should not let applications update the same row before replication catches up. If this happens, a replication conflict is detected and reported and the change is forced by default. This situation generally leads to inconsistent data across sites. A conflict means that a user-committed change is lost and conflict resolution can never guarantee the correct result from a business point of view. It is the responsibility of the user to ensure that applications are properly routed to each database in order to prevent conflicts. When switching an application between databases, first verify from the console that the consistency point was reached at the time when the application stopped updating the source site.
Note that insert-only workloads never conflict and can be routed to all sites.
There is a special case with conflicts on tables with multiple unique indexes in an active/active environment. Tables with both a primary key and one or more secondary unique indexes can be replicated without any problem for the majority of cases, but there can be issues when both the primary key and one of the secondary keys are updated simultaneously at two sites. For example:
SITE A: (1,2)                      SITE B: (1,2)
SITE A: insert (2,3)               SITE B: insert (5,3)
---- replication sends the rows in each direction ----
SITE A: insert (5,3) --> ERROR: SQL0803 (duplicate row); Q Apply follows ERROR_ACTION
SITE B: insert (2,3) --> ERROR: SQL0803 (duplicate row); Q Apply follows ERROR_ACTION
Tables with triggers that update other tables
An application might rely on a trigger to insert data into Db2. For example, you might have an insert trigger on a table that generates a timestamp and inserts a row into another table with that timestamp.
If both tables are added to the replication set and the trigger does not exist at the target, both the insert into the parent table and the insert into the dependent table are replicated, preserving the same timestamp value in our example.
The implication is that prior to rerouting your application to the target site, you need to restore the trigger.
If you prefer to keep the trigger in place, you can make a service request to IBM to change the replication configuration so that operations that result from a trigger are not replicated; replication then relies on the trigger firing at the target when the insert into the parent table is replicated. If no Db2-generated values are involved (for example, no timestamp), this might be the preferred approach because you do not need to drop the trigger from the target and restore it before rerouting the application that depends on it.
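A hypothetical trigger of the kind described above, stamping a row into a dependent audit table (all table and column names are illustrative):

```sql
-- If both ORDERS and ORDERS_AUDIT are in the replication set, this trigger
-- must not also exist at the target (or trigger-generated operations must be
-- excluded from replication); otherwise the audit rows would be duplicated.
CREATE TRIGGER ORDERS_AUDIT_TRG
  AFTER INSERT ON ORDERS
  REFERENCING NEW AS N
  FOR EACH ROW
  INSERT INTO ORDERS_AUDIT (ORDER_ID, CREATED_AT)
    VALUES (N.ID, CURRENT TIMESTAMP);
```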
Restrictions
- Multi-row SQL update statements on tables with an enforced unique constraint for which Db2 must defer the uniqueness check to the end of the statement cannot be replicated without database or application changes. Multi-row update statements that update one or more columns with a unique constraint cannot be replicated if the execution of the statement causes temporary duplicate values. An example of this type of statement is an update of "k=k+1" on a table that already contains rows with the values 1 and 2 for k. This is because multi-row statements are replayed as independent statements at the target, as they were captured from the log at the source. If any of the individual statements fails, the error action that was defined for the replication set is followed. A database workaround is to add another unique index to the table and use this index as the replication key, or, if possible, remove the unique constraint at the target.
Tip: To generate a unique key when none can be generated from existing columns in the table, you can use a sequence. The following example illustrates the concept. These commands are for the Db2 command-line interface. Assume the following table is created:
create table T1(tabkey integer not null primary key);
An application updates the table by using tabkey=tabkey+1, which causes temporary duplicate values during statement execution. A solution for replicating this kind of multi-row update statement is to use another column of the table with unique values, adding one if necessary. The following example uses a sequence. First a few rows are added to demonstrate:
insert into T1 values (0),(1);
Then a new unique index and column are added:
alter table T1 add column repkey integer not null with default 0;
alter table T1 alter column repkey drop default;
alter table T1 alter column repkey set generated always as identity;
Then the primary key is dropped and recreated as a non-unique index:
alter table T1 drop primary key;
create index tabind on T1(tabkey);
update T1 set repkey = default;
reorg table T1;
Then you can look at the generated values in the new column:
select * from T1;
The UPDATE statement in the previous step assigned a unique, incrementing value to each row in the table. At the target, you must also replace the primary key with a non-unique index. When you define the subscription for this table, choose the new unique key (repkey in our example) as the replication key instead of the primary key.
- Replication is not supported on databases that use lattice-based access control (LBAC).