
Delta feature for Iterator mode

Connectors in Iterator mode can generate Delta entries. This feature uses the Delta Engine and the Delta Store to detect changes.

Delta Engine

The Delta Engine allows you to read through a data source and detect what has changed since the previous time you read it. This way you can detect new entries, changed entries and even deleted entries. For certain data sources (such as LDIF files and LDAP servers), IBM® Security Verify Directory Integrator can even detect if attributes and values within entries have been changed. You can configure Delta settings on Connectors in Iterator mode only.

The Delta Engine knows whether Entries or Attributes have been added, changed or deleted by keeping a local copy of each Entry in a persistent store, which is part of the System Store. This local repository is called a Delta Store and consists of Delta tables. Each time the AssemblyLine is run, the Delta Engine compares the data being iterated with its copy in the Delta Table. When a change is detected the Connector returns a Delta Entry.

Note: Do not manually modify Delta Store tables. Otherwise, the Delta snapshot information will become inconsistent, and the Delta Engine will fail.

Note: In versions earlier than IBM® Security Verify Directory Integrator V6.1, snapshots written to the Delta Store during Delta engine processing were committed immediately. As a result, the Delta engine would consider a changed Entry as handled even though processing the AL Flow section failed. This limitation is addressed through the Commit parameter on the Connector Delta tab. Setting this parameter controls when the Delta engine commits snapshots taken of incoming data to the System Store.

Unique Attribute name

For the Delta mechanism to uniquely identify each Entry, you must specify a unique Attribute to use as a Delta key. The values of this attribute must be unique within the data source being read. You can specify the Delta key in the Delta tab of the Connector, by entering or selecting an attribute name in the Unique Attribute Name parameter. This attribute must be found in the Input Map of your Iterator, and can either be an attribute read from the connected system or a computed attribute (using script in the Attribute Mapping).

You can also specify multiple attributes by separating them with a plus sign ( + ):

LastName+FirstName+BirthDate 

At least one of the attributes specified in the Unique Attribute Name parameter must contain a value. When several attributes are specified, their string values are concatenated into one string, which then becomes the unique Delta identifier. Attributes with no values (for example, blank or NULL) are skipped when the Delta key is built for an Entry.
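The key-building rule above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function name `build_delta_key` is hypothetical, entries are modeled as plain dictionaries, and the sketch assumes that the values are joined without a separator and that "no value" means `None` or an empty string.

```python
def build_delta_key(entry, unique_attribute_name):
    """Concatenate the string values of the named attributes into one Delta key.

    Assumptions (not confirmed by the product documentation): values are
    joined without a separator, and None or "" counts as "no value".
    """
    names = [name.strip() for name in unique_attribute_name.split("+")]
    parts = []
    for name in names:
        value = entry.get(name)
        if value is None or str(value) == "":
            continue  # attributes with no value are skipped
        parts.append(str(value))
    if not parts:
        raise ValueError("at least one Delta key attribute must contain a value")
    return "".join(parts)


entry = {"LastName": "Smith", "FirstName": "Ann", "BirthDate": None}
key = build_delta_key(entry, "LastName+FirstName+BirthDate")
```

Here `BirthDate` has no value, so the key is built from `LastName` and `FirstName` only. Because empty attributes are skipped, two different Entries can in principle produce the same key, which is why the combined attributes need to be chosen so that the result stays unique in the data source.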

Delta Store

The Delta Store is physically located in the System Store. It consists of one Delta Systable (DS) and one or more Delta Tables. Each Delta Table serves as the Delta Store for a different Iterator Connector that has Delta enabled.

Although Delta Store tables can be accessed with both the JDBC Connector and the System Store Connector, it is inadvisable to change them without a deep understanding of how these tables are structured and handled by the Delta Engine.

Delta Table structure

Every Delta Table (DT) contains information about each Entry processed by the Delta Engine for a particular Connector. A Delta Systable (DS) maintains a list of all Delta Tables currently in use by the Delta Store.

  • Delta Systable – The Delta Systable (DS) contains information about each Delta Table (DT) in the System Store. The purpose of the DS is to maintain the sequence counter for each DT. The structure for the DS is as follows:

    | Column | Type | Description |
    | --- | --- | --- |
    | ID | Varchar | The DT identifier (name) |
    | SequenceID | Int | The sequence ID from the last run |
    | Version | Int | The DS version (1) |

    Table 1. Delta Systable structure

  • Delta Table – Each Connector that requests a Delta store needs to specify a unique Delta identifier to be associated with the Connector. This identifier is also used as the name of the Delta Table in the System Store. The Delta Table structure is as follows:

    | Column | Type | Description |
    | --- | --- | --- |
    | ID | Varchar | The unique value identifying an Entry |
    | SequenceID | Int | The sequence number for the Entry |
    | Entry | Long Varbinary | The serialized Entry object |

    Table 2. Delta Table structure
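To make the relationship between the two table types concrete, the following sketch recreates the column layout in an in-memory SQLite database. It is a stand-in for illustration only. The table names `DeltaSystable` and `HRDelta` are hypothetical, the primary keys are an assumption, and SQLite's `BLOB` type stands in for Long Varbinary. The real tables live in the System Store and must not be edited by hand.

```python
import sqlite3

# Hypothetical names: "DeltaSystable" and "HRDelta" are illustrative only.
conn = sqlite3.connect(":memory:")

# One Systable row per Delta Table, holding that table's sequence counter.
conn.execute(
    "CREATE TABLE DeltaSystable ("
    " ID VARCHAR PRIMARY KEY,"
    " SequenceID INTEGER,"
    " Version INTEGER)"
)

# One Delta Table per Delta-enabled Connector, named after its Delta identifier.
conn.execute(
    "CREATE TABLE HRDelta ("
    " ID VARCHAR PRIMARY KEY,"
    " SequenceID INTEGER,"
    " Entry BLOB)"
)

conn.execute("INSERT INTO DeltaSystable VALUES (?, ?, ?)", ("HRDelta", 1, 1))
conn.execute(
    "INSERT INTO HRDelta VALUES (?, ?, ?)",
    ("cn=Niki,o=IBM,c=us", 1, b"<serialized Entry placeholder>"),
)

systable_rows = conn.execute(
    "SELECT ID, SequenceID, Version FROM DeltaSystable"
).fetchall()
delta_rows = conn.execute("SELECT ID, SequenceID FROM HRDelta").fetchall()
```

The Systable row's `ID` matches the Delta Table's name, and the `SequenceID` stored with each Entry is compared against the table's current counter during Delta processing.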

Delta process

Given the above Delta Store structure, the sequence number is used to determine which entries are no longer part of the source data set. Every time an AssemblyLine is run, the sequence number for the Delta Table used by the Connector is read from the Delta Systable. It is then incremented, and the incremented value is used to mark updated entries for the entire AssemblyLine execution.

The Delta Engine process works in two passes.

  1. Read → Look up → Compare → Update → Set current SequenceID
    1. The Iterator reads entries from the input data source.
    2. The Delta process looks for the corresponding Entry in the Delta Table using the unique attribute's value.
    3. If a match is found, the Delta process compares each Attribute (and its values) to determine if there have been modifications to the Entry. Based on the result of the comparison, the Delta Engine returns a Delta Entry tagged with the relevant operation code, modify or unchanged:
      • Modify Entry – the Entry that was read and the corresponding Entry from the Delta Table are considered different; the Entry is updated in the Delta Table
      • Unchanged Entry – the Entry that was read and the corresponding Entry from the Delta Table are considered equal.
    4. If a match is not found in the Delta Table the Entry is treated as new:
      • Add Entry – the Entry is added to the Delta Table.
    5. In both cases (steps 3 and 4), the sequence number value in the Delta Table is updated with the sequence number used for the current AssemblyLine execution.
  2. Check for data with (SequenceID < current SequenceID) → Mark as Deleted
    Once End of Data is reached by the Iterator, the Delta Engine makes a second pass through the Delta Table looking for those entries not accessed during the first pass. These Entries are easily recognized because their sequence number is not updated with the current sequence number. Therefore any Entries in the Delta Table with a sequence number lower than the current sequence number are considered to be deleted entries and are returned as deleted.
    Note: This pass happens only when the iteration through the input data completes successfully. If for some reason an error occurs during that iteration, no Entries will be tagged as deleted and returned by the AssemblyLine or removed from the Delta Table. This will not affect the original data source, and the next time the AssemblyLine is executed successfully the deleted Entries will be processed correctly.
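The two passes can be sketched as follows. This is a simplified model, not the Delta Engine's code. Dictionaries stand in for the Delta Table and Delta Systable, the function name `run_delta` and the operation strings are hypothetical, and the sketch assumes the Systable counter is written back at the end of a successful run.

```python
def run_delta(source_entries, delta_table, systable, table_id, key_attr):
    """Return (operation, entry) pairs for one AssemblyLine run (illustrative)."""
    # Read the last sequence number for this Delta Table and increment it.
    current_seq = systable.get(table_id, 0) + 1
    results = []

    # Pass 1: read, look up, compare, update, set current SequenceID.
    for entry in source_entries:
        key = str(entry[key_attr])
        stored = delta_table.get(key)
        if stored is None:
            operation = "add"
            delta_table[key] = {"seq": current_seq, "entry": dict(entry)}
        elif stored["entry"] != entry:
            operation = "modify"
            delta_table[key] = {"seq": current_seq, "entry": dict(entry)}
        else:
            operation = "unchanged"
            stored["seq"] = current_seq
        results.append((operation, entry))

    # Pass 2: reached only if pass 1 finished without raising an error.
    stale_keys = [k for k, row in delta_table.items() if row["seq"] < current_seq]
    for key in stale_keys:
        row = delta_table.pop(key)
        results.append(("delete", row["entry"]))

    systable[table_id] = current_seq  # assumption: counter saved at end of run
    return results


table, systable = {}, {}
first = run_delta(
    [{"uid": "a", "cn": "Ann"}, {"uid": "b", "cn": "Bob"}],
    table, systable, "HRDelta", "uid",
)
second = run_delta([{"uid": "a", "cn": "Anne"}], table, systable, "HRDelta", "uid")
```

In the second run, entry `a` has a different `cn` and is tagged modify, while entry `b` is never touched, keeps the older sequence number, and is returned as deleted. If reading the source raised an exception partway through, the second pass would be skipped and no row would be removed, which mirrors the behavior described in the Note.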

Row Locking

This parameter is available in the Delta tab for Iterator connectors and the Delta Function Component configuration. It allows you to set the transaction isolation level used by the connection established to the Delta Store database. Setting a higher isolation level reduces the transaction anomalies known as 'dirty reads', 'non-repeatable reads' and 'phantom reads' by using row and table locks. This parameter has the following values:

READ_UNCOMMITTED
Corresponds to java.sql.Connection.TRANSACTION_READ_UNCOMMITTED; indicates that dirty reads, non-repeatable reads and phantom reads can occur. This level allows a row changed by one transaction to be read by another transaction before any changes in that row have been committed (a "dirty read"). If any of the changes are rolled back, the second transaction will have retrieved an invalid row.

READ_COMMITTED
Corresponds to java.sql.Connection.TRANSACTION_READ_COMMITTED; indicates that dirty reads are prevented; non-repeatable reads and phantom reads can occur. This level only prohibits a transaction from reading a row with uncommitted changes in it.

REPEATABLE_READ
Corresponds to java.sql.Connection.TRANSACTION_REPEATABLE_READ; indicates that dirty reads and non-repeatable reads are prevented; phantom reads can occur. This level prohibits a transaction from reading a row with uncommitted changes in it, and it also prohibits the situation where one transaction reads a row, a second transaction alters the row, and the first transaction rereads the row, getting different values the second time (a "non-repeatable read").

SERIALIZABLE
Corresponds to java.sql.Connection.TRANSACTION_SERIALIZABLE; indicates that dirty reads, non-repeatable reads and phantom reads are prevented. This level includes the prohibitions in TRANSACTION_REPEATABLE_READ and further prohibits the situation where one transaction reads all rows that satisfy a WHERE condition, a second transaction inserts a row that satisfies that WHERE condition, and the first transaction rereads for the same condition, retrieving the additional "phantom" row in the second read. This is generally the slowest but safest option, and the default value for the Row Locking parameter.
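As a compact summary of the four values, the sketch below records which anomalies each level still permits, together with the integer values of the corresponding `java.sql.Connection` constants. The helper name `levels_preventing` is hypothetical and only serves to query the table.

```python
# Integer values of the java.sql.Connection TRANSACTION_* constants.
ISOLATION_LEVELS = {
    "READ_UNCOMMITTED": 1,
    "READ_COMMITTED": 2,
    "REPEATABLE_READ": 4,
    "SERIALIZABLE": 8,
}

# Anomalies that can still occur at each level, as described above.
POSSIBLE_ANOMALIES = {
    "READ_UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ_COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE_READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def levels_preventing(anomaly):
    """Return the Row Locking values under which the given anomaly cannot occur."""
    return [
        name for name in ISOLATION_LEVELS
        if anomaly not in POSSIBLE_ANOMALIES[name]
    ]


safe_from_phantoms = levels_preventing("phantom read")
safe_from_dirty_reads = levels_preventing("dirty read")
```

Only SERIALIZABLE rules out phantom reads, whereas every level except READ_UNCOMMITTED rules out dirty reads.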

For more information about transaction isolation levels, see the online documentation of the java.sql.Connection interface: http://docs.oracle.com/javase/1.6.0/docs/api/java/sql/Connection.html.

Each database server sets a default transaction isolation level; the default value for Apache Derby, Oracle and Microsoft SQL Server is TRANSACTION_READ_COMMITTED. However, the Row Locking parameter defaults to SERIALIZABLE, which overrides the database default when you use a Delta component (that is, the Delta functionality in Iterator Connectors or the Delta Function Component).

Some database servers may not support all transaction isolation levels. Refer to the documentation for your database for accurate information about the levels it supports.

Note: Transaction isolation levels are maintained by the database server itself for every connection established to the database. Therefore, when a Delta component (with the transaction isolation level set to REPEATABLE_READ or SERIALIZABLE and the Commit parameter set to On Connector Close) starts its transaction, all other queries trying to modify the same data will be blocked. This means that other components which need to modify the same data will have to wait until the first component commits its transaction on termination. This waiting may cause the issued SQL queries to time out and leave the data unmodified.

Also, when a component has the Commit parameter set to No autocommit, you should commit the transactions manually, and do so promptly enough that other components are not left waiting indefinitely to perform a modification.

Detect or ignore changes only in specific attributes

The parameters Attribute List and Change Detection Mode configure the ability of the Delta Engine to detect changes only in specific attributes instead of in all received attributes.

The Attribute List parameter is a comma-separated list of the attributes that are affected by the Change Detection Mode. The Change Detection Mode parameter specifies how changes in these attributes are handled. It has three values:

IGNORE_ATTRIBUTES
(“Ignore changes for the following Attributes”) – Changes in every attribute specified in the Attribute List parameter will be ignored during the compute changes process.

DETECT_ATTRIBUTES
(“Detect changes for the following Attributes”) – This option has the opposite effect – the only detected changes will be in the attributes listed in the Attribute List parameter.

DETECT_ALL
(“Use all Attributes for change detection”) – This instructs the Delta Engine to detect changes in all attributes. When this option is selected the Attribute List parameter is disabled since no list of affected attributes is needed.
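The effect of the three modes on which attributes take part in the comparison can be sketched as a small set operation. The function name `attributes_to_compare` is hypothetical, and the sketch only models attribute selection, not the comparison itself.

```python
def attributes_to_compare(all_names, mode, attribute_list=""):
    """Return the attribute names the comparison would look at (illustrative)."""
    listed = {name.strip() for name in attribute_list.split(",") if name.strip()}
    names = set(all_names)
    if mode == "DETECT_ALL":
        return names  # the Attribute List is not used in this mode
    if mode == "IGNORE_ATTRIBUTES":
        return names - listed  # everything except the listed attributes
    if mode == "DETECT_ATTRIBUTES":
        return names & listed  # only the listed attributes
    raise ValueError(f"unknown Change Detection Mode: {mode}")


received = ["cn", "mail", "changetime", "changenumber"]
ignoring = attributes_to_compare(received, "IGNORE_ATTRIBUTES", "changetime, changenumber")
detecting = attributes_to_compare(received, "DETECT_ATTRIBUTES", "cn")
```

With the same received attributes, ignoring `changetime` and `changenumber` leaves `cn` and `mail` in the comparison, while detecting only `cn` leaves just that one attribute.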

Example use case

When using the Delta Engine, the received entries sometimes contain attributes that you consider unimportant and want to ignore. In such cases, these attributes must not affect the result of the Delta computation: when several Entries differ only in these attributes, the result is unnecessary updates of the Delta Store table.

The solution for this case is to use the Attribute List and Change Detection Mode parameters.

Here is an example scenario where two AssemblyLines are receiving changelog entries from two replicas of an LDAP server and these changes are applied to one Delta Store. To illustrate this we will use the following example changelog entries:

Entry1:

Entry attributes:
    targetdn (replace): 'cn=Niki,o=IBM,c=us'
    changetime (replace):   '20071015094646'
    $dn (replace):  'changenumber=78955,cn=changelog'
    ibm-changeInitiatorsName (replace): 'CN=ROOT'
    changenumber (replace): '78955'
    objectclass (replace):  'top'   'changelogentry'    'ibm-changelog'
    changetype (replace):   'modify'
    cn (replace):   'Niki' 'Niky'
    changes (replace):  'replace: cn
                      cn: Niki
                      cn: Niky
                      -
                      '

Entry2:

Entry attributes:
    targetdn (replace): 'cn=Niki,o=IBM,c=us'
    changetime (replace):   '20071015094817'
    $dn (replace):  'changenumber=10076,cn=changelog'
    ibm-changeInitiatorsName (replace): 'CN=ROOT'
    changenumber (replace): '10076'
    objectclass (replace):  'top'   'changelogentry'    'ibm-changelog'
    changetype (replace):   'modify'
    cn (replace):   'Niki' 'Nikolai'
    changes (replace):  'replace: cn
                      cn: Niki
                      cn: Nikolai
                      -
                     '

Entry3:

Entry attributes:
    targetdn (replace): 'cn=Niki,o=IBM,c=us'
    changetime (replace):   '20071037454817'
    $dn (replace):  'changenumber=112,cn=changelog'
    ibm-changeInitiatorsName (replace): 'CN=ADMIN'
    changenumber (replace): '112'
    objectclass (replace):  'top'   'changelogentry'    'ibm-changelog'
    changetype (replace):   'modify'
    cn (replace):   'Niki' 'Nikolai'
    changes (replace):  'replace: cn
                      cn: Niki
                      cn: Nikolai
                      -
                         '

Modified attributes are marked in bold and attributes that can be ignored are marked in italics. The ignored attributes (such as changenumber, changetime, and so forth) will not be considered when comparing the received Entry with the stored Entry. Therefore these attributes have to be listed in the Attribute List parameter. In order to specify that we want to ignore them the Change Detection Mode parameter needs to be set to Ignore changes for the following Attributes.

This is the workflow when the AssemblyLines receive the entries:

  1. When AL1 receives Entry1, it will be returned as modify and saved in the Delta Store table.
  2. When AL2 receives Entry2, its changetime, $dn, ibm-changeInitiatorsName, changenumber attributes are modified but will be ignored. However, the cn and changes attributes are also modified, and therefore the resulting Delta Entry will be tagged as modify and saved in the Delta Store table.
  3. When AL2 receives Entry3, its changetime, $dn, ibm-changeInitiatorsName, changenumber attributes are modified but will be ignored. The rest of the attributes are equal, so the resulting Delta Entry will be tagged as unchanged and will either be returned to the AssemblyLine (only if the Return unchanged parameter is checked) or skipped. The returned Delta Entry will be identical to the received Entry3. In this case the Delta Store is not updated. If the Attribute List and Change Detection Mode parameters were not used, Entry3 would have been tagged as modify and saved in the Delta Store.
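The workflow above can be replayed with a short sketch. It rests on several assumptions that the scenario does not state: `targetdn` is taken as the Delta key, the Delta Store is seeded with a hypothetical earlier snapshot so that Entry1 counts as a modification, and the `objectclass` attribute is omitted for brevity. The helper names are illustrative.

```python
IGNORED = {"changetime", "$dn", "ibm-changeInitiatorsName", "changenumber"}


def changelog_entry(changetime, changenumber, initiator, new_cn):
    """Build a trimmed version of the example changelog entries."""
    return {
        "targetdn": "cn=Niki,o=IBM,c=us",
        "changetime": changetime,
        "$dn": f"changenumber={changenumber},cn=changelog",
        "ibm-changeInitiatorsName": initiator,
        "changenumber": changenumber,
        "changetype": "modify",
        "cn": ["Niki", new_cn],
        "changes": f"replace: cn\ncn: Niki\ncn: {new_cn}\n-\n",
    }


def significant(entry):
    """Drop the attributes listed in the Attribute List (IGNORE_ATTRIBUTES mode)."""
    return {name: value for name, value in entry.items() if name not in IGNORED}


def process(store, entry):
    """Tag the entry and update the store only when something relevant changed."""
    key = entry["targetdn"]  # assumption: targetdn is the Delta key
    stored = store.get(key)
    if stored is not None and significant(stored) == significant(entry):
        return "unchanged"  # the Delta Store is left as it is
    store[key] = entry
    return "add" if stored is None else "modify"


entry1 = changelog_entry("20071015094646", "78955", "CN=ROOT", "Niky")
entry2 = changelog_entry("20071015094817", "10076", "CN=ROOT", "Nikolai")
entry3 = changelog_entry("20071037454817", "112", "CN=ADMIN", "Nikolai")

# Hypothetical earlier snapshot, so that Entry1 is seen as a modification.
store = {"cn=Niki,o=IBM,c=us": {"targetdn": "cn=Niki,o=IBM,c=us", "cn": ["Niki"]}}
tags = [process(store, entry) for entry in (entry1, entry2, entry3)]
```

Entry3 differs from the stored Entry2 only in ignored attributes, so it is tagged unchanged and the store keeps the Entry2 snapshot. With an empty `IGNORED` set, the same run would tag Entry3 as modify.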