Rebasing Synchronized Indices

Watson™ Explorer Engine synchronized indexing uses a technique called rebasing to synchronize indices between Watson Explorer Engine installations that are members of a synchronization set. Borrowed from the world of revision control (source code control) systems, rebasing synchronizes multiple systems by propagating changes from one server to one or more others. When using synchronized distributed indexing, rebasing makes one system's copy of a distributed index consistent with the copy on another system by creating a new, empty copy of the index and then receiving all of the updates that have been made to the master copy of the index.
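The idea of starting from an empty copy and receiving every update can be illustrated with a small conceptual sketch. This is not Watson Explorer Engine code: the `rebase` function, the `(operation, document id, content)` tuple format, and the dictionary standing in for an index are all assumptions made purely for illustration.

```python
def rebase(master_updates):
    """Build a fresh index by replaying every update from the master copy, in order."""
    index = {}  # the new, empty copy of the index (a dict is a stand-in)
    for op, doc_id, content in master_updates:
        if op == "add":
            index[doc_id] = content      # add or overwrite a document
        elif op == "delete":
            index.pop(doc_id, None)      # remove a document if present
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return index


# Hypothetical update history held by the master copy.
master_updates = [
    ("add", "doc-1", "first draft"),
    ("add", "doc-2", "notes"),
    ("add", "doc-1", "second draft"),
    ("delete", "doc-2", None),
]
replica = rebase(master_updates)
```

After the replay, `replica` holds the same contents as the master copy, which is the sense in which the two systems are consistent.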

Once multiple systems that are members of a synchronization set are consistent with each other, they remain synchronized by propagating updates to the index to all other members of the synchronization set. Once updates have been received by all members of the synchronization set, they are deleted from the journal used to track pending index updates.
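The journal behavior described above can be modeled roughly as follows. This is a toy model under stated assumptions, not the product's implementation: the `UpdateJournal` class, its method names, and the member names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UpdateJournal:
    """Toy journal of pending index updates (hypothetical, for illustration only)."""
    members: Set[str]
    # Maps update id -> members that have not yet received that update.
    pending: Dict[int, Set[str]] = field(default_factory=dict)
    next_id: int = 0

    def record(self, origin: str) -> int:
        """Log an update made on `origin`; every other member still needs it."""
        update_id = self.next_id
        self.next_id += 1
        waiting = set(self.members) - {origin}
        if waiting:  # nothing to track if there are no other members
            self.pending[update_id] = waiting
        return update_id

    def acknowledge(self, update_id: int, member: str) -> None:
        """Note that `member` received the update; prune it once nobody is waiting."""
        waiting = self.pending.get(update_id)
        if waiting is None:
            return
        waiting.discard(member)
        if not waiting:
            del self.pending[update_id]


journal = UpdateJournal(members={"a", "b", "c"})
uid = journal.record(origin="a")
journal.acknowledge(uid, "b")
still_pending = uid in journal.pending        # "c" has not received it yet
journal.acknowledge(uid, "c")
fully_delivered = uid not in journal.pending  # entry pruned after the last acknowledgment
```

The point of the sketch is only that an entry survives until the last member has acknowledged it, which keeps the journal limited to updates that are still in flight.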

Warning: When using distributed indexing, cascading rebase operations are not supported. For example, if the data from collection A is to be used in collections B, C, and D, the correct method is to rebase directly from A to B, A to C, and A to D. Rebasing from A to B, then from B to C, and finally from C to D is not a supported rebasing model.
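One way to sanity-check a planned set of rebase operations against this rule is sketched below. The function name and the representation of a plan as `(source, target)` pairs are assumptions for illustration; nothing here calls the product itself.

```python
def find_cascading_rebases(plan):
    """Return the (source, target) pairs whose source is itself a rebase target in the plan."""
    targets = {target for _, target in plan}
    return [(source, target) for source, target in plan if source in targets]


# Supported: every collection rebases directly from A.
supported_plan = [("A", "B"), ("A", "C"), ("A", "D")]

# Unsupported: B and C are rebase targets that are then used as sources.
cascading_plan = [("A", "B"), ("B", "C"), ("C", "D")]

supported_problems = find_cascading_rebases(supported_plan)
cascading_problems = find_cascading_rebases(cascading_plan)
```

An empty result means every rebase in the plan comes directly from a collection that is not itself being rebased.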

Rebasing is especially useful when adding new systems to a synchronization set. When new systems are added to a synchronization set, they initially become consistent by rebasing from a specified server. Once they are consistent, remaining consistent requires relatively little network traffic because they simply receive index updates like any other member of the synchronization set.

As mentioned in the previous section, synchronized distributed indexing is especially well suited to frequently updated indices, such as those used in collaborative search applications, because they can remain consistent by exchanging only updates. Synchronization is also well suited to lower-speed, higher-traffic, or less robust networking environments, for two reasons. First, less data is exchanged between members of a synchronization set than in the other distributed indexing models that Watson Explorer Engine supports. Second, updates are exchanged transactionally and are therefore automatically retried if they fail initially.
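The retry behavior can be pictured with a minimal sketch. It assumes a hypothetical `send` callable that either delivers an update completely or raises `ConnectionError`; the retry limit, the `FlakyLink` test double, and the update format are all illustrative and do not reflect how Watson Explorer Engine implements its exchange.

```python
def deliver_with_retry(send, update, max_attempts=3):
    """Call send(update) until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(update)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt


class FlakyLink:
    """Simulated network link that drops the first `failures` deliveries."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def __call__(self, update):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated network drop")
        # The update is recorded only when the whole delivery succeeds.
        self.delivered.append(update)


link = FlakyLink(failures=2)
attempts = deliver_with_retry(link, {"doc": "doc-1", "op": "add"})
```

Because a failed attempt leaves nothing behind on the receiving side in this model, retrying is safe and the update arrives exactly once.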

To rebase, the crawler must be running on both the server from which updates are being received and the client that is receiving them.

Tip: If rebasing one system from another takes longer than the Time to disconnection setting (default: 120 seconds), the rebase can time out. This option is located in the Remote common section of the Remote tab for the new member. You may need to set it to a large value for the initial rebase; you can then reduce it for normal operations.
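A rough back-of-the-envelope estimate can help decide whether the default is likely to be too short. The sketch below assumes the rebase time is dominated by transferring the index over the network; the function name, the safety factor, and the example sizes and throughput are illustrative assumptions, not product guidance.

```python
import math

DEFAULT_TIME_TO_DISCONNECTION_S = 120  # default stated in the tip above


def suggested_timeout(index_bytes, bytes_per_second, safety_factor=2.0,
                      current=DEFAULT_TIME_TO_DISCONNECTION_S):
    """Estimate a timeout (in seconds) that leaves headroom for the initial rebase."""
    estimated_seconds = index_bytes / bytes_per_second
    needed = math.ceil(estimated_seconds * safety_factor)
    # Never suggest a value lower than the current setting.
    return max(current, needed)


# Hypothetical 6 GB index over a link sustaining 12.5 MB/s (about 100 Mbit/s).
large_index = suggested_timeout(6_000_000_000, 12_500_000)

# Hypothetical 500 MB index over the same link: the default is already enough.
small_index = suggested_timeout(500_000_000, 12_500_000)
```

Actual rebase time also depends on indexing work and server load, so a measured trial run is a better guide than this estimate when one is available.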