Rebasing Synchronized Indices
Watson™ Explorer Engine synchronized indexing uses a technique called rebasing to synchronize indices between Watson Explorer Engine installations that are members of a synchronization set. The term is borrowed from revision (source code) control systems, where rebasing synchronizes multiple systems by propagating changes from one server to one or more others. In synchronized distributed indexing, rebasing makes one system's copy of a distributed index consistent with the copy on another system by creating a new, empty copy of the index and then receiving all of the updates that have been made to the master copy of the index.
Once the systems that are members of a synchronization set are consistent with each other, they remain synchronized by propagating each update to the index to all other members of the synchronization set. Once an update has been received by all members of the synchronization set, it is deleted from the journal used to track pending index updates.
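The rebasing and journaling behavior described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not Watson Explorer Engine code: the names Member, SyncSet, publish, propagate, and rebase are hypothetical, an index is modeled as a plain dictionary, and the rebase step is simplified to replaying the source's current contents into a new, empty index.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical member of a synchronization set (not a product API)."""
    name: str
    index: dict = field(default_factory=dict)  # document key -> content
    applied: int = 0  # how many journaled updates this member has received

    def apply(self, update):
        key, content = update
        if content is None:
            self.index.pop(key, None)  # a None payload models a deletion
        else:
            self.index[key] = content
        self.applied += 1


class SyncSet:
    """Toy model: one journal of pending updates shared by all members."""

    def __init__(self, members):
        self.members = list(members)
        self.journal = []  # updates not yet received by every member
        self.pruned = 0    # count of updates already deleted from the journal

    def publish(self, key, content):
        self.journal.append((key, content))

    def propagate(self):
        # Send each member the journal entries it has not yet received.
        for member in self.members:
            for update in self.journal[member.applied - self.pruned:]:
                member.apply(update)
        # Delete entries that every member has now received.
        done = min(m.applied for m in self.members) - self.pruned
        del self.journal[:done]
        self.pruned += done

    def rebase(self, new_name, source):
        # Start from a new, empty index and receive the source's contents.
        newcomer = Member(new_name)
        for key, content in source.index.items():
            newcomer.apply((key, content))
        newcomer.applied = source.applied  # same journal position as source
        self.members.append(newcomer)
        return newcomer


if __name__ == "__main__":
    a, b = Member("a"), Member("b")
    sync_set = SyncSet([a, b])
    sync_set.publish("doc1", "hello")
    sync_set.publish("doc2", "world")
    sync_set.propagate()
    c = sync_set.rebase("c", source=a)  # c joins by rebasing from a
    sync_set.publish("doc1", None)      # later updates reach c like any member
    sync_set.propagate()
    print(a.index == b.index == c.index, sync_set.journal)
```

In this model the rebase is the only step that transfers a whole index. After it, the new member is at the same journal position as its source and receives only incremental updates.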
Rebasing is especially useful when adding new systems to a synchronization set. A new system initially becomes consistent by rebasing from a specified server. Once it is consistent, remaining consistent requires relatively little network traffic because it simply receives index updates like any other member of the synchronization set.
As mentioned in the previous section, synchronized distributed indexing is especially well suited to frequently updated indices, such as those used in collaborative search applications, because members of the synchronization set can remain consistent by exchanging only updates. Synchronization is also well suited to lower-speed, higher-traffic, or less robust networking environments, for two reasons: less data is exchanged between members of a synchronization set than in the other distributed indexing models that Watson Explorer Engine supports, and updates are exchanged transactionally, so they are automatically retried if they fail initially.
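The retry behavior mentioned above can be illustrated with a short sketch. It does not show how Watson Explorer Engine implements retries. The function send_with_retry, the exception TransientNetworkError, and the exponential backoff schedule are all assumptions made for illustration. The sketch also assumes that send either delivers the whole update or raises without partially applying it, which is what makes retrying safe.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical error standing in for a failed update transfer."""


def send_with_retry(send, update, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send(update) until it succeeds or the attempts are exhausted.

    Assumes send is all-or-nothing: it either delivers the whole update
    or raises TransientNetworkError without applying any part of it.
    """
    for attempt in range(attempts):
        try:
            return send(update)
        except TransientNetworkError:
            if attempt == attempts - 1:
                raise  # give up; the update stays pending for a later try
            # Exponential backoff between attempts (an assumed policy).
            sleep(base_delay * 2 ** attempt)
```

Because a failed transfer leaves the receiving index unchanged in this model, the sender can repeat the same update without risk of applying it twice.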
To rebase, the crawler must be running on both the server from which updates are being received and the client that is receiving them.