Indexing
Indexing is the process by which data prepared by the Converters is made searchable. Both indexing and searching are controlled by the indexer-service. As the crawler crawls, it finds new or modified data, processes it, and forwards it to the indexer, which prepares the data for searching. Once the data has been processed and committed to an index, the new or updated information appears in search results. An index is a read-only, disk-based data structure that is used to respond to searches.
The following describes the steps involved in processing the data. The indexer receives the data and passes it to a builder thread, which processes the Watson™ Explorer Engine XML (VXML) in the form of documents and contents. The indexed information is initially stored in buffers in RAM. As the buffers fill up, the newly indexed information is written to segments on disk.
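The buffer-then-segment flow can be illustrated with a small toy model. This is a conceptual sketch only, not the Watson Explorer Engine implementation: the class `ToyIndexer`, the `buffer_limit` parameter, and the whitespace tokenization are all hypothetical, and the "segments" are kept in memory as immutable dictionaries rather than written to disk. It assumes that only committed segments are consulted at search time, which matches the statement that information appears in search results once it has been committed to an index.

```python
from collections import defaultdict


class ToyIndexer:
    """Illustrative model: postings are buffered in RAM, then frozen into segments."""

    def __init__(self, buffer_limit=4):
        # Hypothetical setting: number of documents buffered before a flush.
        self.buffer_limit = buffer_limit
        self.buffer = defaultdict(set)  # term -> set of document ids (in RAM)
        self.buffered_docs = 0
        self.segments = []  # stand-in for read-only segments on disk

    def add(self, doc_id, text):
        """Index one document into the RAM buffer; flush when the buffer is full."""
        for term in text.lower().split():
            self.buffer[term].add(doc_id)
        self.buffered_docs += 1
        if self.buffered_docs >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Freeze the buffer into an immutable segment and start a new buffer."""
        if not self.buffer:
            return
        segment = {term: frozenset(ids) for term, ids in self.buffer.items()}
        self.segments.append(segment)
        self.buffer = defaultdict(set)
        self.buffered_docs = 0

    def search(self, term):
        """Return ids of documents containing the term, from committed segments only."""
        term = term.lower()
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term, frozenset())
        return hits


# Usage: the first document is not searchable until the buffer is flushed.
indexer = ToyIndexer(buffer_limit=2)
indexer.add(1, "quarterly sales report")
before_flush = indexer.search("report")  # buffer not yet committed
indexer.add(2, "annual report")          # buffer is full, so a flush occurs
after_flush = indexer.search("report")
```

In this model a search issued before the flush finds nothing, while the same search after the flush finds both documents, because each segment is treated as read-only once written.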
In Watson Explorer Engine version 10.0.0.2 and later, you can add a lexical analysis language stream to a collection. Lexical analysis language streams use PEAR files developed with the analytical components of Watson Explorer. Pre-built PEAR files are available for 17 languages: Arabic, Chinese, Czech, Danish, Dutch, French, German, Greek, Hebrew, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, and Turkish. See Lexical Analysis Streams for more information.