Pipeline phase
The following components are involved in the pipeline phase: the XML filtering utility and the IBM® Content Search Services index server.
Configure the operation of a Content Search Services index server to optimize indexing performance or control system resources.
The following table describes the steps in the pipeline phase.
| Step | Description | Related information |
|---|---|---|
| 1. Input queue | When an index server receives a text document, it places the document in its input queue. | |
| 2. Document preprocessing | XML filtering: The XML filtering utility optionally removes surplus XML elements from XML content. | For information about defining surplus XML elements, see Setting XML elements as non-searchable. |
| | Language identification: The index server identifies the text language for the text document. | For information about selecting text languages, see Selecting text languages or text analyzers for an object store. |
| | Tokenization: The server creates tokens for the text document based upon a language-aware analysis of the text. Word stems and other language constructs are identified. | For information about word stems, see Token searches: Language-aware versus exact-match. |
| 3. Output queue | After preprocessing, the text document is sent to the output queue for the index server. | |
| 4. Token indexing | The index entry for the object in the target index is updated with the tokens. | |
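
The order of the steps in the table can be illustrated with a small, self-contained Python sketch. It is only a toy model of the data flow and does not represent how Content Search Services is implemented or configured. Every name in it (`SURPLUS_ELEMENTS`, `filter_xml`, `identify_language`, `tokenize`, `run_pipeline`) is hypothetical, the stop-word language guess and the suffix-stripping "stemmer" are stand-ins for real language analysis, and stripping the remaining XML tags before tokenization is a simplification made here.

```python
import re
from collections import deque

# Hypothetical names of XML elements treated as surplus (non-searchable).
SURPLUS_ELEMENTS = {"metadata"}

# Toy stop-word lists used to guess the language; a stand-in for real analysis.
STOPWORDS = {"en": {"the", "and", "is"}, "de": {"der", "und", "ist"}}


def filter_xml(text, surplus=SURPLUS_ELEMENTS):
    """Remove surplus elements, including their content, from XML text."""
    for name in surplus:
        text = re.sub(rf"<{name}\b[^>]*>.*?</{name}>", " ", text, flags=re.DOTALL)
    return text


def identify_language(text):
    """Pick the language whose stop words overlap most with the text."""
    words = set(re.findall(r"[a-zäöüß]+", text.lower()))
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))


def tokenize(text, lang):
    """Create lowercase tokens; for English, crudely strip a plural 's'."""
    plain = re.sub(r"<[^>]+>", " ", text)  # simplification: drop remaining tags
    tokens = re.findall(r"\w+", plain.lower())
    if lang == "en":
        tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return tokens


def run_pipeline(documents):
    """Run (doc_id, text) pairs through the four steps and return the index."""
    input_queue = deque(documents)  # step 1: input queue
    output_queue = deque()
    index = {}
    while input_queue:  # step 2: document preprocessing
        doc_id, text = input_queue.popleft()
        text = filter_xml(text)
        lang = identify_language(text)
        output_queue.append((doc_id, tokenize(text, lang)))  # step 3: output queue
    while output_queue:  # step 4: token indexing
        doc_id, tokens = output_queue.popleft()
        index[doc_id] = tokens  # update the index entry with the tokens
    return index


docs = [("doc1", "<doc><metadata>internal ids</metadata>"
                 "<body>The servers index the documents</body></doc>")]
print(run_pipeline(docs))
# prints {'doc1': ['the', 'server', 'index', 'the', 'document']}
```

In this sketch the content of the surplus `metadata` element never reaches the index, and the tokens are reduced to simple stems, which mirrors the XML filtering and tokenization sub-steps in the table.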