Linguistic processing

The OmniFind Text Search Server for DB2® for i provides dictionary packs to support the linguistic processing of documents and queries that are not in English.

As an alternative to dictionary-based word segmentation, the OmniFind Text Search Server for DB2 for i supports n-gram segmentation for languages such as Chinese, Japanese, and Korean. N-gram segmentation is a method of analysis that treats each overlapping sequence of a given number of characters as a single word. A further alternative, Unicode-based white-space segmentation, uses blank space to delimit words.
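The difference between the two segmentation methods can be illustrated with a minimal Python sketch. This is not the OmniFind implementation; the function names `ngram_tokens` and `whitespace_tokens` and the choice of n = 2 are assumptions made only for illustration.

```python
def ngram_tokens(text, n=2):
    """Illustrative n-gram segmentation (hypothetical helper, not an OmniFind API).

    Every overlapping run of n characters is treated as one word.
    Runs of text separated by white space are segmented independently,
    and a run of n characters or fewer is kept as a single token.
    """
    tokens = []
    for run in text.split():
        if len(run) <= n:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens


def whitespace_tokens(text):
    """Illustrative white-space segmentation: blank space delimits words."""
    return text.split()


if __name__ == "__main__":
    print(ngram_tokens("東京都庁"))
    print(whitespace_tokens("text search index"))
```

With n = 2, the four-character string 東京都庁 is split into the overlapping tokens 東京, 京都, and 都庁, whereas white-space segmentation of "text search index" yields the three words text, search, and index.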

If a text document is in one of the supported languages, linguistic processing is carried out when the text is parsed into tokens. For unsupported languages, an error code is returned.

When you search a text search index, a match is indicated for documents that contain linguistic variations of the query terms. The variations of a word depend on the language of the query.