DB2 10.5 for Linux, UNIX, and Windows

Configuration

Net Search Extender can search for words that combine different kinds of characters, for example letters, numbers, and special characters.

To do this, Net Search Extender provides the following configurations:

Character normalization
Character normalization ensures that a word that can be written in two ways is found whichever spelling is used in the search. For example, the German word 'Überbau' can also be written as 'Ueberbau'; with normalization, a search for either 'Überbau' or 'Ueberbau' finds both. Normalization also maps accented letters to the matching simple character, for example 'accès' to 'acces'. Note that this option can have undesired results in languages where a character such as 'Ü' has no standard equivalent such as 'Ue'.
Using specific characters as part of a word
Treating specific characters as part of a word ensures that product names made up of letters, numbers, and special characters can be searched for as a single word. For example, the alphanumeric combination 'DT9' can be treated as one word, and enabling the '/' special character lets OS/390® be searched for as a whole word rather than as 'OS' and '390'.
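
The effect of character normalization can be illustrated with a short Python sketch. This is not Net Search Extender code: the umlaut mapping table and the function names are assumptions made for illustration, and the product's actual normalization rules are not given here.

```python
import unicodedata

# Illustrative umlaut expansions only (an assumption, not the product's table).
UMLAUT_MAP = {"Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ä": "ae", "ö": "oe", "ü": "ue"}


def expand_umlauts(word):
    """Replace each umlaut with its two-character equivalent."""
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in word)


def remove_accents(word):
    """Drop combining accent marks, leaving the simple base characters."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def index_forms(word):
    """Return every spelling under which the word would be findable."""
    word = unicodedata.normalize("NFC", word)
    # Expand umlauts first so that 'Ü' becomes 'Ue' rather than a bare 'U'.
    expanded = expand_umlauts(word)
    return {word, expanded, remove_accents(expanded)}
```

In this sketch, `index_forms("Überbau")` contains both 'Überbau' and 'Ueberbau', and `index_forms("accès")` contains both 'accès' and 'acces'.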

For these configuration settings, switches are available. To customize the switches, change the .ini file template before creating an index.

The .ini file template is stored in sqllib/db2ext/cteixcfg.ini. Because most of the values in this template file can also be set with the CREATE INDEX command, change only the following values in the file:
AccentRemoval (for character normalization)
UmlautNormalization (for character normalization)
TreatNumbersAsWords (for treating numeric characters as part of the word)
AdditionalAlphanumCharacters (for using specific characters as part of a word)
AccentRemoval
This parameter specifies whether accented characters are normalized to the matching simple character. For example, 'événement' is also indexed as 'evenement'. The default is true.
UmlautNormalization
This parameter specifies whether an umlaut character is also indexed as two characters with the same meaning. For example, 'Übersee' is also indexed as 'Uebersee'. The default is true.
TreatNumbersAsWords
This parameter specifies whether numeric characters next to a word are part of the word. For example, 'DT9' is treated as one word rather than as the word 'DT' and the number '9'.
AdditionalAlphanumCharacters
The string value of this parameter defines which characters are treated as part of a word. The string of special characters must be a sequence of one or more characters in UTF-8. The default string contains the characters "/-@".

Do not include the wildcard characters % and _ in the list of characters that are treated as part of a word. Doing so causes problems during query execution.
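
How the two word-boundary switches interact can be sketched in Python as follows. This is a simplified illustration, not the product's tokenizer: the function `make_tokenizer`, its regular expressions, and its argument names are hypothetical and only mirror the parameter descriptions above.

```python
import re

# Wildcard characters that must not be treated as part of a word.
WILDCARDS = {"%", "_"}


def make_tokenizer(treat_numbers_as_words, additional_chars="/-@"):
    """Build a function that splits text into words (illustrative only)."""
    forbidden = WILDCARDS.intersection(additional_chars)
    if forbidden:
        raise ValueError("wildcard characters not allowed: " + "".join(sorted(forbidden)))

    # Letters always belong to a word; digits only if the switch is on.
    base = r"[^\W_]" if treat_numbers_as_words else r"[^\W\d_]"
    parts = [base]
    if additional_chars:
        parts.append("[" + re.escape(additional_chars) + "]")
    word = "(?:" + "|".join(parts) + ")+"

    # With the switch off, a run of digits becomes a separate token.
    pattern = re.compile(word if treat_numbers_as_words else word + r"|\d+")
    return pattern.findall
```

With `make_tokenizer(True)`, the text 'DT9 runs on OS/390' splits into 'DT9', 'runs', 'on', and 'OS/390'. With `make_tokenizer(True, additional_chars="")`, 'OS/390' splits into 'OS' and '390', and with `make_tokenizer(False)`, 'DT9' splits into 'DT' and '9'.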

If you want to change any of these configuration values, edit the .ini file before you create your index. To activate inactive switches, remove the comment marker ";" from the beginning of the line. For further information, see the cteixcfg.ini file.
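
Removing the comment marker can also be scripted. The following Python sketch assumes that an inactive switch appears on its own line as `;Name = value`; that line layout and the helper name `activate_switches` are assumptions, so check the actual cteixcfg.ini file before relying on it.

```python
import re

# Matches a commented-out "Name =" line and captures the switch name.
COMMENTED_SWITCH = re.compile(r"^\s*;\s*([A-Za-z]+)\s*=")


def activate_switches(ini_text, names):
    """Remove the leading ';' from the lines of the named switches."""
    result = []
    for line in ini_text.splitlines():
        match = COMMENTED_SWITCH.match(line)
        if match and match.group(1) in names:
            # Strip only the first comment marker and the space around it.
            line = re.sub(r"^\s*;\s*", "", line, count=1)
        result.append(line)
    return "\n".join(result)
```

For example, given the hypothetical text `"; AccentRemoval = true\n;UmlautNormalization = true"`, calling `activate_switches(text, {"AccentRemoval"})` activates only the first line and leaves the second commented out.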

It is recommended that you do not alter any of the other values in the .ini file.