Net Search Extender can search for words that combine different kinds of
characters, for example, letters, numbers, and special characters.
To support this, Net Search Extender provides
the following configuration options:
- Character normalization
- Character normalization ensures that a word that can be written in
two ways can be found by using either spelling. For example, the German word
'Überbau' can also be written as 'Ueberbau'. Normalization ensures that
both spellings are found, whether you search for 'Überbau' or
'Ueberbau'. The functionality also normalizes accented letters to the
matching simple character; for example, 'accès' is normalized to 'acces'.
Note that this option can have undesired results in languages where a
character such as 'Ü' does not have 'Ue' as its standard normalization.
- Using specific characters as part of a word
- Using specific characters as part of a word ensures that product
names made up of a series of alphanumeric characters, special
characters, and numbers can be searched for as a single word. For example,
the alphanumeric combination 'DT9' can be treated as one word, and
enabling the '/' special character means that OS/390® is searched for as
a whole word rather than as 'OS' and '390'.
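The following Python sketch illustrates the idea behind character normalization: each word is mapped to a canonical form so that spelling variants compare equal. It is a hypothetical model, not Net Search Extender's actual implementation; the umlaut table, the functions `normalize` and `matches`, and the flags `umlaut_normalization` and `accent_removal` (loosely mirroring the UmlautNormalization and AccentRemoval switches) are assumptions made for illustration.

```python
import unicodedata

# Hypothetical umlaut table following the German convention (Ü -> Ue, ...);
# an assumption for illustration, not Net Search Extender's own mapping.
UMLAUT_MAP = {"Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ä": "ae", "ö": "oe", "ü": "ue"}


def normalize(word: str, umlaut_normalization: bool = True,
              accent_removal: bool = True) -> str:
    """Map a word to a canonical form so that spelling variants compare equal."""
    # Compose characters first so that 'Ü' is a single code point.
    word = unicodedata.normalize("NFC", word)
    if umlaut_normalization:
        # Expand each umlaut into two letters: 'Überbau' -> 'Ueberbau'.
        word = "".join(UMLAUT_MAP.get(ch, ch) for ch in word)
    if accent_removal:
        # Decompose accented letters (NFD) and drop the combining marks:
        # 'accès' -> 'acces'.
        decomposed = unicodedata.normalize("NFD", word)
        word = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return word


def matches(query: str, indexed_word: str) -> bool:
    """A query term matches an indexed word if both normalize to the same form."""
    return normalize(query) == normalize(indexed_word)


if __name__ == "__main__":
    print(normalize("Überbau"))            # Ueberbau
    print(normalize("accès"))              # acces
    print(matches("Ueberbau", "Überbau"))  # True
```

In this model, turning off `umlaut_normalization` while keeping accent removal reduces 'Überbau' to 'Uberbau', which no longer matches 'Ueberbau'; this is one way the two settings can interact.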
Switches are available for these configuration settings.
To customize the switches, change the .ini file template
before you create an index.
The .ini file template is stored in
sqllib/db2ext/cteixcfg.ini. Because you
can also change most of the values in this template file
with the CREATE INDEX command, it is recommended
that you change only the following values in the file:
- AccentRemoval (for character normalization)
- UmlautNormalization (for character normalization)
- TreatNumbersAsWords (for treating numeric characters as part of the word)
- AdditionalAlphanumCharacters (for using specific characters as part of a word)
- AccentRemoval
- This parameter specifies whether accented characters are normalized
to the matching simple character. For example, 'événement' is also indexed
as 'evenement'. The default is true.
- UmlautNormalization
- This parameter specifies whether an umlaut character is also indexed
as the equivalent two-letter spelling. For example, 'Übersee' is
also indexed as 'Uebersee'. The default is true.
- TreatNumbersAsWords
- This parameter specifies whether numeric characters adjacent to a word
are part of the word. For example, 'DT9' is treated as one word
rather than as the word 'DT' followed by the number '9'.
- AdditionalAlphanumCharacters
- The string value of this parameter defines which characters are
treated as part of a word. The string must be a sequence of one or more
characters in UTF-8. The default string contains the characters "/-@".
The wildcard characters % and _ are not allowed in this list, because
they cause problems during query execution.
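To make the effect of TreatNumbersAsWords and AdditionalAlphanumCharacters concrete, the following Python sketch models a simple tokenizer. It is an assumption-based illustration, not the tokenizer Net Search Extender uses; the function `tokenize` and its parameters are hypothetical. Letters and the additional characters always belong to a word, digits join the word only when `treat_numbers_as_words` is true, and the wildcard characters % and _ are rejected.

```python
# Hypothetical model of word splitting; not Net Search Extender's tokenizer.
WILDCARDS = {"%", "_"}
DEFAULT_ADDITIONAL_CHARS = "/-@"  # default AdditionalAlphanumCharacters value


def _char_class(ch: str, additional_chars: str, treat_numbers_as_words: bool):
    """Classify a character as part of a word, part of a number, or a separator."""
    if ch.isalpha() or ch in additional_chars:
        return "word"
    if ch.isdigit():
        return "word" if treat_numbers_as_words else "number"
    return None  # separator such as a space or punctuation


def tokenize(text: str, additional_chars: str = DEFAULT_ADDITIONAL_CHARS,
             treat_numbers_as_words: bool = True) -> list[str]:
    """Split text into tokens according to the two settings."""
    if not additional_chars:
        raise ValueError("additional_chars must contain at least one character")
    forbidden = WILDCARDS & set(additional_chars)
    if forbidden:
        raise ValueError(f"wildcard characters not allowed: {sorted(forbidden)}")

    tokens: list[str] = []
    current = ""
    current_class = None
    for ch in text:
        cls = _char_class(ch, additional_chars, treat_numbers_as_words)
        # A change of character class ends the current token.
        if cls != current_class and current:
            tokens.append(current)
            current = ""
        if cls is not None:
            current += ch
        current_class = cls
    if current:
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(tokenize("OS/390 runs DT9"))                    # ['OS/390', 'runs', 'DT9']
    print(tokenize("DT9", treat_numbers_as_words=False))  # ['DT', '9']
```

Removing '/' from the additional characters, for example `tokenize("OS/390", additional_chars="-@")`, splits the product name into 'OS' and '390' in this model.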
If you want to change any of these configuration
values, edit the .ini file before you create your
index. To activate an inactive switch, remove the comment marker ";"
from the beginning of its line. For further information, see the cteixcfg.ini file.
It is recommended that you do not change any of the other values in the .ini file.
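Activating a switch amounts to removing the leading ";" from its line and setting a value. The Python sketch below does this on the text of the file, assuming (hypothetically) that each switch occupies a single `key=value` line; check the actual layout of cteixcfg.ini before relying on it, and apply the change before you create the index. The function `activate_switch` and the sample text are hypothetical.

```python
import re


def activate_switch(ini_text: str, key: str, value: str) -> str:
    """Uncomment the first line for `key` and set it to `value`.

    Assumes (hypothetically) that a switch is written as `key=value` and that an
    inactive switch is the same line prefixed with the comment marker ';'.
    """
    # Match an optional ';' comment marker, the exact key, '=' and the old value.
    pattern = re.compile(
        rf"^[ \t]*;?[ \t]*({re.escape(key)})[ \t]*=[^\r\n]*$", re.MULTILINE
    )
    # Use a function as the replacement so that `value` is inserted literally.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}={value}", ini_text, count=1)
    if count == 0:
        raise KeyError(f"switch {key!r} not found")
    return new_text


if __name__ == "__main__":
    # Hypothetical excerpt; the real file's contents may differ.
    sample = ";AccentRemoval=false\nUmlautNormalization=true\n"
    print(activate_switch(sample, "AccentRemoval", "true"))
```

To apply it to the template, you could read the file's text, pass it through `activate_switch`, and write the result back, keeping a backup copy of the original file.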