Indexing special characters
During tokenization and language processing, Db2® Text Search identifies special characters and indexes them as punctuation. For example:
- Jack_jones@ibm.com is tokenized as jack _ jones @ ibm . com
- http://www.ibm.com is tokenized as http :// www . ibm . com
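The two examples above can be reproduced with a small toy tokenizer. This is only an illustrative sketch, not the Db2 Text Search implementation: the function name `tokenize` and the rule that ASCII letters and digits form words while every other run of non-space characters is one special token are assumptions made for the example.

```python
import re

# Assumption: ASCII letters and digits form word tokens; any other run of
# non-space characters is treated as a single special-character token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]+")


def tokenize(text):
    """Split text into lowercased word tokens and special-character tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


print(" ".join(tokenize("Jack_jones@ibm.com")))  # jack _ jones @ ibm . com
print(" ".join(tokenize("http://www.ibm.com")))  # http :// www . ibm . com
```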
Special characters do not occupy a token position in the file. For example, "jack_jones" is indexed with the underscore in the same token position as "jack". The same is true when the special character is separated from the words by spaces. For example, "jack_jones" is indexed in the same way as "jack _ jones".
The token position is used for exact phrase search and for proximity search. For example, if a document contains the expression jack_jones, searching for the exact phrase "jack jones" finds this document.
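The effect of token positions on phrase search can be sketched with a toy positional index. The code below is a simplified model under stated assumptions, not how Db2 Text Search stores its index: the names `index_positions` and `find_phrase` are hypothetical, only ASCII letters and digits count as word characters, and a special character that appears before any word is placed at position 0.

```python
import re

# Assumption: ASCII letters and digits are word characters; everything else
# that is not whitespace is a special character.
TOKEN_RE = re.compile(r"(?P<word>[a-z0-9]+)|(?P<special>[^a-z0-9\s]+)")


def index_positions(text):
    """Return (token, position, is_special) triples.

    Only word tokens advance the position counter. A special character
    shares the position of the word that precedes it.
    """
    entries = []
    position = -1
    for match in TOKEN_RE.finditer(text.lower()):
        if match.lastgroup == "word":
            position += 1
            entries.append((match.group(), position, False))
        else:
            entries.append((match.group(), max(position, 0), True))
    return entries


def find_phrase(document, phrase):
    """Return True if the words of phrase occupy consecutive positions."""
    words = [tok for tok, _, special in index_positions(phrase) if not special]
    if not words:
        return False
    positions = {}
    for tok, pos, special in index_positions(document):
        if not special:
            positions.setdefault(tok, set()).add(pos)
    return any(
        all(start + i in positions.get(word, ()) for i, word in enumerate(words))
        for start in positions.get(words[0], ())
    )


print(index_positions("jack_jones") == index_positions("jack _ jones"))  # True
print(find_phrase("Contact jack_jones today", "jack jones"))             # True
print(find_phrase("jack and jones", "jack jones"))                       # False
```

Because the underscore does not consume a position, "jack" and "jones" are adjacent in the model, which is why the phrase query matches.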
When a sequence of special characters is indexed separately, the characters are searched in no particular order. For example, searching for "#$" also finds documents that contain "$#".
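One way to picture order-insensitive matching is to compare the special characters as an unordered collection. The source does not describe how Db2 Text Search implements this, so the sketch below is only one possible reading: the function name `matches_special_query` is hypothetical, and the rule that a query matches when some run of special characters in the document contains exactly the same characters, in any order, is an assumption.

```python
import re
from collections import Counter

# Assumption: a special character is anything other than an ASCII letter,
# an ASCII digit, or whitespace.
SPECIAL_RUN_RE = re.compile(r"[^A-Za-z0-9\s]+")


def matches_special_query(document, query):
    """Return True if a special-character run in document has the same
    characters as the query, regardless of their order."""
    wanted = Counter("".join(SPECIAL_RUN_RE.findall(query)))
    if not wanted:
        return False
    return any(
        Counter(run) == wanted for run in SPECIAL_RUN_RE.findall(document)
    )


print(matches_special_query("the code is $# here", "#$"))  # True
print(matches_special_query("the code is #$ here", "#$"))  # True
print(matches_special_query("the code is $ here", "#$"))   # False
```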