Supported languages and code pages
When you create a text search index, you can specify the language used to parse the text documents. When you search, you can specify the language used to interpret the query terms. You can also specify a code page when you create a text search index on a binary data type column.
Language specification
A locale is a combination of language and territory (region or country) information and is represented by a five-character locale code. To set the message locale for a text search administration procedure, pass the locale code to the procedure. Refinements of these locale codes are possible depending on the locales installed on the Db2® server.
- The locale that you specify in your db2ts CREATE INDEX command determines the language used to tokenize or analyze documents for indexing. If you know that all documents in the column to be indexed use a specific language, specify the applicable locale when you create the text search index. If you do not specify a locale, the database territory determines the default setting for LANGUAGE. To have your documents scanned automatically to determine the locale, set the LANGUAGE attribute to AUTO in the SYSIBMTS.TSDEFAULTS view. This view describes database defaults for text search as attribute-value pairs.
- The locale that you specify in a search query is used to perform linguistic processing on the query and to help identify the base forms of the query terms. After the base forms have been identified, the locale plays no further part in the search process. For example, you could use the English language for a query and obtain German documents in the search result if the search term in its base form is present in those documents.
Locale code | Language | Territory |
---|---|---|
ar_AA | Arabic | Arabic countries or regions |
cs_CZ | Czech | Czech Republic |
da_DK | Danish | Denmark |
de_CH | German | Switzerland |
de_DE | German | Germany |
el_GR | Greek | Greece |
en_AU | English | Australia |
en_GB | English | United Kingdom |
en_US | English | United States |
es_ES | Spanish | Spain |
fi_FI | Finnish | Finland |
fr_CA | French | Canada |
fr_FR | French | France |
it_IT | Italian | Italy |
ja_JP | Japanese | Japan |
ko_KR | Korean | Korea, Republic of |
nb_NO | Norwegian Bokmål | Norway |
nl_NL | Dutch | Netherlands |
nn_NO | Norwegian Nynorsk | Norway |
pl_PL | Polish | Poland |
pt_BR | Portuguese | Brazil |
pt_PT | Portuguese | Portugal |
ru_RU | Russian | Russia |
sv_SE | Swedish | Sweden |
zh_CN | Chinese | China |
zh_TW | Chinese | Taiwan |
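As an illustration of how the locale codes in the table might be used, the following Python sketch checks a locale code against the table and then builds a db2ts CREATE INDEX command and a search predicate as plain strings. The helper names, the index, table, column, and database names, and the exact placement of the LANGUAGE clause and the QUERYLANGUAGE search option are assumptions made for this example; verify the syntax against the command reference for your Db2 version before you use it.

```python
import re

# Locale codes copied from the table above.
SUPPORTED_LOCALES = {
    "ar_AA", "cs_CZ", "da_DK", "de_CH", "de_DE", "el_GR", "en_AU", "en_GB",
    "en_US", "es_ES", "fi_FI", "fr_CA", "fr_FR", "it_IT", "ja_JP", "ko_KR",
    "nb_NO", "nl_NL", "nn_NO", "pl_PL", "pt_BR", "pt_PT", "ru_RU", "sv_SE",
    "zh_CN", "zh_TW",
}

# Five characters: two-letter language, underscore, two-letter territory.
LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def check_locale(code: str) -> str:
    """Return the code unchanged if it is listed in the table; raise otherwise."""
    if not LOCALE_PATTERN.fullmatch(code):
        raise ValueError(f"not a five-character locale code: {code!r}")
    if code not in SUPPORTED_LOCALES:
        raise ValueError(f"locale is not in the supported list: {code!r}")
    return code


def create_index_command(index: str, table: str, column: str,
                         locale: str, database: str) -> str:
    """Build a db2ts CREATE INDEX command line (assumed syntax, hypothetical names)."""
    return (
        f'db2ts "CREATE INDEX {index} FOR TEXT ON {table} ({column}) '
        f'LANGUAGE {check_locale(locale)} CONNECT TO {database}"'
    )


def contains_predicate(column: str, term: str, locale: str) -> str:
    """Build a search predicate that names the query language (assumed option name)."""
    escaped = term.replace("'", "''")  # double single quotes for an SQL literal
    return (
        f"CONTAINS({column}, '{escaped}', "
        f"'QUERYLANGUAGE={check_locale(locale)}') = 1"
    )


if __name__ == "__main__":
    print(create_index_command("myschema.myidx", "myschema.books",
                               "story", "en_US", "sample"))
    print(contains_predicate("story", "house", "en_US"))
```

The sketch only assembles strings; it does not connect to a database. Validating the locale before building the command catches typing errors such as `en_us` early, although a server with additional locales installed might accept codes that are not in this table.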
Code page specification
You can index documents that use one of the supported Db2 code pages. Specifying the code page when you create a text search index is optional, but doing so helps to identify the character encoding of binary columns. If you do not specify a code page for binary columns, the code page from the column property is used. For the list of supported code pages, see the Db2 documentation topic on supported territory codes and code pages.
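The optional nature of the code page can be sketched in the same way. The following Python example builds a db2ts CREATE INDEX command for a binary column and adds a CODEPAGE clause only when a code page is supplied. The function name, the object names, and the position of the CODEPAGE clause are assumptions for illustration only; code page 1208 (UTF-8) is used as a sample value.

```python
from typing import Optional


def create_binary_index_command(index: str, table: str, column: str,
                                database: str,
                                codepage: Optional[int] = None) -> str:
    """Build a db2ts CREATE INDEX command line (assumed syntax, hypothetical names).

    When codepage is None, no CODEPAGE clause is emitted, so the code page
    from the column property applies.
    """
    clauses = [f"CREATE INDEX {index} FOR TEXT ON {table} ({column})"]
    if codepage is not None:
        if codepage <= 0:
            raise ValueError(f"code page must be a positive integer: {codepage}")
        clauses.append(f"CODEPAGE {codepage}")
    clauses.append(f"CONNECT TO {database}")
    return 'db2ts "' + " ".join(clauses) + '"'


if __name__ == "__main__":
    # Explicit code page for a binary column that holds UTF-8 text.
    print(create_binary_index_command("myschema.docidx", "myschema.docs",
                                      "content", "sample", codepage=1208))
    # No code page: the column property decides.
    print(create_binary_index_command("myschema.docidx", "myschema.docs",
                                      "content", "sample"))
```

The sketch does not check that the value is one of the supported Db2 code pages; compare it with the supported list before you run the generated command.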