What languages are available?

Classification Workbench supports full natural language processing for many languages.

The following languages are supported:

In addition, a Generic Language option provides basic processing of texts in unsupported or partially supported languages, such as Greek, Danish, and Polish. This option is available only for monolingual knowledge bases; it cannot be combined with other languages.

Classification Workbench can also identify the language of a text. Language identification is supported for the following languages and encodings:

Table 1. Supported language identification
Language                ASCII encoding             Unicode encoding
Arabic                  Windows-1256               UTF-16
Bulgarian               Windows-1251               UTF-16
Chinese (simplified)    csISO58GB231280            UTF-16
Chinese (traditional)   Big5                       UTF-16
Czech                   Windows-1250               UTF-16
Danish                  Windows-1252               UTF-16
Dutch                   Windows-1252               UTF-16
English                 Windows-1252               UTF-16
Estonian                Windows-1257               UTF-16
Farsi                   Windows-1256               UTF-16
Finnish                 ISO-8859-1                 UTF-16
French                  Windows-1252               UTF-16
German                  Windows-1252               UTF-16
Greek                   Windows-1253               UTF-16
Hebrew                  Windows-1255               UTF-16
Hindi                   iscii-dev                  UTF-16
Hungarian               Windows-1250               UTF-16
Italian                 Windows-1252               UTF-16
Japanese                EUC-JP and Shift_JIS       UTF-16
Korean                  korean                     UTF-16
Latvian                 Windows-1257               UTF-16
Lithuanian              Windows-1257               UTF-16
Norwegian               Windows-1252               UTF-16
Polish                  ISO-8859-2                 UTF-16
Portuguese              Windows-1252               UTF-16
Romanian                Windows-1250               UTF-16
Russian                 Windows-1251 and KOI8-R    UTF-16
Slovak                  Windows-1250               UTF-16
Slovenian               Windows-1250               UTF-16
Spanish                 Windows-1252               UTF-16
Swedish                 Windows-1252               UTF-16
Thai                    tis-620                    UTF-16
Turkish                 Windows-1254               UTF-16
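
To make the two encoding columns concrete, the following minimal Python sketch converts text from one of the legacy encodings in Table 1 to UTF-16. It is an illustration only and is not part of Classification Workbench: the function names to_utf16 and known_labels and the LEGACY_ENCODINGS dictionary are hypothetical, and the sketch assumes that the encoding labels are resolved by Python's own codec registry, which might not recognize every label in the table (for example, iscii-dev).

```python
import codecs

# A few (language, legacy encoding label) pairs copied from Table 1.
# Assumption: the labels are resolved by Python's codec registry; this is
# unrelated to how Classification Workbench itself reads text.
LEGACY_ENCODINGS = {
    "Russian": ["Windows-1251", "KOI8-R"],
    "Greek": ["Windows-1253"],
    "Japanese": ["EUC-JP", "Shift_JIS"],
    "Thai": ["tis-620"],
}


def to_utf16(raw: bytes, legacy_label: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-16."""
    text = raw.decode(legacy_label)
    return text.encode("utf-16")


def known_labels(labels):
    """Return the labels that the local Python codec registry can resolve."""
    found = []
    for label in labels:
        try:
            codecs.lookup(label)
            found.append(label)
        except LookupError:
            # The label is not known to this Python installation.
            pass
    return found


if __name__ == "__main__":
    for language, labels in LEGACY_ENCODINGS.items():
        print(language, known_labels(labels))
```

Running the script lists, for each sample language, the labels that the local Python installation can decode. A text that is converted with to_utf16 can then be saved as a UTF-16 file.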

The character set of each language you want to use must be installed on your computer. For Chinese (traditional), Chinese (simplified), Japanese, and Korean, the default language of your operating system must be set to the appropriate language. Additional software might be required to enter text in these languages; ask your system administrator for more information.

Note: For information about creating a monolingual knowledge base in any supported language, or a multilingual knowledge base in any combination of supported languages, see Language support.