Classification Workbench supports full natural language processing for many languages.
The following languages are supported:
In addition, a Generic Language option provides basic processing of texts in unsupported or partially supported languages such as Greek, Danish, and Polish. This option is available only for monolingual knowledge bases; it cannot be combined with other languages.
Classification Workbench can identify the language of a text (language identification) for the following languages and encodings:
Language | ASCII encoding | Unicode encoding |
---|---|---|
Arabic | Windows-1256 | UTF-16 |
Bulgarian | Windows-1251 | UTF-16 |
Chinese (simplified) | csISO58GB231280 | UTF-16 |
Chinese (traditional) | Big5 | UTF-16 |
Czech | Windows-1250 | UTF-16 |
Danish | Windows-1252 | UTF-16 |
Dutch | Windows-1252 | UTF-16 |
English | Windows-1252 | UTF-16 |
Estonian | Windows-1257 | UTF-16 |
Farsi | Windows-1256 | UTF-16 |
Finnish | ISO-8859-1 | UTF-16 |
French | Windows-1252 | UTF-16 |
German | Windows-1252 | UTF-16 |
Greek | Windows-1253 | UTF-16 |
Hebrew | Windows-1255 | UTF-16 |
Hindi | iscii-dev | UTF-16 |
Hungarian | Windows-1250 | UTF-16 |
Italian | Windows-1252 | UTF-16 |
Japanese | EUC-JP and Shift_JIS | UTF-16 |
Korean | korean | UTF-16 |
Latvian | Windows-1257 | UTF-16 |
Lithuanian | Windows-1257 | UTF-16 |
Norwegian | Windows-1252 | UTF-16 |
Polish | ISO-8859-2 | UTF-16 |
Portuguese | Windows-1252 | UTF-16 |
Romanian | Windows-1250 | UTF-16 |
Russian | Windows-1251 and KOI8-R | UTF-16 |
Slovak | Windows-1250 | UTF-16 |
Slovenian | Windows-1250 | UTF-16 |
Spanish | Windows-1252 | UTF-16 |
Swedish | Windows-1252 | UTF-16 |
Thai | tis-620 | UTF-16 |
Turkish | Windows-1254 | UTF-16 |
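The relationship between the two encoding columns can be illustrated with a short Python sketch that re-encodes legacy-encoded text as UTF-16. This is only an illustration and not part of Classification Workbench: the `LEGACY_ENCODINGS` mapping and the `to_utf16` function are hypothetical names, and the sketch assumes that Python's standard codec registry accepts the encoding labels as they are written in the table. Hindi is omitted because Python's standard library does not appear to ship an ISCII codec.

```python
# Encoding labels copied from a few rows of the table above.
# The mapping is a hypothetical helper, not a product API.
LEGACY_ENCODINGS = {
    "Arabic": ["Windows-1256"],
    "Chinese (simplified)": ["csISO58GB231280"],
    "Greek": ["Windows-1253"],
    "Japanese": ["EUC-JP", "Shift_JIS"],
    "Korean": ["korean"],
    "Russian": ["Windows-1251", "KOI8-R"],
    "Thai": ["tis-620"],
}


def to_utf16(data: bytes, language: str, encoding: str) -> bytes:
    """Re-encode legacy-encoded bytes as UTF-16 (with a byte-order mark)."""
    if encoding not in LEGACY_ENCODINGS.get(language, []):
        raise ValueError(f"{encoding!r} is not listed for {language!r}")
    # Strict decoding: invalid byte sequences raise UnicodeDecodeError.
    text = data.decode(encoding)
    return text.encode("utf-16")


if __name__ == "__main__":
    sample = "Привет".encode("KOI8-R")
    converted = to_utf16(sample, "Russian", "KOI8-R")
    print(converted.decode("utf-16"))  # prints: Привет
```

The caller must state the legacy encoding explicitly. For languages with two listed encodings, such as Russian, bytes in one encoding usually decode without error in the other and produce unreadable text, so trying each codec in turn is not a reliable way to choose between them.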
Note the following:
The character set of each language you want to use must be installed on your computer. For Chinese (traditional), Chinese (simplified), Japanese, and Korean, the default language of your operating system must be set to the appropriate language. Additional software might be required to enter text in these languages; ask your system administrator for more information.
For information about creating a monolingual knowledge base in any supported language, or a multilingual knowledge base in any combination of supported languages, see Language support.