Supported languages
Pretrained models
| Language name | Language code | Task types supported |
|---|---|---|
| All languages | | Detag 1, Lang-Detect 2, Syntax (Izumo) 3 |
| Arabic | ar | Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions |
| Chinese (Simplified) | zh-cn | Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (CNN, BERT, Transformer), Target-Mentions |
| Chinese (Traditional) | zh-tw | Entity-Mentions (RBR), Keywords, Noun-Phrases, Target-Mentions |
| Czech | cs | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Danish | da | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Dutch | nl | Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (CNN, BERT, Transformer), Target-Mentions |
| German | de | Categories, Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions |
| English | en | Categories, Concepts, Emotion, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (SIRE, Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions, Tone |
| Finnish | fi | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| French | fr | Concepts, Emotion, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions, Tone |
| Hebrew | he | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Hindi | hi | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Italian | it | Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions |
| Japanese | ja | Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions |
| Korean | ko | Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions |
| Norwegian Bokmål | nb | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Norwegian Nynorsk | nn | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Portuguese | pt | Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions |
| Polish | pl | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Romanian | ro | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Russian | ru | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Slovak | sk | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Spanish | es | Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions |
| Swedish | sv | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
| Turkish | tr | Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer) |
1 The Detag task is language agnostic.
2 Lang-Detect is supported for the languages described in List of Supported Languages below.
3 Syntax support for the different parsers (sentence detection, tokenization, lemmatization, part-of-speech tagging, and dependency parsing) is described in List of Supported Languages below.
NLP tasks
List of Supported Languages
Watson NLP supports 31 languages for core functions. According to a study of the content languages used by websites worldwide, these languages cover 92.6% of websites.

| Language name | Locale code | Language identification | Sentence segmentation | Tokenization | PoS tagging | Dependency parsing |
|---|---|---|---|---|---|---|
| Afrikaans | af | ✓ | ✓ | ✓ | ✓ | ✓ |
| Arabic | ar | ✓ | ✓ | ✓ | ✓ | ✓ |
| Bosnian | bs | ✓* | ✓ | ✓ | ✓ | ✓ |
| Catalan | ca | ✓ | ✓ | ✓ | ✓ | |
| Chinese (Simplified) | zh_CN | ✓ | ✓ | ✓ | ✓ | |
| Chinese (Traditional) | zh_TW | ✓ | ✓ | ✓ | ✓ | |
| Croatian | hr | ✓ | ✓ | ✓ | ✓ | ✓ |
| Czech | cs | ✓ | ✓ | ✓ | ✓ | ✓ |
| Danish | da | ✓ | ✓ | ✓ | ✓ | ✓ |
| Dutch | nl | ✓ | ✓ | ✓ | ✓ | ✓ |
| English | en | ✓ | ✓ | ✓ | ✓ | ✓ |
| Finnish | fi | ✓ | ✓ | ✓ | ✓ | ✓ |
| French | fr | ✓ | ✓ | ✓ | ✓ | ✓ |
| German | de | ✓ | ✓ | ✓ | ✓ | ✓ |
| Greek | el | ✓ | ✓ | ✓ | ✓ | |
| Hebrew | he | ✓ | ✓ | ✓ | ✓ | |
| Hindi | hi | ✓ | ✓ | ✓ | ✓ | ✓ |
| Italian | it | ✓ | ✓ | ✓ | ✓ | ✓ |
| Japanese | ja | ✓ | ✓ | ✓ | ✓ | ✓ |
| Korean | ko | ✓ | ✓ | ✓ | ✓ | |
| Norwegian Bokmål | nb | ✓ | ✓ | ✓ | ✓ | ✓ |
| Norwegian Nynorsk | nn | ✓ | ✓ | ✓ | ✓ | ✓ |
| Polish | pl | ✓ | ✓ | ✓ | ✓ | |
| Portuguese | pt | ✓ | ✓ | ✓ | ✓ | ✓ |
| Romanian | ro | ✓ | ✓ | ✓ | ✓ | ✓ |
| Russian | ru | ✓ | ✓ | ✓ | ✓ | ✓ |
| Serbian | sr | ✓ | ✓ | ✓ | ✓ | ✓ |
| Slovak | sk | ✓ | ✓ | ✓ | ✓ | ✓ |
| Spanish | es | ✓ | ✓ | ✓ | ✓ | ✓ |
| Swedish | sv | ✓ | ✓ | ✓ | ✓ | ✓ |
| Turkish | tr | ✓ | ✓ | ✓ | ✓ | |

Additionally, the following languages are supported for language identification.
| Language name | Locale code | Language identification |
|---|---|---|
| Albanian | sq | ✓ |
| Armenian | hy | ✓ |
| Azerbaijani | az | ✓ |
| Bangla | bn | ✓ |
| Bashkir | ba | ✓ |
| Basque | eu | ✓ |
| Belarusian | be | ✓ |
| Bulgarian | bg | ✓ |
| Chuvash | cv | ✓ |
| Esperanto | eo | ✓ |
| Estonian | et | ✓ |
| Georgian | ka | ✓ |
| Gujarati | gu | ✓ |
| Haitian Creole | ht | ✓ |
| Hungarian | hu | ✓ |
| Icelandic | is | ✓ |
| Irish | ga | ✓ |
| Kazakh | kk | ✓ |
| Khmer | km | ✓ |
| Kurdish | ku | ✓ |
| Kyrgyz | ky | ✓ |
| Latvian | lv | ✓ |
| Lithuanian | lt | ✓ |
| Malay | ms | ✓ |
| Malayalam | ml | ✓ |
| Maltese | mt | ✓ |
| Mongolian | mn | ✓ |
| Pashto | ps | ✓ |
| Persian | fa | ✓ |
| Punjabi | pa | ✓ |
| Slovenian | sl | ✓ |
| Somali | so | ✓ |
| Tamil | ta | ✓ |
| Telugu | te | ✓ |
| Thai | th | ✓ |
| Ukrainian | uk | ✓ |
| Urdu | ur | ✓ |
| Vietnamese | vi | ✓ |
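
The two tables above can be folded into a small lookup so that a pipeline can check what is available for a locale code before calling a parser. The following is a minimal sketch that only transcribes the tables; the names `CORE`, `NO_DEPENDENCY_PARSING`, `LANG_ID_ONLY`, and `supported_functions` are hypothetical and are not part of the Watson NLP API.

```python
# Locale codes transcribed from the tables above (illustrative helper,
# not part of Watson NLP).
CORE = {
    "af", "ar", "bs", "ca", "zh_CN", "zh_TW", "hr", "cs", "da", "nl", "en",
    "fi", "fr", "de", "el", "he", "hi", "it", "ja", "ko", "nb", "nn", "pl",
    "pt", "ro", "ru", "sr", "sk", "es", "sv", "tr",
}

# Core languages whose "Dependency parsing" cell is empty in the table.
NO_DEPENDENCY_PARSING = {"ca", "zh_CN", "zh_TW", "el", "he", "ko", "pl", "tr"}

# Languages listed for language identification only.
LANG_ID_ONLY = {
    "sq", "hy", "az", "bn", "ba", "eu", "be", "bg", "cv", "eo", "et", "ka",
    "gu", "ht", "hu", "is", "ga", "kk", "km", "ku", "ky", "lv", "lt", "ms",
    "ml", "mt", "mn", "ps", "fa", "pa", "sl", "so", "ta", "te", "th", "uk",
    "ur", "vi",
}


def supported_functions(code: str) -> list[str]:
    """Return the functions listed in the tables for a locale code."""
    if code in LANG_ID_ONLY:
        return ["language identification"]
    if code not in CORE:
        return []  # not listed in either table
    functions = [
        "language identification",
        "sentence segmentation",
        "tokenization",
        "PoS tagging",
        "dependency parsing",
    ]
    if code in NO_DEPENDENCY_PARSING:
        functions.remove("dependency parsing")
    return functions
```

For example, `supported_functions("ko")` omits dependency parsing, while `supported_functions("vi")` returns only language identification.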
Language Dialects, Writing Systems
Arabic
Watson NLP supports Standard Arabic (SA), which is used across the Middle East and North Africa. It is reported to be less accurate for Dialectal Arabic (DA), such as Egyptian Arabic.
Chinese
Watson NLP supports Simplified Chinese (zh_CN), used in Mainland China, and Traditional Chinese (zh_TW), used in Taiwan.
It has not been tested with Cantonese as used in Hong Kong. Note the following points for Cantonese:
- Vocabulary: Cantonese uses the same grammar but a different vocabulary, with some overlap. Written Cantonese may be covered to some extent, but coverage may be incomplete.
- Character set: Watson NLP supports Unicode, which includes Cantonese characters. However, these characters were not well standardized before 2004 (e.g. GCCS, HKSCS-1999, HKSCS-2001). Some older systems may be affected by this and may be incompatible with systems based on the latest Unicode, including Watson NLP.
- Lemma: Watson NLP always normalizes the lemmas of Chinese words to Simplified Chinese.
Portuguese
Watson NLP supports both European Portuguese (pt_PT), used in Portugal, and Brazilian Portuguese (pt_BR), used in Brazil. There are some differences in orthography, but they are not significant. Watson NLP supports Portuguese (pt) using combined dictionaries and models.
Serbian
The Serbian language has two writing systems: Cyrillic script and Latin script. There is a direct transliteration between the two. Watson NLP supports both scripts.
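
To illustrate the direct transliteration between the two scripts, here is a minimal sketch that converts Serbian Cyrillic to Serbian Latin using the standard 30-letter alphabet correspondence. It is not how Watson NLP handles the two scripts internally; the names `CYRILLIC_TO_LATIN` and `cyrillic_to_latin` are hypothetical.

```python
# Serbian Cyrillic -> Latin, one entry per letter of the 30-letter alphabet.
# Illustrative only; not Watson NLP's implementation.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic text to Latin, letter by letter."""
    out = []
    for ch in text:
        lower = ch.lower()
        latin = CYRILLIC_TO_LATIN.get(lower)
        if latin is None:
            out.append(ch)  # digits, punctuation, non-Cyrillic text
        elif ch != lower:
            out.append(latin.capitalize())  # uppercase: Љ -> Lj, Џ -> Dž
        else:
            out.append(latin)
    return "".join(out)


print(cyrillic_to_latin("Београд"))  # Beograd
```

The sketch covers only the Cyrillic-to-Latin direction, where each Cyrillic letter maps to exactly one Latin letter or digraph. Going the other way needs extra care, because a Latin digraph such as "nj" can occasionally stand for two separate letters. Words written entirely in capitals would also need additional handling for digraphs.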
Bosnian and Croatian
Bosnian and Croatian belong to the same language family and are difficult to distinguish in written form. Currently, the Watson NLP language detection module outputs hr for Bosnian text, which is why Bosnian is marked ✓* in the language identification column above.
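
Code that compares a detected language against an expected one may need to account for this behavior. The following is a minimal sketch of one way to do so; `DETECTION_ALIASES` and `matches_detected` are hypothetical names, and the detected code is assumed to be a plain string such as "hr".

```python
# Languages that the detector reports under another code, per the note above.
# Illustrative helper; not part of the Watson NLP API.
DETECTION_ALIASES = {"bs": "hr"}


def matches_detected(expected: str, detected: str) -> bool:
    """Return True if `detected` is what the detector outputs for `expected`."""
    return DETECTION_ALIASES.get(expected, expected) == detected
```

With this helper, a document expected to be Bosnian is accepted when the detector returns hr, while other languages are compared unchanged.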