Supported languages

Pretrained models

Language name	Language code	Task types supported
All languages		Detag ¹, Lang-Detect ², Syntax (Izumo) ³
Arabic	ar	Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions
Chinese (Simplified)	zh-cn	Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (CNN, BERT, Transformer), Target-Mentions
Chinese (Traditional)	zh-tw	Entity-Mentions (RBR), Keywords, Noun-Phrases, Target-Mentions
Czech	cs	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Danish	da	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Dutch	nl	Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (CNN, BERT, Transformer), Target-Mentions
German	de	Categories, Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions
English	en	Categories, Concepts, Emotion, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (SIRE, Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions, Tone
Finnish	fi	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
French	fr	Concepts, Emotion, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions, Tone
Hebrew	he	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Hindi	hi	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Italian	it	Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions
Japanese	ja	Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions
Korean	ko	Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions
Norwegian Bokmål	nb	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Norwegian Nynorsk	nn	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Portuguese	pt	Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions
Polish	pl	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Romanian	ro	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Russian	ru	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Slovak	sk	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Spanish	es	Concepts, Entity-Mentions (RBR, BiLSTM, BERT, Transformer), Keywords, Noun-Phrases, Relations (Transformer), Sentiment (CNN, BERT, Transformer), Target-Mentions
Swedish	sv	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)
Turkish	tr	Entity-Mentions (RBR, BERT, Transformer), Keywords, Noun-Phrases, Sentiment (BERT, Transformer)

¹ The Detag task is language agnostic.

² Lang-Detect is supported for the languages described in List of Supported Languages below.

³ Syntax support for the different parsers (sentence detection, tokenization, lemmatization, part-of-speech tagging, and dependency parsing) is described in List of Supported Languages below.

List of Supported Languages

Watson NLP supports 31 languages for its core functions. According to a study of content languages on websites worldwide, these languages account for 92.6% of websites.

Supported languages

Language name	Locale code	Language identification	Sentence segmentation	Tokenization	PoS tagging	Dependency parsing
Afrikaans	af	✓	✓	✓	✓	✓
Arabic	ar	✓	✓	✓	✓	✓
Bosnian	bs	✓*	✓	✓	✓	✓
Catalan	ca	✓	✓	✓	✓
Chinese (Simplified)	zh_CN	✓	✓	✓	✓
Chinese (Traditional)	zh_TW	✓	✓	✓	✓
Croatian	hr	✓	✓	✓	✓	✓
Czech	cs	✓	✓	✓	✓	✓
Danish	da	✓	✓	✓	✓	✓
Dutch	nl	✓	✓	✓	✓	✓
English	en	✓	✓	✓	✓	✓
Finnish	fi	✓	✓	✓	✓	✓
French	fr	✓	✓	✓	✓	✓
German	de	✓	✓	✓	✓	✓
Greek	el	✓	✓	✓	✓
Hebrew	he	✓	✓	✓	✓
Hindi	hi	✓	✓	✓	✓	✓
Italian	it	✓	✓	✓	✓	✓
Japanese	ja	✓	✓	✓	✓	✓
Korean	ko	✓	✓	✓	✓
Norwegian Bokmål	nb	✓	✓	✓	✓	✓
Norwegian Nynorsk	nn	✓	✓	✓	✓	✓
Polish	pl	✓	✓	✓	✓
Portuguese	pt	✓	✓	✓	✓	✓
Romanian	ro	✓	✓	✓	✓	✓
Russian	ru	✓	✓	✓	✓	✓
Serbian	sr	✓	✓	✓	✓	✓
Slovak	sk	✓	✓	✓	✓	✓
Spanish	es	✓	✓	✓	✓	✓
Swedish	sv	✓	✓	✓	✓	✓
Turkish	tr	✓	✓	✓	✓

* Language identification currently outputs hr for Bosnian text; see Bosnian and Croatian below.

Additionally, the following languages are supported for language identification.

Language name	Locale code	Language identification
Albanian	sq	✓
Armenian	hy	✓
Azerbaijani	az	✓
Bangla	bn	✓
Bashkir	ba	✓
Basque	eu	✓
Belarusian	be	✓
Bulgarian	bg	✓
Chuvash	cv	✓
Esperanto	eo	✓
Estonian	et	✓
Georgian	ka	✓
Gujarati	gu	✓
Haitian Creole	ht	✓
Hungarian	hu	✓
Icelandic	is	✓
Irish	ga	✓
Kazakh	kk	✓
Khmer	km	✓
Kurdish	ku	✓
Kyrgyz	ky	✓
Latvian	lv	✓
Lithuanian	lt	✓
Malay	ms	✓
Malayalam	ml	✓
Maltese	mt	✓
Mongolian	mn	✓
Pashto	ps	✓
Persian	fa	✓
Punjabi	pa	✓
Slovenian	sl	✓
Somali	so	✓
Tamil	ta	✓
Telugu	te	✓
Thai	th	✓
Ukrainian	uk	✓
Urdu	ur	✓
Vietnamese	vi	✓

Language Dialects, Writing Systems

Arabic

Watson NLP supports Standard Arabic (SA), which is used across the Middle East and North Africa. It is reported to be less accurate for Dialectal Arabic (DA), for example Egyptian Arabic.

Chinese

Watson NLP supports Simplified Chinese (zh_CN), used in Mainland China, and Traditional Chinese (zh_TW), used in Taiwan.

Watson NLP is not tested for Cantonese as used in Hong Kong. Note the following points for Cantonese:

Vocabulary: Cantonese uses the same grammar as the supported Chinese variants but a different vocabulary, with some overlap. Written Cantonese may therefore be covered to some extent, but coverage may not be complete.
Character set: Watson NLP supports Unicode, which includes Cantonese characters. However, these characters were not well standardized before 2004 (e.g. GCCS, HKSCS-1999, HKSCS-2001). Text from older systems may be affected by this and may be incompatible with systems based on the latest Unicode, including Watson NLP.
Lemma: Watson NLP always normalizes the lemma of Chinese words to Simplified Chinese.
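
One way to spot text that may be affected by the character set issue is to look for private-use code points before analysis. The sketch below is only an illustration and is not part of Watson NLP. It assumes that the older system stored not-yet-standardized characters in a Unicode Private Use Area, which this document does not state. The function name find_private_use_chars is hypothetical.

```python
import unicodedata


def find_private_use_chars(text):
    """Return (index, code point) pairs for private-use characters in text.

    Assumption: legacy Cantonese text may carry characters that an older
    system mapped to a Private Use Area instead of a standard code point.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        # "Co" is the Unicode general category for private-use characters.
        if unicodedata.category(ch) == "Co"
    ]


# Hypothetical sample: two standard characters, one private-use code point,
# then another standard character.
sample = "香港\ue000字"
print(find_private_use_chars(sample))
```

An empty result does not prove the text is compatible. It only shows that no private-use code points are present.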

Portuguese

Watson NLP supports both European Portuguese (pt_PT), used in Portugal, and Brazilian Portuguese (pt_BR), used in Brazil. The two differ somewhat in orthography, but not significantly. Watson NLP handles both as Portuguese (pt), using combined dictionaries and models.
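
Because regional variants such as pt_PT and pt_BR are handled under the single code pt, while Chinese keeps region-specific codes, an application may want to map incoming locale strings to a code listed in the tables above. The following is a minimal sketch of such a mapping. The SUPPORTED set is a small hypothetical subset of the table, and to_supported_code is an illustrative helper, not a Watson NLP function.

```python
# Hypothetical subset of the locale codes listed in this document.
SUPPORTED = {"pt", "zh_CN", "zh_TW", "en", "sr", "hr"}


def to_supported_code(locale):
    """Map a locale such as 'pt-BR' or 'zh-tw' to a code in SUPPORTED, or None."""
    parts = locale.replace("-", "_").split("_")
    lang = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None
    # Prefer an exact language_REGION match (e.g. zh_TW).
    if region and f"{lang}_{region}" in SUPPORTED:
        return f"{lang}_{region}"
    # Otherwise fall back to the bare language code (e.g. pt_BR -> pt).
    if lang in SUPPORTED:
        return lang
    return None


print(to_supported_code("pt_BR"))
print(to_supported_code("zh-tw"))
```

Note that the pretrained models table writes the Chinese codes as zh-cn and zh-tw, so the underscore form returned here may need converting depending on where the code is used.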

Serbian

The Serbian language has two writing systems, Cyrillic script and Latin script, with a direct transliteration between them. Watson NLP supports both scripts.
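
To illustrate what a direct transliteration looks like, the sketch below converts Serbian Cyrillic to Latin with a per-letter table. It is an independent illustration and does not show how Watson NLP handles the two scripts internally. The name cyrillic_to_latin is hypothetical.

```python
# Serbian Cyrillic to Latin, one entry per letter of the 30-letter alphabet.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def cyrillic_to_latin(text):
    """Transliterate Serbian Cyrillic letters to Latin; leave other characters as is."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # Uppercase input becomes a capitalized Latin form (Љ -> Lj).
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


print(cyrillic_to_latin("Београд"))
```

This simple version writes digraphs in title case, so fully capitalized words come out as, for example, "LjUBAV". The reverse direction needs more care, because Latin digraphs such as "nj" can also stand for two separate letters.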

Bosnian and Croatian

Bosnian and Croatian belong to the same language family and are difficult to distinguish in written form. Currently, the Watson NLP language detection module outputs hr for Bosnian text.
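
Downstream code that routes text by detected language may want to account for this behavior. The sketch below assumes the detected language is already available as a plain code string and makes no Watson NLP calls. The helper candidate_languages is hypothetical.

```python
def candidate_languages(detected_code):
    """Expand a detected language code to the set of languages it may stand for.

    Per the note above, a detected "hr" may be Croatian or Bosnian.
    """
    if detected_code == "hr":
        return {"hr", "bs"}
    return {detected_code}


print(sorted(candidate_languages("hr")))
print(sorted(candidate_languages("en")))
```

An application that must tell the two apart would need an additional signal, such as document metadata, since the detected code alone does not separate them.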