If the text of a document is in Japanese, Watson Content Analytics performs relevant word segmentation by using morphological analysis technology that is optimized for the Japanese language.
An enhanced linguistic analysis engine called JJSA is used to analyze Japanese documents in content analytics collections. JJSA provides dependency information between words so that users can create rules that match relations between words. JJSA ignores sentences that contain only ASCII characters. JJSA also ignores sentences that are 50 characters or longer, because long English sentences have a negative effect on Japanese linguistic analysis.
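
The two skip conditions can be illustrated with a minimal Python sketch. This is not JJSA code: the function names and the constant `MAX_LENGTH` are hypothetical, the two conditions are read literally as independent checks, and length is assumed to be counted in Unicode code points, which the text does not specify.

```python
# Illustrative only: a literal reading of the two skip conditions described above.
# Names are hypothetical and do not correspond to any JJSA API.

MAX_LENGTH = 50  # threshold from the text; counting in code points is an assumption


def is_ascii_only(sentence: str) -> bool:
    """Return True if every character in the sentence is an ASCII character."""
    return sentence.isascii()


def is_skipped(sentence: str) -> bool:
    """Return True if the sentence would be ignored under either condition."""
    return is_ascii_only(sentence) or len(sentence) >= MAX_LENGTH


if __name__ == "__main__":
    for text in ["This is English.", "今日は晴れです。", "あ" * 50]:
        print(len(text), is_skipped(text))
```

Under this reading, a short Japanese sentence is analyzed, while an ASCII-only sentence or any sentence that reaches the length threshold is passed over.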
The system also supports typical variants of Okurigana, the Kanji word endings that are written in Hiragana.
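
As an illustration of what an Okurigana variant is, the following Python sketch maps a few common variant spellings to a single form so that they compare as the same word. The lookup table and function names are hypothetical. The text does not describe how the product handles variants internally, so this shows only the idea, not the actual mechanism.

```python
# Illustrative only: a tiny hand-made table of Okurigana variants.
# The product's real variant handling is not described in the text.

OKURIGANA_VARIANTS = {
    "行なう": "行う",        # "to carry out"
    "申込み": "申し込み",    # "application"
    "申込": "申し込み",
    "取扱い": "取り扱い",    # "handling"
}


def normalize_okurigana(word: str) -> str:
    """Map a known variant spelling to its chosen form; leave other words unchanged."""
    return OKURIGANA_VARIANTS.get(word, word)


def same_word(a: str, b: str) -> bool:
    """Return True if two spellings normalize to the same form."""
    return normalize_okurigana(a) == normalize_okurigana(b)


if __name__ == "__main__":
    print(same_word("行なう", "行う"))
    print(same_word("申込", "申込み"))
```

With a mapping of this kind, a search for one spelling can also match documents that use another spelling of the same word.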