The linguistic analysis functions that are provided with Watson Content Analytics include document language detection and segmentation.
When a document is processed, parsing and tokenization functions determine the language of that document and break up the stream of input text into distinct units, or tokens.
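The two document-side steps can be illustrated with a minimal sketch. It is not the Watson Content Analytics implementation: the `STOPWORDS` table, `tokenize`, and `detect_language` are hypothetical names invented for illustration, and language detection here is a simple stopword count rather than a real language model. For simplicity the sketch tokenizes first and then guesses the language from the tokens.

```python
import re

# Hypothetical, tiny stopword lists used only for this illustration.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "de": {"der", "die", "und", "ist", "das", "nicht"},
    "fr": {"le", "la", "et", "est", "les", "des"},
}


def tokenize(text):
    """Break the input text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def detect_language(tokens):
    """Return the language code whose stopword list matches the most tokens."""
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


tokens = tokenize("The index is searched and the results are returned.")
language = detect_language(tokens)
```

A production system would typically segment text with language-specific rules, which matters most for languages that do not separate words with white space; the regular expression above does not handle those cases.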
During a search, the user or search application must specify the query language. The query string is segmented and analyzed, and then the index is searched.
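The query-side flow described above (specify the language, segment the query, search the index) might look like the following sketch. All names (`segment`, `build_index`, `search`) are hypothetical and are not part of any product API. The `language` argument is accepted only to show where a language-specific segmenter would be selected; this toy version applies the same rule to every language, and it treats a query as matching only documents that contain every query token.

```python
import re
from collections import defaultdict


def segment(text, language):
    """Hypothetical segmenter. A real one would choose rules based on `language`."""
    return re.findall(r"\w+", text.lower())


def build_index(documents, language):
    """Build an inverted index that maps each token to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in segment(text, language):
            index[token].add(doc_id)
    return index


def search(index, query, query_language):
    """Segment the query, then return the IDs of documents that contain every query token."""
    tokens = segment(query, query_language)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


documents = {
    1: "Language detection and segmentation",
    2: "The query language must be specified",
}
index = build_index(documents, "en")
hits = search(index, "language detection", "en")
```

Using the same segmenter for documents and queries is the important design point: a query token can only match an index entry if both were produced by the same rules.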
Linguistic processing involves lexical analysis, which creates alternative representations of the input text by associating all available dictionary data with the tokens that are recognized in that text. Advanced language processing of this kind greatly enhances search quality.
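As a rough illustration of lexical analysis, the sketch below attaches dictionary data to each recognized token. The `LEXICON` contents and the `analyze` function are assumptions made for this example; the actual dictionaries and the data they carry are not described here. Each token keeps its surface form and gains a base form (lemma) and part of speech where the dictionary has an entry.

```python
# Hypothetical dictionary data; real dictionaries are far larger and richer.
LEXICON = {
    "mice": {"lemma": "mouse", "pos": "noun"},
    "ran": {"lemma": "run", "pos": "verb"},
    "running": {"lemma": "run", "pos": "verb"},
}


def analyze(tokens):
    """Associate available dictionary data with each token.

    Tokens without a dictionary entry keep their surface form as the lemma
    and have no part of speech.
    """
    analyzed = []
    for token in tokens:
        entry = LEXICON.get(token, {})
        analyzed.append(
            {
                "surface": token,
                "lemma": entry.get("lemma", token),
                "pos": entry.get("pos"),
            }
        )
    return analyzed


analysis = analyze(["mice", "ran", "home"])
```

Indexing the lemma alongside the surface form is one way such alternative representations can help search: under this assumption, a query for "run" could match a document that contains "ran" or "running".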