Configuration of multi-language settings

The following table lists the languages that IBM® StoredIQ® supports.

This list does not apply to IBM StoredIQ Cognitive Data Assessment. With CDA, the only supported language is English.

Table 1. Supported languages
Language             Code  Lemmas  Stop words
Arabic               ar    X
Catalan              ca
Chinese              zh    X
Czech                cs    X
Danish               da    X
Dutch                nl    X
English              en    X       X
Finnish              fi    X
French               fr    X       X
German               de    X       X
Greek                el    X
Hebrew               he    X
Hungarian            hu
Icelandic            is
Italian              it    X
Japanese             ja    X
Korean               ko    X
Malay                ms
Norwegian (Bokmål)   nb    X
Norwegian (Nynorsk)  nn    X
Polish               pl    X
Portuguese           pt    X       X
Romanian             ro
Russian              ru    X
Spanish              es    X       X
Swedish              sv    X
Thai                 th    X
Turkish              tr    X
Vietnamese           vi

By default, English is the only language that Multi-language Support identifies during a harvest, and it is also the default search language. You can change both the identified languages and the default search language in the siq-findex.properties file, which is located in the /usr/local/tomcat/webapps/storediq/WEB-INF/classes directory on each data server. Keep the siq-findex.properties file identical across all data servers; otherwise, search results can be inconsistent or incorrect.
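The requirement that the file be identical on every data server can be checked by comparing content digests. The following is a minimal Python sketch, not a StoredIQ tool: it assumes that you have already copied the content of each server's siq-findex.properties file into memory by some means of your own, and the function names and server names are hypothetical.

```python
import hashlib


def group_by_digest(contents_by_server):
    """Group server names by the SHA-256 digest of their file content.

    contents_by_server maps a server name (hypothetical, for illustration)
    to the raw bytes of that server's siq-findex.properties file.
    """
    groups = {}
    for server, content in contents_by_server.items():
        digest = hashlib.sha256(content).hexdigest()
        groups.setdefault(digest, []).append(server)
    return groups


def is_in_sync(contents_by_server):
    """Return True when all servers hold byte-identical file content."""
    return len(group_by_digest(contents_by_server)) <= 1
```

If is_in_sync returns False, the groups that group_by_digest returns show which servers share the same version of the file, which helps to identify the server that was not updated.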

To change the languages that the harvester can identify, set the index.presetLanguageIDs field, which is the second-to-last line of the file, to a comma-separated list of language codes, for example: index.presetLanguageIDs = en,fr,de,pt. The first language in the list is the default language, which is assigned to any document whose language cannot be identified.
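As an illustration of how the index.presetLanguageIDs value is interpreted, the following Python sketch reads the field from properties-style text, checks each code against Table 1, and picks out the first entry as the fallback language. It is an assumption-laden example rather than part of the product: the parser handles only simple key = value lines (not the full Java properties syntax), and the function names are hypothetical.

```python
# Language codes from Table 1.
SUPPORTED_CODES = {
    "ar", "ca", "zh", "cs", "da", "nl", "en", "fi", "fr", "de",
    "el", "he", "hu", "is", "it", "ja", "ko", "ms", "nb", "nn",
    "pl", "pt", "ro", "ru", "es", "sv", "th", "tr", "vi",
}


def parse_properties(text):
    """Parse simple 'key = value' lines; skip blank and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def preset_languages(props):
    """Return the language codes of index.presetLanguageIDs in order."""
    raw = props.get("index.presetLanguageIDs", "")
    return [code.strip() for code in raw.split(",") if code.strip()]


sample = "index.presetLanguageIDs = en,fr,de,pt\nsearch.defaultLanguage = en\n"
langs = preset_languages(parse_properties(sample))

# The first entry is assigned to documents whose language is not identified.
fallback = langs[0] if langs else None

# Codes that do not appear in Table 1 are likely typing errors.
unsupported = [code for code in langs if code not in SUPPORTED_CODES]
```

Because the order of the list determines the fallback language, reordering the codes changes behavior even when the set of languages stays the same.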

To change the default search language, set the search.defaultLanguage field, which is the last line of the file. The search language determines which language's rules (stop words, lemmas, and character normalization) apply in a search. Only one language can be set as the search default. However, you can override the default language in an individual full-text search, for example: lang:de[umkämpft großteils]
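If queries are assembled by a script, the per-query language override can be generated with a small helper. The following Python sketch reproduces only the lang:code[terms] form of the example above; the function name is hypothetical, and whether the query syntax accepts other variations is not covered here.

```python
def with_language(lang_code, terms):
    """Wrap search terms in a lang:<code>[...] language override.

    Mirrors the form of the documented example only; the terms are
    joined with single spaces and are not escaped or validated.
    """
    return f"lang:{lang_code}[{' '.join(terms)}]"


query = with_language("de", ["umkämpft", "großteils"])
```

Here, query holds the same string as the example in the preceding paragraph, so the German rules apply to that search regardless of the search.defaultLanguage setting.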

After you change this properties file, you must restart the data server and reharvest the volumes that are to be searched. If the data server is of the DataServer - Distributed type, run the following command on the data server after you restart it:

/etc/deepfile/dataserver/es-update-findex-props.py