IBM Support

An additional configuration step is required for Content Search Services/ECM Text Search to use morphological dictionaries for Chinese, Japanese, and Korean languages for indexing and search

Troubleshooting


Problem

An extra configuration step is required for Content Search Services (CSS)/ECM Text Search (ECMTS) to use morphological dictionaries for Chinese, Japanese, or Korean (CJK) languages for indexing and search. Without this step, document text is indexed incorrectly, which causes content searches to fail. This applies only to CSS/ECMTS servers configured with -forceNgramForCJK=false to use morphological dictionaries. If the default Ngram processing is used (-forceNgramForCJK=true), this configuration step is not required.
 
Documents that were indexed before this configuration step was completed must be re-indexed after the change has been made.

Cause

The LanguageWare 8.5.2 version used with CSS/ECMTS loads the default break rules dictionary defined in the resource\uima\langware.xml file for CJK languages, whereas earlier versions of LanguageWare ignored this default definition. This only applies to CSS/ECMTS servers with the non-default -forceNgramForCJK=false option configured.

Environment

This issue is platform independent. The configuration step is required for CSS/ECMTS versions 5.2.1.7-P8CSS-IF002 and later 5.2.1x releases, 5.5.0.0-P8CSS-IF003 and later 5.5.0x releases, 5.5.1.0-P8CSS-IF002 and later 5.5.x versions, and future releases, but only if the server is configured with -forceNgramForCJK=false to use the morphological CJK dictionaries.

Diagnosing The Problem

Determine whether morphological dictionaries are in use for CJK languages by checking the current setting of the forceNgramForCJK option. From the server install location\bin directory, run:
    configTool list -system -forceNgramForCJK
 
The command shows the current forceNgramForCJK setting, for example:
 
Parameter name: forceNgramForCJK
Type: Boolean
Current value: true
Default value: true
Modifiable: true
Modifiable when server is running: true
Scope: all
Subsystem: index
Description: Specifies the language processing mode for documents and queues in CJK languages. The default is true. When true, ngram processing is used. When false, the morphological dictionaries for these languages are used.
 
If Current value is set to true (as in the example above), Ngram processing is in use for CJK languages and no further configuration is required. If Current value is set to false, follow the steps in Resolving the Problem.
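
The check described above can be scripted. The following Python code is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that the configTool output has the "Name: value" layout shown in the example above.

```python
def parse_config_tool_output(text):
    """Parse 'Name: value' lines of configTool output into a dict.

    Assumption: every field sits on its own line and the first colon
    separates the field name from its value, as in the example output.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def needs_langware_edit(text):
    """Return True when forceNgramForCJK is false, that is, when the
    morphological dictionaries are in use and langware.xml must be edited."""
    fields = parse_config_tool_output(text)
    if fields.get("Parameter name") != "forceNgramForCJK":
        raise ValueError("output does not describe forceNgramForCJK")
    current = fields.get("Current value")
    if current is None:
        raise ValueError("output has no 'Current value' line")
    return current.lower() == "false"
```

To use the sketch, capture the text printed by the configTool command (for example, by redirecting it to a file) and pass it to needs_langware_edit. A result of False corresponds to the case where no further configuration is required.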

Resolving The Problem

For CSS/ECMTS systems configured with -forceNgramForCJK=false to use CJK morphological dictionaries, follow these steps:
 
1. Shut down the CSS/ECMTS server.
2. Save a copy of the server install location\resource\uima\langware.xml file (for future reference only).
3. Edit the server install location\resource\uima\langware.xml file and delete the following section of XML:
 
      <settingsForGroup name="default">
        <nameValuePair>
          <name>LexicalDicts</name>
          <value>
            <string>ECMTSBreakRules.dic</string>
            <array/>
          </value>
        </nameValuePair>
      </settingsForGroup>
 
4. Save the langware.xml file and restart the CSS/ECMTS server.
5. Re-index any documents added prior to making this change.
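
Steps 2 and 3 above can also be performed with a script while the server is shut down. The following Python code is a minimal sketch under stated assumptions, not an IBM-provided tool: the function names and the .bak backup suffix are hypothetical, the file is assumed to be UTF-8 encoded, and the section is assumed to appear in the file as shown in step 3. The script works on the file text rather than re-serializing the XML, so the rest of langware.xml is left unchanged.

```python
import re
import shutil
from pathlib import Path

# Matches one <settingsForGroup name="default"> ... </settingsForGroup> block,
# including its indentation and the line break that follows it.
DEFAULT_GROUP = re.compile(
    r'[ \t]*<settingsForGroup\s+name="default"\s*>.*?</settingsForGroup>[ \t]*\r?\n?',
    re.DOTALL,
)


def remove_default_break_rules(xml_text):
    """Return xml_text without the default group that loads ECMTSBreakRules.dic.

    A default group that does not reference ECMTSBreakRules.dic is kept.
    """
    def drop_if_break_rules(match):
        block = match.group(0)
        if "<name>LexicalDicts</name>" in block and "ECMTSBreakRules.dic" in block:
            return ""
        return block

    return DEFAULT_GROUP.sub(drop_if_break_rules, xml_text)


def patch_langware(path):
    """Back up langware.xml and remove the section. Return True if changed.

    Assumptions: UTF-8 encoding, and a backup named <file>.bak written
    next to the original file.
    """
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched = remove_default_break_rules(original)
    if patched == original:
        return False  # section not found; nothing to change
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(patched, encoding="utf-8")
    return True
```

Call patch_langware with the full path to the resource\uima\langware.xml file under the server install location. A return value of False means that the section was not found and the file was not modified. Restarting the server and re-indexing (steps 4 and 5) are still required.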
 
Note: This is a temporary fix required until a permanent resolution is implemented.
 

Document Location

Worldwide

Product Synonym

ECMTS CSS

Document Information

Modified date:
08 June 2021

UID

ibm16456995