The Full-Text Search feature in IBM Rational ClearQuest, Version 7.1: Part 4. Customize to use different languages

Conclusion of a four-part series on implementing this feature

George Aroush (aroush@us.ibm.com), Advisory Software Engineer, IBM
George Aroush has been with IBM's Rational team for 5 years. He has led and worked on several projects focusing on ClearQuest. Recently, he led, architected, and implemented the Full-Text Search feature for ClearQuest V7.1. Prior to IBM, Mr. Aroush spent 12 years working in the information retrieval, knowledge management, and data mining field. He was responsible for the design and implementation of several search engines and high-performance solutions still in use today. During his free time, Mr. Aroush is an active open source contributor. He leads the Apache Lucene.Net project, which he ported from Java to C#. Mr. Aroush holds a master's degree in Computer Science from Northeastern University and a BFA from Tufts University.

Summary:  This article guides IBM Rational ClearQuest administrators in customizing languages that can be used in the Full-Text Search feature introduced in Version 7.1. This is the last of a four-part series about getting started with this feature.

Date:  24 Sep 2009
Level:  Intermediate

Note:
This article is based on using ClearQuest Version 7.1.0. Future versions may directly refer to this article, although the technology is subject to change.

To configure ClearQuest Full-Text Search to work with a ClearQuest database in your native language, configure the Solr schema.xml file to use the proper analyzer for that language. The only XML block that you need to change in the schema.xml file is the one defined by this entry:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
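
For orientation, the following sketch shows roughly where that entry sits in a Solr schema.xml file. It assumes the stock Solr layout of that era, in which field types are declared inside a <types> element. The schema name and the placeholder comments are illustrative only; they are not the actual contents of the ClearQuest file.

```xml
<!-- Sketch only: the name attribute and the comments are placeholders. -->
<schema name="example" version="1.1">
    <types>
        <!-- Other field types (string, boolean, and so on) are left unchanged. -->

        <!-- This is the block to replace for your language. -->
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <!-- analyzer definitions go here -->
        </fieldType>
    </types>

    <fields>
        <!-- Field definitions, set up as described in Part 3, stay as they are. -->
    </fields>
</schema>
```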

For details about configuring different languages and about changing the content of the schema.xml file for language support and other settings, see the Solr documentation and the Lucene documentation. Both also describe how to set up different languages to work with Solr.

By default, Solr has analyzers to support the following languages. Most of them are provided through SnowballPorterFilterFactory; CJK is handled by a Lucene analyzer instead, as shown in Listing 1:

  • CJK (Chinese, Japanese, Korean)
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German and German2
  • Italian
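  • Kp (Kraaij-Pohlmann stemmer for Dutch)
  • Lovins (Lovins stemmer for English)
  • Norwegian
  • Porter (original Porter stemmer for English)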
  • Portuguese
  • Russian
  • Spanish
  • Swedish

In addition to Snowball analyzers that Solr supports, the following analyzers supported by Lucene are also available:

  • BrazilianAnalyzer
  • ChineseAnalyzer
  • CJKAnalyzer
  • CzechAnalyzer
  • DutchAnalyzer
  • FrenchAnalyzer
  • GermanAnalyzer
  • GreekAnalyzer
  • RussianAnalyzer
  • ThaiAnalyzer

The Snowball analyzers are more configurable than the default Lucene analyzers, and they offer a stemmer. This can result in better indexing and search quality. If you can choose between two analyzers for the same language, use the one from Snowball.
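
To illustrate the choice, here is a sketch for Dutch, which appears in both lists. It assumes the same tokenizer and filter chain that Listing 3 uses for Spanish, with only the language attribute changed; the Lucene alternative is shown in a comment for comparison. Treat it as a starting point to test against your own data, not as a verified ClearQuest configuration.

```xml
<!-- Preferred: Snowball stemmer inside a configurable filter chain. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
    </analyzer>
</fieldType>

<!-- Alternative: the fixed Lucene analyzer, which leaves no room for extra filters.
<fieldType name="text" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer"/>
</fieldType>
-->
```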

These instructions cover the configuration for the following analyzers: CJK, Chinese, Spanish, and French.

CJK language support

To enable CJK language support:

  1. Follow the “Step-by-step Full-Text Search configuration” section in Part 3 of this series. During the step in which you edit the schema.xml file to replace the XML <fields> and <copyField> blocks, look for <fieldType name="text" class="solr.TextField" positionIncrementGap="100">, and replace it with the code in Listing 1.

Listing 1. Code to use CJK as the search language
<fieldType name="text" class="solr.TextField">
    <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer" />
    <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer" />
</fieldType>

  2. Continue with the configuration and setup process.

Chinese language support

To enable Chinese language support:

  1. Follow the “Step-by-step Full-Text Search configuration” section in Part 3 of this series. During the step in which you edit the schema.xml file to replace the XML <fields> and <copyField> blocks, look for <fieldType name="text" class="solr.TextField" positionIncrementGap="100">, and replace it with the code in Listing 2.

Listing 2. Code to use Chinese
<fieldType name="text" class="solr.TextField">
    <analyzer type="index" class="org.apache.lucene.analysis.cn.ChineseAnalyzer" />
    <analyzer type="query" class="org.apache.lucene.analysis.cn.ChineseAnalyzer" />
</fieldType>

  2. Continue with the configuration and setup process.

Spanish language support

To enable Spanish language support:

  1. Follow the “Step-by-step Full-Text Search configuration” section in Part 3 of this series. During the step in which you edit the schema.xml file to replace the XML <fields> and <copyField> blocks, look for <fieldType name="text" class="solr.TextField" positionIncrementGap="100">, and replace it with the code in Listing 3.

Listing 3. Code to use Spanish
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
    </analyzer>
</fieldType>

  2. Continue with the configuration and setup process.

Note:
Because the Spanish configuration is built from a tokenizer and a chain of filters, it is more flexible than the CJK and Chinese configurations: you can specify additional filters or a different tokenizer. Also, notice that it uses the Spanish stemmer offered by Snowball.
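
As one example of that flexibility, the sketch below adds a stop-word filter to the chain from Listing 3. solr.StopFilterFactory is a standard Solr filter, but the file name stopwords_es.txt is a hypothetical placeholder: you would need to supply your own Spanish stop-word list under that name in the Solr conf directory.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Added filter. stopwords_es.txt is a hypothetical file that you provide. -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_es.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Same added filter on the query side, so both chains stay in step. -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_es.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    </analyzer>
</fieldType>
```

Because the accent filter runs before the stop-word filter in this chain, the entries in the stop-word file should be written without accents.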

French language support

To enable French language support:

  1. Follow the “Step-by-step Full-Text Search configuration” section in Part 3 of this series. During the step in which you edit the schema.xml file to replace the XML <fields> and <copyField> blocks, look for <fieldType name="text" class="solr.TextField" positionIncrementGap="100">, and replace it with the code in Listing 4.

Listing 4. Code to use French
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" />
    </analyzer>
</fieldType>

  2. Continue with the configuration and setup process.

Note:
As with Spanish, the French configuration is built from a tokenizer and a chain of filters, so you can specify additional filters or a different tokenizer. Also, notice that it uses the French stemmer offered by Snowball.
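
Because the index and query chains for French are meant to be identical, a more compact form is possible. In Solr, an <analyzer> element without a type attribute applies to both indexing and querying, so the following sketch should behave the same as a listing that uses French on both sides. It is offered as an alternative to consider, not as the configuration this series was tested with.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <!-- No type attribute: the same chain is used at index time and at query time. -->
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
</fieldType>
```

Keeping a single chain avoids the index and query sides drifting apart, for instance by stemming with different languages, which would make indexed terms and query terms fail to match.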


Acknowledgement

Special thanks to David Sampson, a staff technical support engineer in IBM Rational Client Support, who serves on the ClearQuest Full-Text Search Cross-Functional Team.

