The Full-Text Search feature in IBM Rational ClearQuest, Version 7.1: Part 1. Search overview and use cases

ClearQuest Full-Text Search is one of the new features in IBM® Rational® ClearQuest® Version 7.1. With this tool, a ClearQuest Web user can search any ClearQuest record the same way as searching the Web. This article helps ClearQuest administrators understand the new feature and its setup, configuration, and underlying architecture.

George Aroush (aroush@us.ibm.com), Advisory Software Engineer, IBM

George Aroush has been with IBM's Rational team for 5 years. He has led and worked on several projects focusing on ClearQuest. Recently, he led, architected, and implemented the Full-Text Search feature for ClearQuest V7.1. Prior to IBM, Mr. Aroush spent 12 years working in the information retrieval, knowledge management, and data mining field. He was responsible for the design and implementation of several search engines and high-performance solutions still in use today. During his free time, Mr. Aroush is an active open source contributor. He leads the Apache Lucene.Net project, which he ported from Java to C#. Mr. Aroush holds a master's degree in Computer Science from Northeastern University and a BFA from Tufts University.



03 September 2009 (First published 03 September 2009)

Also available in Spanish

Introduction

One of the new features in IBM® Rational® ClearQuest® Version 7.1 is the Full-Text Search feature. With this feature, a ClearQuest Web user can search any ClearQuest record, just like searching the Web.

This article describes this new technology and the engine behind it. The author assumes that you are familiar with ClearQuest and have read the existing documentation for the ClearQuest V7.1 Full-Text Search feature in the IBM Rational ClearQuest V7.1 Information Center. This article covers ClearQuest Full-Text Search as of V7.1.0.


Quick overview of the Full-Text Search feature

In this section, you will learn about the basic functions of the Full-Text Search feature. This section is also a quick overview of search engines in general. It gives high-level details of what makes up a search engine and how it works.

Full-text search in a nutshell

At a high level, for any full-text search solution to work, two key components are required: an indexer component and a searcher component. The commonality between them is the full-text search index files, which are used by both index and search components.

A third, lesser-known component is the analyzer component. An analyzer takes a stream of raw text and breaks it into tokens (words) that the indexer uses to add to the full-text search index. The searcher also depends on the analyzer to separate a user's search string into tokens for searching.

A fourth, less common component is the administration component, which is used to administer the full-text search offering. Some search engines offer it, while others do not.

Indexer component

The indexer component of a full-text search solution adds new records to the search engine index file, so the component must have write access. This component is also referred to as the writer. To prevent data corruption, best practices dictate that there is only one writer adding records to an index. However, there are techniques to enable multiple writers if necessary (for high performance). If there are multiple writers, they must be synchronized. Although such a setup is supported by Solr, it is beyond the scope of ClearQuest.

Searcher component

After an indexer creates an index, the index can be searched. A searcher, also referred to as a reader, is used to search the index. There can be multiple readers on an index at any given time. In addition, those readers can continue reading the index while a writer is adding new records to the index. Those added records are not visible to the other readers until the readers re-open the index. Finally, a reader can search one or multiple indexes at the same time. This enables scalability by distributing the index across multiple drives, file systems, or servers. (See Solr’s online documentation for details.)
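
The division of labor between a writer and a reader can be pictured with a toy inverted index. The following Python sketch is illustrative only: the class and method names (TinyIndex, add, search) are hypothetical and do not correspond to the Lucene or Solr API, and a real index lives in files on disk rather than in memory.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each token to the IDs of records containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, record_id, text):
        # Writer side: tokenize naively on white space and record each token.
        for token in text.lower().split():
            self.postings[token].add(record_id)

    def search(self, term):
        # Reader side: look up one term and return a sorted list of record IDs.
        return sorted(self.postings.get(term.lower(), set()))


index = TinyIndex()
index.add(1, "Spelling error on login page")
index.add(2, "Login fails after upgrade")
print(index.search("login"))
```

Here the writer and the reader share one in-memory structure, which is the role that the index files play in a real search engine.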

Analyzer component

An analyzer is a lesser-known component of a search engine, but an important one. The analyzer is used by the reader and the writer. An analyzer takes a stream of raw text and breaks it into words. The quality of this process depends on how well-tuned the analyzer is for the language of the text. For example, an English analyzer will not only break a stream of text into words, but may also apply stemming rules to the words. By doing so, search quality will improve considerably. For example, searching for the word spell will also find words such as "spelling", "spelled", "spellers", "spells", and so on.
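
As a rough illustration of what an analyzer does, the sketch below lowercases text, splits it into words, drops a few stopwords, and strips common English suffixes. It is a deliberately naive stand-in, not the stemming algorithm that Lucene's analyzers use; the stopword and suffix lists and the function name analyze are assumptions made for this example.

```python
import re

STOPWORDS = {"a", "an", "is", "it", "the"}
SUFFIXES = ("ing", "ed", "ers", "er", "s")


def analyze(text):
    """Lowercase, split into words, drop stopwords, strip common suffixes."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in STOPWORDS:
            continue
        for suffix in SUFFIXES:
            # Keep at least three characters so short words stay intact.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens


print(analyze("The spellers spelled it: spelling spells"))
```

Every surviving word in the sample sentence reduces to the token spell, which is why a search for spell can match all of those variants.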

Administrator component

Most search engines have an administration component. The key job of the administration component is to provide the means to configure the indexing and searching component. In addition, the administration component usually provides capabilities such as tuning and monitoring of both searching and indexing.


Lucene in a nutshell

Lucene is a free, open-source information retrieval library, originally implemented in Java™ by Doug Cutting and released on SourceForge.net in March 2000 as Version 0.01. Soon after that, it was incubated into the Apache Software Foundation, and Version 1.2 was released under the Apache Software License. Over the years, Lucene has built a respected community and following, and it has been ported to several programming languages:

  • Delphi
  • Perl
  • C#
  • C++
  • Python
  • Ruby
  • PHP

In some cases, the port is a full API and class port (such as for C#), while in other cases it is just byte recompilation or a port of the reader only, so that the language can provide read access to the Lucene index.

Although suitable for any application that requires full-text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site searching. Lucene itself is simply an indexing and search library. It does not contain a crawler, HTML parsing functionality, or any other code to extract raw text from standard document formats such as Microsoft® Word documents or Adobe® PDF files.

At the core of the Lucene logical architecture is the notion of a document containing fields of text. This flexibility allows the Lucene API to be generic, and thus neutral of any file or data format. Text from a PDF, HTML, or Microsoft Word file, a database source, and many others can all be indexed if their textual information can be extracted.

Lucene documents

To index data in Lucene, a Lucene document must be created. A Lucene document contains a set of Lucene fields. Each field has a name and a textual value. A field can be stored with the document, in which case it is returned with search hits on the document. Thus each document should typically contain one or more stored fields that uniquely identify it.

From the perspective of ClearQuest, a ClearQuest record translates to a Lucene document. The fields in a ClearQuest record are mapped into fields of a Lucene document. However, unlike ClearQuest, Lucene has no notion of a record type or document type. In addition, a Lucene document may contain one or more sets of fields; that is, a Lucene document is not a direct mapping to a ClearQuest record type.

Lucene fields

A Lucene field is a section of a Lucene document. Each field has two parts: a name and a value.

  • Values can be free text, which can be processed for indexing (through analysis and tokenization) or stored as is without any processing. If they are stored as is, they are treated as atomic keywords. Such keywords may be used to represent dates, URLs, unique IDs, and so on.
  • Fields are optionally stored in the index so that they can be returned with hits on the document.

There is no limit to the number of fields in a Lucene document, and Lucene has no requirement that every document have the same set of fields. A document may have a field that another document doesn't. In addition, Lucene is case-sensitive in regard to field names but, by default, it is not case-sensitive on text that it is searching.

From the perspective of ClearQuest, each ClearQuest field (of any record type) can map to a Lucene field. The ClearQuest DBID field will be stored in the Lucene index (for a Lucene document), which is considered an atomic keyword. It is through this field that Lucene hits are mapped back to the records in the ClearQuest database.
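
The mapping from a record to a document can be sketched as follows. This is a simplified model, not the ClearQuest or Lucene implementation: the function name, the dictionary layout, the lowercase field name dbid, and the sample values are all hypothetical.

```python
def record_to_document(record, indexed_fields):
    """Map a record (a dict) to a document: a list of field descriptions."""
    # The DBID is kept as an unanalyzed, stored keyword so that hits can be
    # mapped back to the record.
    document = [
        {"name": "dbid", "value": str(record["dbid"]), "stored": True, "analyzed": False}
    ]
    for name in indexed_fields:
        if name in record:
            document.append(
                {"name": name, "value": record[name], "stored": False, "analyzed": True}
            )
    return document


record = {"dbid": 33554473, "Headline": "Spelling error in login page", "Owner": "admin"}
doc = record_to_document(record, ["Headline", "Description"])
print([field["name"] for field in doc])
```

Only the fields selected for indexing that actually exist in the record end up in the document, which reflects the point that documents need not share the same set of fields.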

Lucene index

At the heart of Lucene is an index. Any time a new Lucene document is created and indexed, it is added to the Lucene index. The Lucene index uses its own file format, which is open and documented. The index consists of anywhere from three files to as many as 100 files (when the index is not optimized).

Lucene analyzer

The value of a field, in a Lucene document, must be analyzed for it to be indexed. Lucene has the concept of a pluggable analyzer. This is an important part of Lucene because, with this architecture, Lucene can use different analyzers for indexing and searching. An analyzer examines raw-text content and provides tokens for use by the indexer. The raw-text stream can be tokenized in many unique ways. A trivial analyzer can tokenize streams at white space; a different one can perform filtering of tokens, based on the application needs; other analyzers use stemming.

Because an analyzer is pluggable, different analyzers can be used for different languages. It is through the analyzer (and Unicode support of Java) that Lucene offers ready-to-use support for different languages such as: Brazilian, Simplified Chinese, CJK, Czech, Dutch, French, German, Greek, Russian, and Thai. For additional languages not directly supported by Lucene, you can either write an analyzer or see if someone already has written one. There are also commercial analyzers for Lucene for a variety of languages and tuning.


Solr in a nutshell

Solr was originally developed by CNET Networks. It was donated to the Apache Software Foundation in January 2006, and since then it has continued to grow both in terms of new development and its user base. The current version of Solr is 1.2, which was released in June 2007 and is used by ClearQuest V7.1.0.

Solr is an open-source enterprise search server based on the Lucene search library. It includes these functions (although not all are implemented in ClearQuest 7.1):

  • XML, HTTP, and JavaScript Object Notation (JSON) APIs
  • Hit highlighting
  • Faceted search
  • Caching
  • Replication
  • Web administration interface

Solr runs in a Java servlet container, such as IBM® WebSphere®, Tomcat, and so on. Solr has these advantages:

  • Advanced full-text search capabilities
  • Standards-based open interface for XML and HTTP
  • Scalability
  • Efficient replication to other Solr search servers
  • Flexibility and adaptability with XML configurations
  • Extensibility through a plug-in architecture

In effect, Solr uses the Lucene search library and extends it.

Solr schema

Solr uses a schema to define Lucene documents and fields. In doing so, Solr enables dynamic fields (new fields that can be added without requiring the re-indexing of data), provides a configurable Lucene analyzer, combines two or more fields into one, allows the elimination of types or fields, and provides external file-based configuration for stopwords, synonyms, and a protected words list. In addition, it provides language configuration through its pluggable analyzer.

Solr query

The Solr query engine is an HTTP interface with configurable response formats (XML, XSLT, JSON, Python and Ruby). The engine provides result sorting on one or more fields, highlighting capability, faceted searching, scoring configuration, and performance optimization.
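
Because the query engine is an HTTP interface, a search is ultimately just a URL. The sketch below only builds such a URL and sends nothing. The host, port, and /solr path are assumptions (a ClearQuest deployment may use different values), while q, start, rows, and wt are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10, start=0, fmt="json"):
    """Build a Solr select URL; no request is sent."""
    params = {"q": query, "start": start, "rows": rows, "wt": fmt}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr", 'Headline:"spelling error"')
print(url)
```

Note that the colon, quotation marks, and space in the query are percent-encoded, which any client that talks to Solr directly must do.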

Solr replication

Solr replication provides efficient distribution of index parts that have changed through rsync transport (this is not implemented in ClearQuest 7.1). You can easily add searchers by using the provided Pull strategy. This capability is highly desirable when you use Solr in a high-performance search environment, where hundreds of searches are being issued in a minute and responses must be made within fractions of a second.

Solr Admin interface

Solr has a Web-based administrative interface that provides statistics on cache utilization, updates, and queries. The interface also provides text analysis, debugging, and logging, and has a query interface to the Lucene search.


Overview of the ClearQuest Full-Text Search feature

ClearQuest already has search capabilities: Find Record and SQL Query. Those search capabilities are different from the Full-Text Search feature, which has a different audience, goal, and end result.

The existing ClearQuest Find Record feature is limited to finding ClearQuest records by their display name or database ID. Full-Text Search is not intended to replace Find Record, but rather to complement it. If a ClearQuest database has a record with a display name such as SAMPL00000041, you can still use Full-Text Search to find this record. However, you might get more than one record as a hit, and the SAMPL00000041 record might not be ranked first. This is because the SAMPL00000041 text might also exist in another record.

Full-Text Search compared to the Query feature

The ClearQuest Query feature is limited to finding ClearQuest records based on an SQL select statement. The process of finding records can sometimes be complex and time-consuming, and it requires several steps to build an SQL statement. In addition, the returned records are not ranked by relevance. Furthermore, depending on the query, it can take considerable time and tax the database server to run the query.

Full-Text Search compared to SQL

The Full-Text Search syntax is different from an SQL query. This is important to keep in mind to understand and use the new search feature of ClearQuest V7.1.

The SQL select statement (or even a database vendor's full-text search syntax and result set) is different from the Full-Text Search feature. Because the Full-Text Search feature sends search requests to Solr and, in turn, Solr sends them to Lucene, it gives you access to the power and capability of the Lucene search syntax.

If you want to find only terms or phrases, you can use the Full-Text Search feature in the same way that you use a Web browser to do a Web search, or you can take advantage of additional search capabilities to better define your search term. If you use the Full-Text Search feature, you need to have a basic knowledge of your ClearQuest schema and the Lucene search syntax.


Lucene search syntax

Lucene has a rich search syntax that is simple yet powerful. It uses industry-standard, Web-based search syntax, so if you are familiar with Web searching, you should be familiar with it. The same search syntax is used by the Full-Text Search feature.

Case insensitive

Lucene is case-insensitive for search terms or phrases. That is, searches for the terms clearquest, ClearQuest, CLEARQUEST, and so on are all the same. Searches can be made case-sensitive through Solr's schema.xml file (discussed later in this article), but this practice is strongly discouraged: it reduces the quality of the search and increases the size of the Lucene index. As you will see later, Lucene is case-sensitive when keywords are used. Lucene is also case-sensitive on field names. In the context of ClearQuest Full-Text Search, field names refer to ClearQuest display field names.

Terms

In Lucene, a query is broken up into terms and operators. There are two types of terms: single terms and phrases.

A single term is a single word, such as spell or login. A phrase is a group of words surrounded by double quotation marks, such as "spell error". Multiple terms can be combined together with Boolean operators to form a more complex Lucene query (those are described in the following sections).

To search for a word or a set of words, enter them in the ClearQuest Web text search field. To search for a phrase, use quotation marks around the set of words. A term search looks like spell or clearquest clearcase. If you are searching for both ClearQuest and IBM® Rational® ClearCase® as a phrase, the search might look like this: "clearquest clearcase"

Note:

When you search for a phrase, word order matters: "clearcase clearquest" and "clearquest clearcase" are considered different searches. You will get hits only when the words exist in the order given in your phrase search.

Stopwords

Lucene can be configured to treat certain words as stopwords. This is a common practice to eliminate noise from the search and to improve the quality of hits. For example, terms such as "a", "an", "is", "it", "the", and so on, are considered stopwords. At indexing time, they are eliminated from the Lucene index. You can configure the set of stopwords to suit your needs by editing the Solr schema.xml file.

What is important to recognize about stopwords is that if "a" is set as a stopword, which is the case for the default English configuration of the ClearQuest Full-Text Search feature, then a search for "a" will not return any hits. In addition, if a document contains the text "ClearQuest is an award winning application", then a phrase search for "clearquest is an award winning application" or "clearquest award winning application" (note how "is an" is eliminated from the second example) will return the same hits. This is because "is" and "an" are listed as stopwords, which get eliminated at indexing time.
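
The effect of stopword removal on the two example phrases can be checked with a few lines of Python. This is a simplified model that assumes the short stopword list quoted above and plain white-space tokenization; it ignores details of how Lucene tracks word positions.

```python
STOPWORDS = {"a", "an", "is", "it", "the"}


def strip_stopwords(phrase):
    """Return the lowercase words of a phrase with stopwords removed."""
    return [word for word in phrase.lower().split() if word not in STOPWORDS]


first = strip_stopwords("clearquest is an award winning application")
second = strip_stopwords("clearquest award winning application")
print(first == second)
print(strip_stopwords("a"))
```

Both phrases reduce to the same token list, and a search for "a" alone reduces to nothing at all, which is why it cannot return hits.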

Fields

As mentioned earlier, Lucene supports field data. When doing a search, you can either specify a ClearQuest display field to search within or use the default (when no field is specified). When you use the default, the search is performed across all fields. To search within a specific field, use the colon (:) separator between the field name and the search term. For example, if you want to search for spelling only within the Headline field of any record, then use the syntax Headline:spelling. To search for a phrase within a field, use the syntax Headline:"spelling error" (with the quotation marks).

Note:

A field-restricted search such as Headline:spelling error (without quotation marks) searches for the term spelling within the Headline field and combines the result with an AND search for error across all fields (in V7.1 and 7.1.0.1, OR was the default instead). Thus, if you want to narrow your search so that both spelling and error are within the Headline field, you must use the following syntax: Headline:spelling Headline:error

As stated earlier, Lucene is case-sensitive for field names. For example, headline:spelling is not the same as Headline:spelling (the emphasis here is on the capitalization of the field name: headline or Headline).
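
If you build search strings in a script, a small helper can apply the field prefix to every term so that none of them falls back to the all-fields default. The helper below is a hypothetical convenience for illustration, not part of ClearQuest, Solr, or Lucene.

```python
def field_query(field, *terms):
    """Build one field-restricted clause per term; quote multi-word phrases."""
    clauses = []
    for term in terms:
        value = f'"{term}"' if " " in term else term
        clauses.append(f"{field}:{value}")
    return " ".join(clauses)


print(field_query("Headline", "spelling", "error"))
print(field_query("Headline", "spelling error"))
```

The first call produces two separate clauses restricted to Headline; the second produces a single phrase clause.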

Term modifiers

Lucene supports modifying query terms to provide a wide range of search options. When you use these options, they can help broaden or narrow your search syntax to get a better hit result.

Wildcard searches

Lucene supports single and multiple-character wildcard searches. Use a question mark (?) for single-character wildcard search and an asterisk (*) for multiple characters.

The single-character wildcard search looks for terms that match the single character replaced. For example, to search for text or test you can use this search:
te?t

Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests, or tester, you can use this search:
test*

You can also use the wildcard searches in the middle of a term.

Note

You cannot use either of these wildcard characters as the first character of any search term.
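
The ? and * wildcards behave much like file-name patterns, so Python's fnmatch module can serve as a quick way to experiment with them. This is an analogy only: fnmatch also gives special meaning to square brackets, which Lucene wildcards do not, and the term list here is invented for the example.

```python
from fnmatch import fnmatchcase

terms = ["test", "tests", "tester", "text", "toast"]


def wildcard_matches(pattern, candidates):
    """Return the candidate terms that match a ?/* wildcard pattern."""
    return [term for term in candidates if fnmatchcase(term, pattern)]


print(wildcard_matches("te?t", terms))
print(wildcard_matches("test*", terms))
```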


Fuzzy searches

Lucene supports fuzzy searches based on the Levenshtein Distance (also referred to as the Edit Distance) algorithm. To do a fuzzy search, use the tilde (~) symbol at the end of a single-word term. For example, to search for a term similar in spelling to word, use word~. This search term will match wood, work, dword, wordy, ford, worf, warning, and so on, in addition to word.
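
To see why terms such as wood or wordy count as similar to word, it helps to compute the edit distance directly. The function below is a standard dynamic-programming implementation of the Levenshtein distance, written for illustration; it is not Lucene's code, and it does not reproduce the similarity threshold that Lucene applies on top of the distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


for candidate in ["wood", "work", "dword", "wordy", "ford", "worf"]:
    print(candidate, edit_distance("word", candidate))
```

Each of these candidates is a single insertion or substitution away from word.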

Proximity searches

Lucene supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde symbol at the end of a phrase. For example, to search for clearquest and clearcase within 10 words of each other in a record, use this search: "clearquest clearcase"~10
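
The idea of a proximity search can be approximated by comparing word positions. The sketch below is a rough model under that assumption; Lucene computes the allowed distance for a phrase somewhat differently, and the function name and sample text are invented for the example.

```python
def within_distance(text, first, second, max_gap):
    """True if `first` and `second` occur at most `max_gap` positions apart."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(
        abs(i - j) <= max_gap for i in first_positions for j in second_positions
    )


text = "ClearQuest integrates with many tools including ClearCase"
print(within_distance(text, "clearquest", "clearcase", 10))
print(within_distance(text, "clearquest", "clearcase", 3))
```

In the sample text the two words are six positions apart, so the first check succeeds and the second does not.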

Range searches

With range searches, you can match records with field values that are between the lower and upper boundaries specified by the search range. Range searches can be inclusive or exclusive of the upper and lower bounds. Inclusive range searches are denoted by square brackets: []. Exclusive range searches are denoted by curly braces: {}.

The search term SubmitDate:[2007 TO 2008] will find records where SubmitDate fields have values between 2007 and 2008, inclusive. You can narrow it to a specific month to, for example, find records that were submitted in October 2008:
SubmitDate:[20081001 TO 20081031]

Note that range searches are not reserved for date fields. You can also use range searches with non-date fields. For example, a search for FruitName:{apple TO blueberry} would return records where FruitName = banana.
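
Both examples work because the values are compared as text, in dictionary order. The sketch below assumes that model and uses plain Python string comparison to mimic inclusive and exclusive bounds; the function name in_range is hypothetical.

```python
def in_range(value, lower, upper, inclusive=True):
    """Compare strings in dictionary order, with inclusive or exclusive bounds."""
    if inclusive:
        return lower <= value <= upper
    return lower < value < upper


print(in_range("20081015", "20081001", "20081031"))
print(in_range("banana", "apple", "blueberry", inclusive=False))
print(in_range("apple", "apple", "blueberry", inclusive=False))
```

The last call shows the difference that curly braces make: with exclusive bounds, apple itself is not part of the range.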

Boosting a term

Lucene provides the relevance level of matching records based on the terms found. To boost a term use the caret (^) symbol with a boost factor (a number) at the end of the term you are searching on. The higher the boost factor, the more relevant the term will be.

Boosting allows you to control the relevance of a record by boosting its term. For example, if you are searching for clearquest clearcase, and you want the term clearquest to be more relevant in the result set scoring, then boost it by using the caret (^) symbol along with the boost factor next to the term. You would type:
clearquest^4 clearcase

This will make records with the term clearquest appear more relevant than those with the term clearcase. You can also boost phrase terms, as shown in this example: "clearquest 7.1"^4 "clearcase 7.1"

By default, the boost factor is 1 (one). Although the boost factor must be positive, it can be less than 1 (for instance, 0.2) to lower the relevancy of records that contain that term or phrase.
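
The effect of a boost factor on ranking can be illustrated with a toy scoring function. The formula used here (occurrences multiplied by boost) is an assumption made for this sketch and is much simpler than Lucene's actual relevance scoring; the records are invented.

```python
def score(text, boosts):
    """Sum of (occurrences x boost) over the query terms; a toy formula."""
    words = text.lower().split()
    return sum(words.count(term) * boost for term, boost in boosts.items())


def rank(records, boosts):
    """Return record IDs ordered from highest to lowest score."""
    return sorted(records, key=lambda rid: score(records[rid], boosts), reverse=True)


records = {
    "A": "clearquest web client",
    "B": "clearcase remote client",
}
print(rank(records, {"clearquest": 4, "clearcase": 1}))
print(rank(records, {"clearquest": 0.2, "clearcase": 1}))
```

With a boost of 4, the record that mentions clearquest ranks first; with a boost of 0.2, it drops below the record that mentions clearcase.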


Boolean operators

Boolean operators allow terms to be combined through logic operators. Lucene supports AND, plus (+), OR, NOT, and minus (-) as Boolean operators. Boolean operators must be ALL CAPS.

OR

The OR operator links two terms or phrases and finds a matching record if either of the terms or phrases exist in a record. This is equivalent to a union that uses sets. You can use the double-bar (double-pipe) symbol (||) in place of the word OR.

To search for documents that contain either clearquest or clearcase (or both), use this query:
clearquest OR clearcase

You can combine multiple OR searches, as in this example:
clearquest OR clearcase OR 7.1
or
clearquest || clearcase || 7.1
Both of which have the same meaning.

AND

The AND operator matches documents where both terms exist anywhere in the text of a single record. This is equivalent to an intersection that uses sets. Two ampersands (&&) can be used in place of the word AND. With ClearQuest 7.1.0.2, the AND operator is the default conjunction operator. This means that if there is no Boolean operator between two terms or phrases, the AND operator is used.

To search for records that contain clearquest and clearcase, you can use any of these queries:
clearquest clearcase
or
clearquest AND clearcase
or
clearquest && clearcase
The three examples shown above have the same meaning.

Editing the Solr XML schema file

With ClearQuest 7.1.0.2, the default Boolean operator is AND. You can make OR the default instead by changing a setting in the Solr schema.xml configuration file. Look for <solrQueryParser defaultOperator="AND"/> and change the AND to OR. Be careful if you make this change, because it will result in more hits being returned when you use multiple terms or phrases.

+ (plus)

The plus (+) operator (also known as the required operator) requires that the term after the plus symbol exist somewhere in the field of a single record. To search for records that must contain clearquest and 7.1 but can also contain clearcase, use this query:
+clearquest +7.1 OR clearcase

This syntax is different if you add AND:
clearquest AND 7.1 OR clearcase

This is because when you use either the AND or the OR Boolean operator, the order in which the operators appear affects the search. The query will be treated as clearquest AND (7.1 OR clearcase) rather than what you intended:
(clearquest AND 7.1) OR clearcase

NOT

The NOT operator excludes records that contain the term after NOT. This is equivalent to a difference using sets. The exclamation mark symbol (!) can be used in place of the word NOT. To search for records that contain clearquest but not clearcase, use this query: clearquest NOT clearcase

- (minus)

The minus (-) operator (also known as the prohibit operator) excludes records that contain the term directly after the minus symbol. To search for records that contain clearquest but not clearcase, use this query:
clearquest -clearcase
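
The set analogies used for OR, AND, and NOT in this section can be tried directly with Python sets. The record IDs below are invented; each set stands for the records that contain one term.

```python
clearquest_hits = {1, 2, 3}
clearcase_hits = {2, 3, 4}

either = sorted(clearquest_hits | clearcase_hits)      # clearquest OR clearcase
both = sorted(clearquest_hits & clearcase_hits)        # clearquest AND clearcase
only_first = sorted(clearquest_hits - clearcase_hits)  # clearquest NOT clearcase

print(either, both, only_first)
```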


Grouping

Lucene supports using parentheses to group clauses to form sub-searches. This can be useful if you want to control the Boolean logic for a query. To search for either clearquest or clearcase and 7.1 use this query:
"(clearquest OR clearcase) AND 7.1" or "(clearquest OR clearcase) 7.1" (without the quotes).

Lucene also supports using parentheses to group multiple clauses to a single field. To search for a headline that contains both the word clearcase and the phrase clearquest web, use this query:
Headline:(clearcase "clearquest web")


Escaping special characters

Lucene supports escaping special characters that are part of the search syntax. This is the current list of special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these characters, use the \ (backslash) before the character. For example, to search for (1+1):2, use this search: \(1\+1\)\:2

Or to search for UNCShare:\\myhost\myshare\, use:
UNCShare:\\\\myhost\\myshare\\
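
When search strings are assembled by a script, escaping can be automated. The following sketch covers the special characters listed above by escaping each one individually (which also handles && and ||); the function name escape_term is hypothetical, and the field-name prefix is added separately so that its colon is left intact.

```python
import re

# Characters from the special-character list; & and | are escaped
# one at a time, which also covers && and ||.
SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\])')


def escape_term(text):
    """Prefix every special query character with a backslash."""
    return SPECIAL.sub(r"\\\1", text)


print(escape_term("(1+1):2"))
print("UNCShare:" + escape_term("\\\\myhost\\myshare\\"))
```

The two printed lines correspond to the two escaped searches shown in this section.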


ClearQuest database search methods

There are two ways to find records in a ClearQuest database:

  • Use the Query feature
  • Use the Find Record feature

The Query feature

To find data in a ClearQuest database, you can navigate through the ClearQuest public or private workspace, and look for a saved query that matches your search criteria. If there is no such saved query, you need to build one and then run it to retrieve matching records. The search is performed by the database server and might have performance implications, especially in the case of multiline strings, large data sets, or significant filtering (increase in join operations).

After you run the query, you might get one or more records (or "No match" if your query does not find anything). In general, when there are two or more records matching your query, there is no order to the returned result unless you specify a sort order.

There are two ways to build a ClearQuest query:

  • Through the GUI Query builder
  • Through the SQL Query builder (this privilege must be enabled)

When you use the SQL Query builder, you must know the record names, and the database field names (not the display-names), and be well-versed in SQL.

The drawback to using the Query feature is that there are many cases in which you might simply want to determine whether a specific term, set of terms, or phrase exists anywhere in the ClearQuest database. Building a query is time-consuming, can be complex, and can have database performance implications. In addition, because the Query feature depends on an SQL statement, it is bound by the limitations of SQL itself and of the database vendor. Finally, an SQL query is not as well suited to efficient text searching as a full-text search engine is.

The Find Record feature

The Find Record feature is used when you know either the display name (fully qualified or just its number) or the database ID (DBID) of the record (not all clients support Find Record by DBID, for instance, ClearQuest on Eclipse). If you enter a valid record ID, a single record is returned when it is found; otherwise, no record is returned.

The drawback to using the Find Record feature is that, unless you know the display name (and so on) of a record that you are looking for, your search will fail.


Full-Text Search options

With the ClearQuest Web V7.1 Full-Text Search feature, two new search capabilities are now available: Basic and Intermediate Full-Text Search.

The Basic Full-Text Search is similar to the existing Find Record feature. You will see a prompt like the one shown in Figure 1.

Figure 1. Basic Full-Text Search query entry field

Rather than entering the record display name or DBID (database ID), enter plain text or complex Full-Text Search terms or phrases. If you type ClearQuest, all of the indexed records that contain the word clearquest (in any case combination, for example clearquest, ClearQuest, CLEARQUEST, and so on) in any indexed field will be returned as hits. For example, the search term spell companies, run on the ready-to-use ClearQuest SAMPL database, returns the result shown in Figure 2, with columns for Score, Record Type, Record ID, and Record Contents.

Figure 2. Search results for "spell companies" query

If results come from multiple record types, the Record Type column shows which record type each hit record belongs to. The Record Contents column shows the value of the designated field for the record type of the hit, as configured by the ClearQuest Full-Text Search administrator.

For example, if there is a hit in the record type of Customer and its designated display field is Name, then the content of the field Name for the hit record is returned. Each record type has its own designated display field.

If you are a power user who understands the syntax of the Full-Text Search feature (the Lucene search syntax as outlined earlier) as well as the schema of your ClearQuest database, you can type a complex Full-Text Search term to narrow the search. For example, if you type Headline:spelling, the search will find only records with the word spelling in the Headline field of any record type. If you want to further narrow the search to only the record type Defect, type:
Headline:spelling record_type:Defect

The Intermediate Full-Text Search expands on the Basic Full-Text Search use case by offering some level of control on which record types to limit a search to. When you select it, you will see a prompt like that shown in Figure 3, which shows Search Scope options to check: Customer, Project, Defect, Email Rule.

Figure 3. Search with Search Scope expanded

In this case, you can still type your search term as you do in the Basic Full-Text Search mode, but now you have the option to select the record types to which your search is limited.


Administration and configuration

To help with Full-Text Search configuration and administration, ClearQuest V7.1 offers tools, such as cqtssetup.pl and cqtsdbcrawler.pl. In addition, Solr includes tools to configure and administer Lucene, such as the admin page and Luke.

Index Administration

As a ClearQuest administrator, you can configure the following index properties:

Index Freshness

This setting tells the record extractor how often to check the ClearQuest database for new or updated records to index. Think of this setting as analogous to the frequency of IBM® Rational® ClearCase MultiSite® synchronization. For Full-Text Search, the default setting for the Update mode Record Extractor is to check for new or updated records every 10 minutes. With the default setting, it may therefore take up to 10 minutes for a change to be reflected in the search results.
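The "up to 10 minutes" figure follows from simple arithmetic: a change becomes visible at the first scheduled check after it is made. The sketch below works this out under simplifying assumptions: checks run at a fixed interval from a known first check, and the time needed to index the record is ignored. The function name is hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_INTERVAL = timedelta(minutes=10)  # default interval stated in the text


def next_check_after(change_time, first_check, interval=DEFAULT_INTERVAL):
    """Return the first scheduled check at or after change_time."""
    elapsed = change_time - first_check
    # Ceiling division on timedeltas: number of intervals needed to reach the change.
    intervals = max(0, -((-elapsed) // interval))
    return first_check + intervals * interval


first_check = datetime(2009, 9, 3, 9, 0)
change = datetime(2009, 9, 3, 9, 1)
picked_up = next_check_after(change, first_check)
print(picked_up - change)  # prints: 0:09:00
```

A record changed one minute after a check waits nine minutes, and a record changed just after a check waits almost the full interval.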

Record-Type Index

This setting allows the administrator to configure which record types to index and which not to index.

Record-Field Index

This setting refines the Record-Type Index: for each record type that is indexed, the administrator configures which fields to index.
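Taken together, the Record-Type Index and Record-Field Index settings act as a two-level filter on what reaches the index. The following sketch models that idea with a plain dictionary. The structure, the Description field, and the function name are assumptions for illustration; this is not the actual ClearQuest configuration format.

```python
# Hypothetical configuration: record type -> set of fields to index.
# Record types that are absent from the mapping are not indexed at all.
INDEX_CONFIG = {
    "Defect": {"Headline", "Description"},
    "Customer": {"Name"},
}


def indexable_fields(record_type, record):
    """Return only the fields of a record that are configured for indexing."""
    allowed = INDEX_CONFIG.get(record_type)
    if allowed is None:
        return {}  # record type is not indexed
    return {name: value for name, value in record.items() if name in allowed}


print(indexable_fields("Defect", {"Headline": "Typo", "Owner": "admin"}))
# prints: {'Headline': 'Typo'}
```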

Solr and Lucene Index Location

This setting allows the administrator to specify where on the file system the Solr and Lucene index are located. For optimal performance, put your index on a disk other than the operating system disk, preferably one that runs at 10,000 RPM with low seek time and low latency.

Search Administration

As a ClearQuest administrator, you can configure the following properties:

Result set Record Contents column

For each record type that is indexed, the ClearQuest administrator designates a field with content that will be used in the Record Contents column when a Full-Text Search returns results, ideally a descriptive field, such as Headline.

Result set size

This is the number of hits to return per page. However, it is overridden by your preference setting for ClearQuest Web.

Cache size

This is the number of hits to cache. The cached hits are used to serve subsequent pages as the user pages through the result set.
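The interaction between result set size and cache size can be illustrated with a short sketch: the cache holds a fixed number of hits, and each page is a slice of it. The function names and the sample numbers are hypothetical; how ClearQuest implements paging internally is not described here.

```python
import math


def pages_in_cache(cache_size, page_size):
    """Number of result pages that can be served from the cache."""
    return math.ceil(cache_size / page_size)


def page_of_hits(cached_hits, page_number, page_size):
    """Return the hits for a 1-based page number from the cached list."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * page_size
    return cached_hits[start:start + page_size]


cached = [f"hit-{i}" for i in range(1, 8)]  # 7 cached hits
print(page_of_hits(cached, 2, 3))  # prints: ['hit-4', 'hit-5', 'hit-6']
print(pages_in_cache(7, 3))        # prints: 3
```

A cache size that is a multiple of the result set size avoids a short final page served from the cache.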

The details of how to configure some of these parameters and several others are described in Part 2 of this article. In addition, they are documented in the ClearQuest V7.1 Full-Text Search Information Center.


Initial full-text indexing

If you are deploying ClearQuest V7.1 for the first time, chances are that you do not have pre-existing records that need to be indexed for the Full-Text Search feature. If you are upgrading your ClearQuest deployment to V7.1, then you have records that you will want to index so that users can search them. The upgrade case is the most common situation.

Initial indexing: first-time ClearQuest users

In this use case, you are installing and using ClearQuest for the first time in your organization, so you have no existing ClearQuest records. Enabling the Full-Text Search feature involves two tasks:

  1. Create the ClearQuest schema. Before you consider a full-text search, first create your schema, test it, validate it, and approve it.
  2. Set up the Full-Text Search feature. After you have your ClearQuest schema ready for deployment, you are ready to configure and enable a full-text search against the ClearQuest database.

Indexing: upgrading ClearQuest users

In this use case, you are upgrading your ClearQuest deployment to Version 7.1, and you want to enable full-text searching on your existing ClearQuest records. You will complete these steps to enable the Full-Text Search feature:

  1. Full-Text Search setup: Configure and enable a full-text search against the ClearQuest database.
  2. Full-Text Search indexing of existing ClearQuest records: Because you already have ClearQuest records, they must be indexed so that they are available for searching.

Next steps

This concludes Part 1 of this four-part series. Part 2 shows you how to install and configure the Full-Text Search feature (see the link to "More in this series").


Acknowledgement

Special thanks to David Sampson, a staff technical support engineer in IBM Rational Client Support who serves on the Rational ClearQuest Cross-Functional Team. This article would not have been complete without David's valuable input and technical review.
