Finding objects with content-based retrieval
Include a full-text search expression as part of a query to search for objects based on
text in object content or properties. Define stop words and synonyms to help ensure that a full-text
search expression returns the expected results.
Object searching process overview
The process of searching for object text begins with the submission of a CBR query and ends with the return of the search results.
CBR query syntax introduction
A CBR query includes a CONTAINS function call to perform a full-text search.
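As a rough illustration of the shape such a query can take, the following Python sketch assembles a query string whose WHERE clause calls CONTAINS. The SELECT and INNER JOIN layout, the ContentSearch join, and the helper name build_cbr_query are assumptions made for illustration, not syntax confirmed by this overview; check the CBR query syntax topic before relying on them.

```python
def build_cbr_query(class_name: str, search_expression: str) -> str:
    """Return a query string whose WHERE clause calls CONTAINS.

    The statement layout is an assumed, illustrative shape only.
    class_name is inserted as-is, so pass only trusted class names.
    """
    # Double any single quotes so the expression stays inside its string literal.
    escaped = search_expression.replace("'", "''")
    return (
        f"SELECT d.This FROM {class_name} d "
        "INNER JOIN ContentSearch c ON d.This = c.QueriedObject "
        f"WHERE CONTAINS(d.*, '{escaped}')"
    )


if __name__ == "__main__":
    print(build_cbr_query("Document", "insurance claim"))
```

The second argument to CONTAINS is the search expression; the first argument here (d.*) is assumed to mean "all indexed content and properties".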
CBR query result ranking
When used in the ORDER BY clause of a CBR query, the Rank property permits you to return objects in order of search relevance. Based on factors such as term instance frequency, IBM® Content Search Services calculates the value of the Rank property for each returned object.
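The exact ranking calculation is not described here. Purely to illustrate the idea of ordering results by a relevance score, the sketch below ranks a few in-memory documents by how often a term occurs. The function names and the scoring rule are hypothetical and are not the calculation that IBM Content Search Services performs.

```python
import re


def term_frequency(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of term in text."""
    return len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))


def rank_documents(documents: dict[str, str], term: str) -> list[tuple[str, int]]:
    """Return (name, score) pairs for matching documents, highest score first."""
    scored = [(name, term_frequency(text, term)) for name, text in documents.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score; ties fall back to name for a predictable order.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))
```

Sorting by descending score mirrors what an ORDER BY on the Rank property is meant to achieve: the most relevant objects come first.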
Full-text search types
The phrase "search expression" refers to the second argument that you pass to the CONTAINS function call in a CBR query. The search expression indicates the type of full-text search to be performed.
Token searches: Language-aware versus exact-match
For indexing purposes, IBM Content Search Services parses document text into tokens. In general, a token is either a word or a character sequence.
Effects of text language misidentification
IBM Content Search Services employs language-aware processing for both documents and search expressions. This processing requires that the text language be properly identified. For a search expression, language misidentification is not a concern because a search expression is run for all indexed languages. For a document, language misidentification causes incomplete indexing, which can prevent some types of searches from finding the document.
Text normalization
During searching and indexing, some characters such as accented characters and umlauts are replaced with the equivalent characters from the Latin alphabet. This type of replacement is called normalization; it means that some words, such as Müller and Mueller, are treated as identical words. A search for either returns all instances of both.
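The effect of normalization can be sketched in a few lines of Python. The mapping table below is a hypothetical, minimal one chosen to reproduce the Müller and Mueller example; the actual replacement rules that IBM Content Search Services applies are not listed here.

```python
import unicodedata

# Hypothetical expansions for German umlauts and sharp s (illustration only).
_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def normalize(text: str) -> str:
    """Replace umlauts with digraphs, then strip any remaining accents."""
    # Compose first so that "u" + combining diaeresis is seen as a single "ü".
    composed = unicodedata.normalize("NFC", text)
    expanded = "".join(_EXPANSIONS.get(ch, ch) for ch in composed)
    # Decompose and drop combining marks to remove other accents (é -> e).
    decomposed = unicodedata.normalize("NFKD", expanded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Because both the indexed text and the search expression pass through the same function, normalize("Müller") and normalize("Mueller") yield the same string, so a search for either form matches both.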
Submitting a CBR query
Submit a CBR query from the administration console to search for object text within a single object store.
Optimizing CBR query performance
You can optimize the performance of a CBR query by controlling the order of the two main constituent searches for the query. These two searches are the database search and the full-text search.
Setting stop words for full-text searches
Define words or phrases that frequently occur in object text as stop words to help increase the relevance of full-text search results. An IBM Content Search Services search server ignores any stop words that are included in a full-text search expression.
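To show the effect on a search expression, the sketch below drops stop words from a simple space-separated expression. The stop word list and the function name are hypothetical; real stop words are defined on the search server, and phrase-level stop words are not handled in this simplified version.

```python
# Hypothetical stop word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "of", "and"})


def strip_stop_words(expression: str, stop_words: frozenset = STOP_WORDS) -> str:
    """Return the expression with stop words removed (case-insensitive)."""
    kept = [word for word in expression.split() if word.lower() not in stop_words]
    return " ".join(kept)
```

With this list, the expression "the history of the company" is searched as "history company".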
Setting synonyms for full-text searches
Define synonyms to avoid specifying all of the possible alternative names for a term in a full-text search expression. For consistent search results, set the same set of synonyms for each IBM Content Search Services server.
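The following sketch illustrates what synonym expansion accomplishes: one term in the expression stands in for all of its defined alternatives. The synonym table and the function name are hypothetical; actual synonyms are configured on each search server rather than in client code.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": ["automobile", "vehicle"]}


def expand_term(term: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return the term followed by any synonyms defined for it."""
    return [term, *synonyms.get(term.lower(), [])]
```

A search for "car" then behaves like a search for "car", "automobile", or "vehicle". If two servers held different tables, the same expression would expand differently on each, which is why the same set of synonyms should be set everywhere.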