A stopword is a word that you want excluded from your index
and, as a consequence, from your searches. A typical stopword list
includes words like of, the, and by. Stopword
lists depend on the content and type of your data.
Any frequently occurring word that you want excluded from your
index is a candidate for inclusion in a stopword list. Stopword lists
can reduce the time it takes to perform a search, reduce index size,
and help you avoid false hits.
To create and drop stopword lists, you use procedures defined for
the DataBlade® module. For example, to create
a stopword list, first create an operating system file that contains
the list of stopwords, one word per line. Then make the stopword list
known to the IBM® Informix® Excalibur Text Search DataBlade module by
executing the procedure etx_CreateStopWlst(), as shown in the
following example:
EXECUTE PROCEDURE etx_CreateStopWlst
('stopwlist', '/local0/excal/stopwlist');
This statement creates the stopword list stopwlist from
the operating system file /local0/excal/stopwlist.
An optional third argument specifies the sbspace in which the list
is stored. If you do not specify an sbspace, the list is stored in
the default sbspace, which is set by the SBSPACENAME parameter in the
onconfig file.
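As a sketch of the three-argument form, the following statement stores
the same stopword list in a named sbspace. The sbspace name sbsp1 is
hypothetical; substitute an sbspace that exists on your database server.

```sql
-- The third argument names the sbspace that holds the stopword list.
-- 'sbsp1' is an assumed name used only for illustration.
EXECUTE PROCEDURE etx_CreateStopWlst
    ('stopwlist', '/local0/excal/stopwlist', 'sbsp1');
```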
You can create your own stopword list file, or you can create one
based on a list of standard English-language stopwords provided with
your DataBlade module in the following location:
$INFORMIXDIR/extend/ETX.version/wordlist/etx_stopwords.txt
where version is the current version of the DataBlade module installed
on your computer.
You can have at most one stopword list associated with an etx index.
The stopword list is specified when the index is initially created
with the index parameter STOPWORD_LIST. The stopword list must exist
when the etx index is created.
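For illustration, a CREATE INDEX statement that associates the stopword
list with an etx index might look like the following. The index name
desc_idx is hypothetical, and the etx_clob_ops operator class assumes
that the description column of the videos table is of type CLOB; use
the operator class that matches the data type of your column.

```sql
-- The stopword list 'stopwlist' must already exist before this runs.
-- desc_idx and etx_clob_ops are assumptions for this sketch.
CREATE INDEX desc_idx ON videos (description etx_clob_ops)
    USING etx (STOPWORD_LIST = 'stopwlist');
```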
At times, you might want to search for words that are in your
stopword list. For example, suppose that your stopword list contains
the words to, or, and be, and that you want to search for the exact
phrase “to be or not to be.” To allow occasional searches for
stopwords on an etx index that has an associated stopword list,
specify the INCLUDE_STOPWORDS index parameter when you create the
index. Then use the CONSIDER_STOPWORDS tuning parameter when you
execute the search. The CONSIDER_STOPWORDS parameter forces the search
engine to include words that are listed as stopwords. For example,
you can search for the phrase to be or not to be as follows:
SELECT id, description FROM videos
WHERE etx_contains(description,
Row('to be or not to be',
'SEARCH_TYPE = PHRASE_EXACT & CONSIDER_STOPWORDS'));
Important: The CONSIDER_STOPWORDS tuning parameter
of the etx_contains() operator works only if the INCLUDE_STOPWORDS='TRUE' index
parameter is specified for the CREATE INDEX statement that creates
the etx index.
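A minimal sketch of a CREATE INDEX statement that satisfies this
requirement follows. The index name desc_idx2 is hypothetical, and the
etx_clob_ops operator class assumes that the description column is of
type CLOB; adjust both to match your schema.

```sql
-- INCLUDE_STOPWORDS = 'TRUE' indexes the stopwords as well, so that a
-- later search can request them with CONSIDER_STOPWORDS.
-- desc_idx2 and etx_clob_ops are assumptions for this sketch.
CREATE INDEX desc_idx2 ON videos (description etx_clob_ops)
    USING etx (STOPWORD_LIST = 'stopwlist',
               INCLUDE_STOPWORDS = 'TRUE');
```

With an index created this way, searches that omit CONSIDER_STOPWORDS
still treat the listed words as stopwords.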