The sample plug-ins for custom global analysis show how
you can use custom logic in addition to the default global analysis
tasks that occur during the indexing process.
The following samples are provided in the ES_INSTALL_ROOT/samples/jaql directory:
- The simple.zip sample is an example of a
plug-in that uses only the built-in functions of Jaql. As specified
in the install.json configuration file, this
plug-in runs the ../modules/dateRanking.jaql script.
This script sorts all documents by the value of the date field and
assigns a rank to each document. The script then adds each document's
rank to the rank field and facet in the index.
- The javaudf.zip sample is an example of a
plug-in that uses Java user-defined functions (UDFs) and multiple Jaql
module scripts. As specified in the install.json configuration
file, this plug-in runs the ../modules/tfidfMain.jaql script.
This script computes the TF-IDF (term frequency–inverse document frequency)
weight for all nouns in the entire document set, and then adds the
TF-IDF values for each document to the tfidf field and facet in the
index.
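The sort-and-rank step that dateRanking.jaql performs can be illustrated with a short Java sketch. This is not the sample's Jaql code. The `Doc` record and the `rankByDate` method are hypothetical. The sketch also assumes two things: the newest document receives rank 1, and documents with the same date keep their input order, because `List.sort` is stable.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DateRankingSketch {

    // Hypothetical document record: only the document ID and the date field value.
    record Doc(String id, LocalDate date) {}

    // Sorts documents by date, newest first (an assumption), and returns a map
    // from document ID to rank, where rank 1 is the newest document.
    static Map<String, Integer> rankByDate(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(Doc::date).reversed());
        Map<String, Integer> ranks = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            ranks.put(sorted.get(i).id(), i + 1);
        }
        return ranks;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("a", LocalDate.of(2011, 3, 1)),
            new Doc("b", LocalDate.of(2012, 7, 15)),
            new Doc("c", LocalDate.of(2010, 1, 20)));
        System.out.println(rankByDate(docs)); // prints {b=1, a=2, c=3}
    }
}
```

In the real plug-in, the computed rank is written to the rank field and facet in the index rather than printed.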
Prerequisite: Before you can
build the samples, you must install and configure Apache Ant, a
Java-based build tool. For information about how to install and configure
Apache Ant, see
http://ant.apache.org/.
To use
the sample plug-ins for custom global analysis:
- Compile the custom global analysis archive files. From the command
line, change to the ES_INSTALL_ROOT/samples/jaql directory
and enter the following command to run Apache Ant on the provided build.xml file:
ant -f build.xml
If you receive a ClassNotFoundException
error, update the following line in the build.xml file
to specify the absolute file path to the jaql.jar file.
IBM® InfoSphere® BigInsights installs the jaql.jar file in the $JAQL_HOME directory.
<property name="path.jaql" value="/opt/ibm/biginsights/jaql/jaql.jar" />
- In the administration console, create a collection and select
the Use IBM InfoSphere BigInsights option.
  - For the simple.zip sample, create an enterprise search collection.
  - For the javaudf.zip sample, create a content analytics collection.
- Create the search fields that are used by the samples:
  - For the simple.zip sample, create a field named rank and select
    the Returnable, Faceted search, and Fielded search attributes.
  - For the javaudf.zip sample, create a field named tfidf and select
    the Returnable and Faceted search attributes. Because the value that
    this sample generates is a string, ensure that the Parametric search
    attribute is not selected.
- Configure the custom global analysis task. In the Parse and Index
pane of the administration console, click Configure > Global
processing > Custom global analysis and
click the Add icon.
  - On the Custom Global Analysis Fields and Custom Global Analysis
    Facets pages, select the fields and facets to pass to the script
    for analysis.
    - For the simple.zip sample, select the date field. You do not
      need to select any facets.
    - For the javaudf.zip sample, select the Part of Speech ($._word)
      facet. You do not need to select any fields.
  - On the Custom Global Analysis Archive File page, specify the path
    to the sample archive file on your local computer.
- Restart the parse and index services for the collection. For the javaudf.zip sample,
you must also deploy the analytic resources. In the Parse and Index
pane, click Analytic Resources and click the icon to start the
resource deployment task.
- Configure a crawler for the collection and build the index.
- After the documents are indexed, you can view the results of the
custom global analysis processing.
  - For the simple.zip sample, open the enterprise search application
    and search for documents. Each document now has a custom_rank field
    and a rank facet.
  - For the javaudf.zip sample, open the content analytics miner and
    explore the documents. Each document now has a custom_tfidf field
    and a tfidf facet. However, a TF-IDF value is added to a document
    only if it exceeds the threshold that is specified in the
    ES_INSTALL_ROOT/samples/jaql/javaudf/modules/tfidf.jaql file.
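The following Java sketch shows the kind of threshold filtering that tfidf.jaql applies. It computes TF-IDF weights for the nouns in a small document set and drops every weight that does not exceed a threshold. It is an illustration only, not the code in the javaudf.zip sample, and several details are assumptions:

- The input map from document ID to nouns stands in for the Part of Speech ($._word) facet values.
- The threshold of 0.3 is arbitrary.
- The weighting is one common TF-IDF variant: the noun count divided by the document length, multiplied by the natural log of the total number of documents divided by the number of documents that contain the noun. The sample might use a different variant.
- The `tfidf` and `asFieldValue` methods and the "noun:weight" string format are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class TfIdfSketch {

    // Computes a TF-IDF weight for every noun in every document and keeps only
    // the weights that exceed the threshold. Input maps a document ID to its nouns.
    static Map<String, Map<String, Double>> tfidf(Map<String, List<String>> nounsByDoc,
                                                  double threshold) {
        int numDocs = nounsByDoc.size();

        // Document frequency: the number of documents that contain each noun.
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> nouns : nounsByDoc.values()) {
            for (String noun : new HashSet<>(nouns)) {
                docFreq.merge(noun, 1, Integer::sum);
            }
        }

        Map<String, Map<String, Double>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : nounsByDoc.entrySet()) {
            List<String> nouns = doc.getValue();

            // Raw count of each noun in this document.
            Map<String, Integer> counts = new HashMap<>();
            for (String noun : nouns) {
                counts.merge(noun, 1, Integer::sum);
            }

            Map<String, Double> weights = new TreeMap<>();
            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                double tf = (double) c.getValue() / nouns.size();  // length-normalized term frequency
                double idf = Math.log((double) numDocs / docFreq.get(c.getKey()));
                double weight = tf * idf;
                if (weight > threshold) {                          // drop values at or below the threshold
                    weights.put(c.getKey(), weight);
                }
            }
            result.put(doc.getKey(), weights);
        }
        return result;
    }

    // Joins one document's weights into a single string value (hypothetical format).
    static String asFieldValue(Map<String, Double> weights) {
        StringJoiner joiner = new StringJoiner(",");
        weights.forEach((noun, w) -> joiner.add(String.format(Locale.ROOT, "%s:%.3f", noun, w)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> nounsByDoc = Map.of(
            "d1", List.of("cluster", "index", "cluster"),
            "d2", List.of("index", "query"),
            "d3", List.of("facet"));
        tfidf(nounsByDoc, 0.3).forEach((id, w) -> System.out.println(id + " -> " + asFieldValue(w)));
    }
}
```

With this input, `main` prints `d1 -> cluster:0.732`, `d2 -> query:0.549`, and `d3 -> facet:1.099`. The noun index is dropped from both d1 and d2. It appears in two of the three documents, so its IDF is low and its weight stays below the threshold. The string value also matches why the tfidf field must not be marked for parametric search.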