Sample plug-ins for custom global analysis

The sample plug-ins for custom global analysis show how you can use custom logic in addition to the default global analysis tasks that occur during the indexing process.

The samples are provided in the ES_INSTALL_ROOT/samples/jaql directory.

Prerequisite: Before you can build the samples, you must install and configure Apache Ant, a Java-based build tool. For information about how to install and configure Apache Ant, see http://ant.apache.org/.

To use the sample plug-ins for custom global analysis:

  1. Compile the custom global analysis archive files. From the command line, change to the ES_INSTALL_ROOT/samples/jaql directory and enter the following command to run Apache Ant on the provided build.xml file:

    ant -f build.xml

    If you receive a ClassNotFoundException error, update the following line in the build.xml file to specify the absolute file path to the jaql.jar file. The jaql.jar file is installed by IBM® InfoSphere® BigInsights in the $JAQL_HOME directory.

    <property name="path.jaql" value="/opt/ibm/biginsights/jaql/jaql.jar" />

  2. In the administration console, create a collection and select the Use IBM InfoSphere BigInsights option.
    • For the Simple.zip sample, create an enterprise search collection.
    • For the javaudf.zip sample, create a content analytics collection.
  3. Create the search fields that are used by the samples:
    • For the Simple.zip sample, create a field with the name rank and select the Returnable, Faceted search, and Fielded search attributes.
    • For the javaudf.zip sample, create a field with the name tfidf and select the Returnable and Faceted search attributes. Because the value generated by this sample is a string, ensure that the Parametric search attribute is not selected.
  4. Configure the custom global analysis task. In the Parse and Index pane of the administration console, click Configure > Global processing > Custom global analysis and click the Add icon.
    1. On the Custom Global Analysis Fields and Custom Global Analysis Facets pages, select the fields and facets to pass to the script for analysis.
      • For the Simple.zip sample, select the date field. You do not need to select any facets.
      • For the javaudf.zip sample, select the Part of Speech ($._word) facet. You do not need to select any fields.
    2. On the Custom Global Analysis Archive File page, specify the path to the sample archive file on your local computer.
  5. Restart the parse and index services for the collection. For the javaudf.zip sample, you must also deploy the analytic resources. In the Parse and Index pane, click Analytic Resources and click the icon to start the resource deployment task.
  6. Configure a crawler for the collection and build the index.
  7. After the documents are indexed, you can view the results of the custom global analysis processing.
    • For the Simple.zip sample, open the enterprise search application and search for documents. Each document now has a custom_rank field and a rank facet.
    • For the javaudf.zip sample, open the content analytics miner and explore documents. Each document now has a custom_tfidf field and a tfidf facet. However, the value is added only if the TF-IDF value exceeds the threshold that is specified in the $ES_INSTALL_ROOT/samples/jaql/javaudf/modules/tfidf.jaql file.
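
The threshold behavior of the javaudf.zip sample can be illustrated with a minimal Python sketch. This is not the code in tfidf.jaql: the function name, the THRESHOLD value, the sample documents, and the exact formula (term frequency multiplied by the natural logarithm of the inverse document frequency) are assumptions made for illustration. The sketch shows only why some documents end up without a tfidf value.

```python
import math
from collections import Counter

# Hypothetical cutoff. The real value is defined in tfidf.jaql and is not reproduced here.
THRESHOLD = 0.1


def tfidf_above_threshold(docs, threshold=THRESHOLD):
    """Return, for each document, the words whose TF-IDF score exceeds the threshold."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for words in docs.values() for word in set(words))
    result = {}
    for doc_id, words in docs.items():
        counts = Counter(words)
        total = len(words)
        kept = {}
        for word, count in counts.items():
            tf = count / total                 # term frequency within this document
            idf = math.log(n_docs / df[word])  # 0 when the word occurs in every document
            score = tf * idf
            if score > threshold:
                kept[word] = round(score, 4)
        result[doc_id] = kept
    return result


# Hypothetical word lists, standing in for the values of the selected facet.
docs = {
    "doc1": ["search", "index", "search"],
    "doc2": ["index", "crawler"],
    "doc3": ["index"],
}
scores = tfidf_above_threshold(docs)
for doc_id, kept in scores.items():
    print(doc_id, kept)
```

In this sketch, doc3 contains only a word that occurs in every document, so its score is 0 and nothing is kept. This corresponds to a document for which no tfidf value is added.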