Adding Auto-Classification to a Search Collection: Process Overview

The basic process for creating and automatically classifying a collection is as follows:

  1. Create a display for your search application, and:
    • From the Components tab:
      • Add the Express Tagging display component to the display.
      • Add the Auto Classify Link for Administrators display component to the display.
    • Select the Settings tab and click the Collaboration section's Annotations sub-section in the left-hand menu of the Display pane. If the Annotations Enabled? variable is not already set to true, click Edit, set it to true, and click OK to save the updated value.
    • Click add new beside the Groups: Tags label and define a tag annotation to store the auto-classification information. After optionally modifying any other fields for this annotation, click OK to save your changes.
  2. Create a search collection that crawls and indexes the data that you want to auto-classify, and make the following change to that collection:
    • Use the Configuration tab's Binning sub-tab to add a binning set for your tag annotation. Next, add a Binning Tree child to that binning set. Do not change the default separator for the binning tree.
  3. Make the following changes to the source that was automatically created for your search collection (that is, the source with the same name as your collection):
    • On the source's Form tab, edit the VSE Source Form component. Switch the Sort text box to XML mode by clicking the [xml] link to the right of the text box, and then add the following line to that text box:
      random|$score * (1 - math:random() + <value-of select="math:random()" /> div 100)
  4. Create a Watson Explorer Engine project for your search application, and make the following changes:
    • Edit the Display field to list the name of the display that you created earlier in this process.
    • Edit the Sources (default) field to list the name of the source that is associated with the search collection that you are auto-classifying.
    • Edit the Sources (advanced) field to list the name of the source that is associated with the search collection that you are auto-classifying.
  5. From your search collection's Overview tab, click start to begin crawling and indexing your search collection.
  6. When crawling and indexing are complete, click the link with the name of your project in Watson Explorer Engine's left-hand navigation menu. This displays your search application in a new window or tab. Submit an empty query by pressing Return.
  7. In the search results window, click on the header link Auto-classify collection-name to display the Auto Classification screen. This screen contains multiple sections. The header for the current stage of the auto-classification process is always highlighted. On this screen, do the following:
    • In the Classification Name and Sources section, enter or modify the name of the auto-classification hierarchy that you want to create, which is known as the Current Classification. The default name for your first classification run is the same as the name of your search collection or the Display name that you set in the Meta Information for the source for that collection.

      If more than one source is involved in your project, make sure that all of the sources that you want to auto-classify are checked. Only sources associated with search collections that have been crawled by the Watson Explorer Engine search engine can be auto-classified. Meta-Search sources cannot be auto-classified.

      Click Save, then click return to config screen to save the name of your auto-classification run and the sources that it will classify together. The Create Classes heading is highlighted.

    • In the Create Classes section, do one of the following:
      • To auto-classify the search results in your collection, leave the Create from Data Sampling radio button selected, and optionally modify any of the default values. Click Start Sampling, and then click the See the progress here link.
      • If you are importing a taxonomy, select the Import a Taxonomy radio button, enter the URL of the file containing the taxonomy that you want to use in the Taxonomy Location field, and select the format of that taxonomy from the Taxonomy Format drop-down list.
    • After you click the See the progress here link, the Sample Results page is displayed.
  8. Verify and edit the suggested classification. When you are done, click the Configuration link in the page header to return to the Auto Classification screen.
  9. Modify the settings in the Express Tagging section of this page to identify the project that contains the display in which you defined the tag annotation and its associated delimiter. Click the Start Express Tagging button to begin tagging the documents in your collection with the clusters that have been created.
  10. You can then view and refer to your auto-classified hierarchy when doing any subsequent searches within your search application.
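
The arithmetic in the Sort expression from step 3 can be read as multiplying each document's score by a random factor. The Python sketch below only illustrates that arithmetic and is not how Watson Explorer Engine evaluates the expression. It assumes that math:random() returns a value in the range [0, 1), that both calls are independent draws, and that results are ordered by descending value. The function names perturbed_score and random_order are hypothetical. Because the XPath div operator binds more tightly than + and -, the factor groups as 1 - r1 + (r2 div 100).

```python
import random


def perturbed_score(score, rng):
    """Multiply a score by the factor (1 - r1 + r2 / 100).

    r1 and r2 stand in for the two math:random() calls in the Sort
    expression. Treating them as independent uniform draws in [0, 1)
    is an assumption of this sketch.
    """
    r1 = rng.random()
    r2 = rng.random()
    return score * (1 - r1 + r2 / 100)


def random_order(scores, seed=None):
    """Return document indices ordered by perturbed score, highest first.

    Descending order is assumed here; the source does not state the
    sort direction.
    """
    rng = random.Random(seed)
    keyed = [(perturbed_score(s, rng), i) for i, s in enumerate(scores)]
    keyed.sort(reverse=True)
    return [i for _, i in keyed]


if __name__ == "__main__":
    # An empty query typically gives every document a similar score,
    # so the random factor dominates the resulting order.
    print(random_order([1.0, 1.0, 1.0, 1.0, 1.0], seed=42))
```

Under these assumptions the factor always lies between 0 and 1.01, so documents with similar scores come back in an effectively shuffled order.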

To begin the auto-classification tutorial, click About This Tutorial.