Editing Classes Manually

Classes can be created entirely by hand, derived from your search collection by sampling, or imported from a taxonomy. Whichever method you use to create the initial classes, review the results for appropriateness and adjust them manually as necessary.

When the screen listing the initial classes first displays, it may look like the screen in Figure 1, especially if you are sampling a search collection to create your classes automatically. Both sampling and importing a large taxonomy can take a significant amount of time. The Root Node is an internal node that serves as the top of the class hierarchy; it is created automatically and is not stored or used after the classification process is complete. The Not Classified cluster is also created automatically and provides a container for documents that do not fit into any other cluster. (The percentage of documents in the Not Classified cluster can help you determine how well the clusters that you are creating cover your data.) When you create classes manually, the Root Node is displayed, as in Figure 1, and you add classes from there.
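As a rough illustration of how the Not Classified percentage indicates coverage, the following Python sketch computes that percentage from per-class document counts. The function name, the class names, and the counts are hypothetical and are not part of the product; the sketch also assumes that each document is counted in exactly one class.

```python
def not_classified_share(class_counts, not_classified_label="Not Classified"):
    """Return the percentage of documents in the Not Classified cluster.

    class_counts maps a class name to its document count. This is a
    hypothetical structure that you would fill in by hand from the
    counts shown in the tool; it is not read from the product.
    """
    total = sum(class_counts.values())
    if total == 0:
        return 0.0  # no documents sampled yet
    return 100.0 * class_counts.get(not_classified_label, 0) / total


# Hypothetical document counts for three top-level classes
counts = {"Finance": 420, "Legal": 310, "Not Classified": 270}
share = not_classified_share(counts)
print(f"{share:.1f}% of documents are not classified")
```

A high percentage suggests that the current classes miss a large part of the collection and that additional classes or broader queries may be needed.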

Figure 1. Initial Auto-Classification Results
Tip: If you aren't sure whether auto-classification is still in progress, you can monitor it by examining the status of the auto-classify search collection that the sampling and taxonomy application processes use internally. To do so, open the auto-classify collection in the Watson™ Explorer Engine administration tool and watch its Overview tab, which is updated automatically every 5 seconds.

When the number of pending and unprocessed URLs in the Crawling section is 0, the sampling or taxonomy import process is done and will go into an idle state. When the number of uncommitted URLs in the Indexing section is 0, indexing is complete, and the indexer will also go into an idle state. At this point, the class creation process is complete, and you can return to the Sample Results page.
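The completion conditions described above can be summarized as a small Python sketch. It only models the decision logic; the CollectionStatus type and its field names are hypothetical, and the values are assumed to be copied by hand from the Overview tab rather than read through any product API.

```python
from dataclasses import dataclass


@dataclass
class CollectionStatus:
    """Hypothetical snapshot of counters read from the Overview tab."""
    pending_urls: int       # Crawling section
    unprocessed_urls: int   # Crawling section
    uncommitted_urls: int   # Indexing section


def crawl_done(status):
    # Sampling or taxonomy import is done when nothing is left to crawl.
    return status.pending_urls == 0 and status.unprocessed_urls == 0


def indexing_done(status):
    # Indexing is complete when every URL has been committed.
    return status.uncommitted_urls == 0


def class_creation_complete(status):
    # Both the crawler and the indexer must have gone idle.
    return crawl_done(status) and indexing_done(status)


still_indexing = CollectionStatus(pending_urls=0, unprocessed_urls=0, uncommitted_urls=12)
finished = CollectionStatus(pending_urls=0, unprocessed_urls=0, uncommitted_urls=0)
print(class_creation_complete(still_indexing))
print(class_creation_complete(finished))
```

The point of the sketch is that a crawl count of 0 alone is not enough; class creation is complete only when the indexing count has also reached 0.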

If you are using an automatic class creation method, once the class creation process completes, the screen will look something like Figure 2.

Figure 2. Completed Auto-Classification Results, First Pass

As Figure 2 shows, the Root Node and any generated classes are accompanied by a menu to the right of the class name. This menu provides access to the operations that you can perform on that class. The available operations are:

  • add - Adds a new class below the currently selected class. Displays a dialog that enables you to specify the name of the new class and the query that should be used to identify documents that belong to the new class.
    Note: Whenever you add classes below a selected class, either by using the add or auto-classify commands, the icon to the left of the class name changes to a plus sign to notify you that classes are located within that class. You can click the plus sign to show the sub-classes, at which point it is shown as a minus sign. Clicking the minus sign collapses the expanded class.
  • auto-classify - Generates sub-classes below the associated class by sampling the documents that are currently in that class. (You cannot generate sub-classes for the root node or any other node that already contains sub-classes.)
  • redefine - Displays a dialog that enables you to change the query associated with that class. (You cannot redefine the query for the root node because it is associated with a NULL query.)
  • remove - Enables you to delete a class. (The root node cannot be deleted.)
  • rename - Displays a dialog that enables you to change the name of a class. (The name of the root node cannot be changed.)
  • tag - Begins the tagging process for the current class and any subclasses that it contains. Tagging saves the current class name and associated query as a tag on each document.
    Note: For the root node, the tag option is available only when other classes exist.
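
The operations and root-node restrictions listed above can be modeled with a minimal Python sketch of a class hierarchy. This is an illustrative model only, not the product's implementation: the ClassNode type, its method names, and the sample class names and queries are all hypothetical.

```python
class ClassNode:
    """Hypothetical in-memory model of one class in the hierarchy."""

    def __init__(self, name, query=None, is_root=False):
        self.name = name
        self.query = query      # the root node has no query
        self.is_root = is_root
        self.parent = None
        self.children = []

    def add(self, name, query):
        # add: create a new class below this one
        child = ClassNode(name, query)
        child.parent = self
        self.children.append(child)
        return child

    def redefine(self, new_query):
        if self.is_root:
            raise ValueError("the root node has no query to redefine")
        self.query = new_query

    def rename(self, new_name):
        if self.is_root:
            raise ValueError("the root node cannot be renamed")
        self.name = new_name

    def remove(self):
        if self.is_root:
            raise ValueError("the root node cannot be deleted")
        self.parent.children.remove(self)
        self.parent = None

    def can_auto_classify(self):
        # Not allowed for the root node or for nodes that already have sub-classes
        return not self.is_root and not self.children

    def can_tag(self):
        # The root node can be tagged only when other classes exist
        return (not self.is_root) or bool(self.children)


root = ClassNode("Root Node", is_root=True)
print(root.can_tag())                  # no classes exist yet
finance = root.add("Finance", "finance OR banking")
loans = finance.add("Loans", "loan OR mortgage")
print(root.can_tag())                  # classes now exist
print(finance.can_auto_classify())     # Finance already has a sub-class
print(loans.can_auto_classify())       # Loans is a leaf
```

Keeping the restrictions in the node methods mirrors how the menu offers each operation only where it is permitted.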

Selecting any cluster in the classification displayed on the left side of this window shows 10 sample documents from that cluster. After you select a cluster, the total shown in the upper right-hand corner of the display is the number of documents in that cluster.

When sampling results, you will typically want to remove classes, rename classes, expand the query used to identify the documents in a class, manually create certain classes, create sub-classes by auto-classifying existing classes, and so on. When you are done, click the Configuration link in the Auto-classification screen's information header to return to the Auto-classification configuration screen.

When one or more classes exist in the sample results page of the Auto Classification tool, you can proceed to Step 3: Express Tagging.