Data Sampling

About this task

The default auto-classification method samples the documents in your search collection and derives classes from those samples. To use this method:

Procedure

  1. Leave the Create from Data Sampling radio button selected.
  2. Optionally modify any of the following fields:
    • Number of Levels - The number of levels of the classification/clustering tree to compute during the initial sampling run. Increasing this value increases the time required to create the initial set of clusters.
    • Number of Classes - Classes are created by taking random samples of the data, deriving clusters from those samples, and using some of the clusters as classes. Several settings govern this process. The slider adjusts those settings together, making it more likely that sampling produces either few classes or many classes.

      The first time you sample a particular data set, try sampling with the slider in the middle. Then, based on the classes you get, sample again with the slider moved toward fewer or more classes relative to that baseline.

    After selecting a classification method and supplying the values that it requires, do one of the following:

  3. If you are using the Create from Data Sampling method, click Start Sampling.
  4. If you are importing an existing taxonomy, click Import.

    The dialog shown in Figure 1 displays.

    Figure 1. Dialog Shown After Creating Classes
  5. Click the See the progress here link to proceed to the next step of this tutorial, where you can view and edit the classes that have been generated for the documents in your collection.
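
The Number of Classes description in step 2 (random sample, clusters from the sample, some clusters kept as classes) can be pictured with a small sketch. This is only an illustration, not the product's implementation: the function names, the toy clustering rule (grouping by most frequent word), and the min_cluster_size threshold standing in for the slider are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def toy_cluster(docs):
    """Group documents by their most frequent word.

    Hypothetical stand-in for a real clustering algorithm.
    """
    clusters = defaultdict(list)
    for doc in docs:
        words = doc.lower().split()
        if not words:
            continue
        counts = Counter(words)
        # Break ties alphabetically so the grouping is deterministic.
        top = min(counts, key=lambda w: (-counts[w], w))
        clusters[top].append(doc)
    return dict(clusters)


def classes_from_sampling(docs, sample_size, min_cluster_size, seed=0):
    """Sample documents at random, cluster the sample, keep larger clusters as classes.

    min_cluster_size is an assumed knob: a lower value tends to yield more
    classes, a higher value fewer, loosely mimicking a few/many slider.
    """
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    clusters = toy_cluster(sample)
    return {
        label: members
        for label, members in clusters.items()
        if len(members) >= min_cluster_size
    }


docs = ["cat cat dog", "cat cat bird", "dog dog cat", "fish fish fish"]
fewer_classes = classes_from_sampling(docs, sample_size=4, min_cluster_size=2)
more_classes = classes_from_sampling(docs, sample_size=4, min_cluster_size=1)
```

In this sketch, rerunning with a different threshold plays the role of sampling again with the slider moved away from the baseline.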

Results

To proceed to the next section, click Editing Classes Manually.