Defining Resources to Crawl and Index

About this task

After you create a search collection, you must next identify the source of the online information that you are going to crawl and index. The examples in this tutorial use a directory of sample files that are exported by a web server, as explained in Files Used in This Tutorial. You can also crawl this directory as URLs, but this tutorial uses Files to replicate the process that you may use immediately in your company's installation of Watson Explorer Engine.

Tip: If you are working on a system where Watson Explorer Engine has not been installed and want to crawl the sample files as URLs, click Add a new seed, select URLs from the list that displays, and click Add. Next, specify the URL of the web server and sample files directory, and click OK to save that value. The URL for the sample example-metadata collection that is shipped with Watson Explorer Engine is http://IP-ADDRESS/vivisimo/examples/metadata-example, where IP-ADDRESS is the IP address or host name of the system on which you installed Watson Explorer Engine. Refer to Documents vs. URLs for general information on creating a search collection from files or from web-based URLs.
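As an illustration of the URL pattern in the tip above, the following minimal Python sketch substitutes a host name or IP address for IP-ADDRESS. The function name sample_seed_url and the address 192.0.2.10 are hypothetical placeholders and are not part of Watson Explorer Engine.

```python
from urllib.parse import urlunsplit

# Path of the sample files directory, taken from the tip above.
SAMPLE_PATH = "/vivisimo/examples/metadata-example"


def sample_seed_url(host: str) -> str:
    """Build the sample-collection URL for a given host name or IP address.

    This helper is hypothetical; it only fills in the IP-ADDRESS placeholder.
    """
    return urlunsplit(("http", host, SAMPLE_PATH, "", ""))


# 192.0.2.10 is a documentation-only address; replace it with your own host.
print(sample_seed_url("192.0.2.10"))
```

The resulting string is the value that you would enter as the URL seed.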

To crawl sample files that are located on a file server:

Procedure

  1. Click Add a new seed.
  2. Select Files from the list that displays.
  3. Click Add, as shown in the screen in Figure 2.

    This displays a screen like the one shown in Figure 1.

    Figure 1. Defining the Seed for a File Based Search Collection
  4. Enter the full path name of the sample files directory discussed in Files Used in This Tutorial.

    For example, if your Watson Explorer Engine software is installed on the machine on which you are working, you could enter a full path name like the following:

     /opt/IBM/WEX/Engine/examples/data/metadata-example/

    The trailing slash in the Seed Files entry restricts the crawl to the metadata-example directory, which contains the example files, and to any subdirectories that this directory may contain.

  5. To save your changes, click OK.
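Before you enter the Seed Files path, it can help to confirm that the directory exists and to preview the files beneath it. The following Python sketch walks a directory and its subdirectories. It is only an approximation of the scope described above, not the crawler's own logic, and the function name files_under_seed is a hypothetical helper.

```python
import os


def files_under_seed(seed_dir: str) -> list[str]:
    """Return the sorted paths of all files under seed_dir, including subdirectories.

    Hypothetical helper: it approximates the scope of a directory seed and
    does not reproduce how Watson Explorer Engine crawls files.
    """
    found = []
    for root, _dirs, files in os.walk(seed_dir):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # Example path from this tutorial; adjust it to match your installation.
    seed = "/opt/IBM/WEX/Engine/examples/data/metadata-example/"
    if os.path.isdir(seed):
        for path in files_under_seed(seed):
            print(path)
    else:
        print(f"Seed directory not found: {seed}")
```

If the script reports that the directory was not found, check the installation path before you save the seed.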

Results

To define how to extract the contents of the sample files in a usable fashion, proceed to the next section, Extracting Metadata.
