About this task
After you create a search collection, you must identify the source of the online
information that you are going to crawl and index. The examples in this tutorial use a
directory of sample files that are exported by a web server, as explained in Files Used in
This Tutorial. You can also crawl this directory as URLs, but this tutorial uses
Files to replicate the process that you may use immediately in your company's installation
of Watson Explorer Engine.
Tip: If you are working on a system where Watson Explorer Engine has not been installed
and you want to crawl the sample files as URLs, click Add a new seed, select URLs from the
list that displays, and click Add. Next, specify the URL of the web server and sample
files directory, and click OK to save that value. The URL for the sample example-metadata
collection that is shipped with Watson Explorer Engine is
http://IP-ADDRESS/vivisimo/examples/metadata-example, where IP-ADDRESS is the IP address
or host name of the system on which you installed Watson Explorer Engine. Refer to
Documents vs. URLs for general information on creating a search collection for web-based
files or URLs.
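As an illustration of how the URL seed value in the tip is formed, the following minimal Python sketch substitutes a host name or IP address for IP-ADDRESS in the sample URL. The function name sample_seed_url and the host 192.0.2.10 are hypothetical and are used only for this example. The script is not part of Watson Explorer Engine. It only builds the string that you would enter as the URL seed.

```python
from urllib.parse import urlunsplit

# Path of the sample files directory, taken from the URL given in the tip.
SAMPLE_PATH = "/vivisimo/examples/metadata-example"


def sample_seed_url(host: str) -> str:
    """Build the URL seed value for the sample files on the given host.

    `host` stands in for IP-ADDRESS: the IP address or host name of the
    system on which Watson Explorer Engine is installed.
    """
    host = host.strip()
    if not host:
        raise ValueError("host must be an IP address or host name")
    # Scheme, host, and path only; no query string or fragment.
    return urlunsplit(("http", host, SAMPLE_PATH, "", ""))


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a placeholder.
    print(sample_seed_url("192.0.2.10"))
```

Replace the placeholder address with the IP address or host name of your own installation before entering the resulting value as the URL seed.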
To crawl sample files that are located on a file server
Results
To define how to extract the contents of the sample files in a usable fashion, proceed to
the next section, Extracting Metadata.