Files

This seed can be used to crawl files local to the Watson™ Explorer Engine installation.

The following values can be configured for this type of seed:

  • Files - Newline-separated list of file paths to crawl. UNIX users can use a path such as
    • /usr/local/
    Windows users can use a path such as
    • C:/Program Files/

This component prepends file:/// to each given file path to create a valid URL.
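
The path-to-URL step described above can be illustrated with a minimal Python sketch. It is not the Engine's actual implementation: the function name seed_paths_to_urls is hypothetical, and stripping the leading slash from UNIX paths (so that the result has three slashes rather than four) is an assumption of this sketch, not documented behavior.

```python
def seed_paths_to_urls(files_field):
    """Turn a newline-separated list of paths into file:/// URLs (illustrative only)."""
    urls = []
    for line in files_field.splitlines():
        path = line.strip()
        if not path:
            continue  # ignore blank lines in the list
        # Assumption: drop leading slashes so "/usr/local/" does not
        # become "file:////usr/local/". The real component may differ.
        urls.append("file:///" + path.lstrip("/"))
    return urls


if __name__ == "__main__":
    for url in seed_paths_to_urls("/usr/local/\nC:/Program Files/"):
        print(url)
```

Note that the sketch only concatenates strings, as the description suggests; it does not percent-encode spaces or other characters.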

Tip: The filesystems used by Linux, Unix, and Unix-like computer systems can contain special types of files, such as block and character device nodes and files that represent named pipes. These files cannot be crawled because they do not contain data; they serve as device or I/O access points. Attempting to crawl such files generates crawler and converter errors during the crawl. To avoid such errors, exclude the /dev directory in any top-level crawl on a Linux, Unix, or Unix-like filesystem. If they are present on the system that you are crawling, also exclude temporary system directories such as /proc, /sys, and /tmp, which contain transient files and system information.
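
To check a directory tree before crawling it, the two ideas in the tip (directories to exclude, and file types that carry no data) can be expressed as a short Python sketch. This is only an illustration using the Python standard library; it does not show how exclusions are configured in the Engine, and the names EXCLUDED_DIRS, is_excluded, and is_special_file are hypothetical.

```python
import os
import stat
from pathlib import PurePosixPath

# Directories named in the tip above; adjust for the system being crawled.
EXCLUDED_DIRS = tuple(PurePosixPath(d) for d in ("/dev", "/proc", "/sys", "/tmp"))


def is_excluded(path):
    """Return True if path is one of the excluded directories or lies inside one."""
    p = PurePosixPath(path)
    return any(p == d or d in p.parents for d in EXCLUDED_DIRS)


def is_special_file(path):
    """Return True for block devices, character devices, and named pipes."""
    mode = os.lstat(path).st_mode  # lstat: do not follow symbolic links
    return stat.S_ISBLK(mode) or stat.S_ISCHR(mode) or stat.S_ISFIFO(mode)
```

For example, is_excluded("/dev/null") is true, while is_excluded("/devices/data.txt") is false because the comparison is made on whole path components rather than on string prefixes.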