Crawling Application and Filesystem Data

To set up a search collection that covers the data locations you want your users to be able to access, click the Add icon next to the Search Collection item in the left-hand menu bar. Name the search collection getting-started-sc, as shown in Figure 1, and click the Add button to create it.

Figure 1. Creating A Search Collection
Note: Remember that a search collection is a grouping of indexable locations, called seeds. Now that you have a collection, you must populate it with seeds. When you click the Add New Seed button, a list of available seed types appears. A seed can be a traditional URL, which covers most common network locations such as websites, file shares, SMB shares, databases, and email archives. Watson™ Explorer Engine also has a rich plugin architecture that enables your search application to connect to other types of sources, such as Oracle UCM, SharePoint, and Siebel.

Click the tab labeled Configuration to add a new seed. In the Seeds section, click Add a new seed. A new window opens that lists the types of seeds that are available in your Watson Explorer Engine installation, as shown in Figure 2. Each seed type handles particular protocols, and some seed types require additional connector plugins in order to communicate over proprietary protocols.

Select the seed that you want to use, and click Add to accept that choice. In this case, choose URLs.

Figure 2. Adding a Seed to Watson Explorer Engine

Selecting a seed and clicking Add displays a screen in the Watson Explorer Engine Admin Tool where you can provide additional information about your new seed. In this case, enter the URL of the site that you want the search engine to index in the Seed URLs field. For this example, use a sample website such as the home page of the Internet Assigned Numbers Authority.

When crawling a site, Watson Explorer Engine follows links throughout that site by default, including any links to other sites that the crawled pages contain. To prevent Watson Explorer Engine from accidentally crawling the entire Internet, enter 5 in the Maximum number of hops field. The hop count defines how far the crawler may travel from the seed URL; in this case, the crawler follows only five levels of links and references. You can set this field to other values, depending on how the information is structured on the site that you are crawling.
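
The effect of a hop limit can be illustrated with a short sketch. The following Python is not Watson Explorer Engine code. It is a minimal breadth-first traversal over a hypothetical in-memory link graph (the names crawl_with_hop_limit and LINKS are invented for illustration), and it assumes that the seed itself counts as hop 0, which may differ from how the product counts hops.

```python
from collections import deque


def crawl_with_hop_limit(seed, links, max_hops):
    """Return {url: hops_from_seed} for every page within max_hops of the seed.

    `links` is a hypothetical link graph (url -> list of linked urls) that
    stands in for real page fetching and link extraction.
    """
    hops = {seed: 0}  # assumption: the seed itself is hop 0
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue  # at the limit: keep this page but do not follow its links
        for target in links.get(url, []):
            if target not in hops:  # skip pages that were already discovered
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Toy link graph: a chain of pages plus one link to an external site.
LINKS = {
    "seed": ["a", "external"],
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
}

reached = crawl_with_hop_limit("seed", LINKS, max_hops=2)
print(sorted(reached.items()))
```

With a limit of 2, the sketch reaches the seed, the pages it links to (including the external one), and the pages those link to, but nothing deeper. Raising the limit widens the crawl in the same way that a larger value in the Maximum number of hops field would be expected to.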

Click the OK button to continue.

You have now created a search collection. If you were to integrate the search collection into a functioning Watson Explorer Engine search application, the site that you identified as the seed would be the source of all search results returned to the user.

To see a search in action, start the search engine so that it crawls the specified URL and creates an index for your search collection. To do this, click the Overview tab at the top of the Search Collection screen, and then click the start button to the right of the Live Status line. The search engine first crawls the URLs that you defined as seeds in your search collection, and then creates an index that it uses to return results quickly to your users.
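
To see why an index makes results quick, consider a minimal inverted index. This sketch does not describe how Watson Explorer Engine stores its index. It is a generic illustration in Python, and the page names, page text, and function names (build_index, search) are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word in the query, sorted by name."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Look up each word once, then keep only pages common to all words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical crawled pages (name -> extracted text).
PAGES = {
    "page-1": "Domain names and number resources",
    "page-2": "Protocol parameter assignments",
    "page-3": "Root zone management for domain names",
}

index = build_index(PAGES)
print(search(index, "domain names"))
```

Because the index is built once after the crawl, each query only looks up the query words instead of rereading every crawled page.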

To test your search, type a word that you want to search for in the search field (not the Quick Jump field) in the left-hand menu bar. This takes you to a Watson Explorer Engine search results page.

Note: The following figure includes an example of the kind of results you should expect to see. However, your exact results will vary from this example, depending upon the specific sources used.
Figure 3. Watson Explorer Engine Results