Crawling and Indexing Your Search Collection
About this task
The Watson™ Explorer Engine crawls and indexes the documents in a search collection so that the data can be searched quickly and flexibly. When you perform a search against the search collection, Watson Explorer Engine uses its index to identify matching documents, and then returns the title and a relevant portion of each matching document. You can then click a title to go to the original document, wherever the original data is located.
This procedure begins crawling and indexing the email data contained in the Enron email archive that we installed in About This Tutorial.
Procedure
Results
Crawling and indexing begin!
You can monitor the progress of crawling and indexing from the screen where you started the process. The page is updated automatically every 5 seconds. When the number of pending and unprocessed URLs in the Crawling section reaches 0, the crawler is done and quits. When the number of uncommitted URLs in the Indexing section reaches 0, indexing is complete and the indexer goes into an idle state.
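The completion conditions described above can be expressed as a small check. The following Python sketch is illustrative only: the `CollectionStatus` fields and the `read_status` callable are hypothetical names, not part of any Watson Explorer Engine API, and the sketch assumes that you supply the counts yourself (for example, by copying them from the status page).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CollectionStatus:
    """Hypothetical snapshot of the counters shown on the status page."""
    pending_urls: int      # Crawling section: URLs waiting to be fetched
    unprocessed_urls: int  # Crawling section: URLs fetched but not yet processed
    uncommitted_urls: int  # Indexing section: URLs not yet committed to the index


def crawl_done(status: CollectionStatus) -> bool:
    # The crawler is done when nothing is pending or unprocessed.
    return status.pending_urls == 0 and status.unprocessed_urls == 0


def index_done(status: CollectionStatus) -> bool:
    # Indexing is complete when no URLs remain uncommitted.
    return status.uncommitted_urls == 0


def wait_until_complete(
    read_status: Callable[[], CollectionStatus],
    poll_seconds: float = 5.0,  # mirrors the 5-second page refresh
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll read_status until crawling and indexing are both complete.

    Returns the number of polls that were needed. Raises TimeoutError if
    max_polls is reached first.
    """
    polls = 0
    while True:
        status = read_status()
        polls += 1
        if crawl_done(status) and index_done(status):
            return polls
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("crawling and indexing did not complete in time")
        sleep(poll_seconds)
```

Note that the two conditions are checked separately: the crawler can finish while the indexer still has uncommitted URLs, so the collection is ready only when both counts have reached 0.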
Once crawling and indexing are complete, we are almost ready to classify the data in our collection. The remaining steps are to create a project that unites all of the components that we have created into a single Watson Explorer Engine application, and then to perform the classification itself.
To proceed to the next section, click Creating Your Project.