You can use the Retrieve and Rank Web Interface to create a cluster and collection, upload documents and questions, and train and test rankers. Your feedback about the tool helps shape future versions of the tool and other Watson tooling.
Prerequisites for using the Retrieve and Rank Web Interface include:
A set of documents in any combination of HTML, PDF, or Microsoft Word formats. When users of your application ask questions, these are the documents from which the Retrieve and Rank service searches and ranks answers.
Note: The Retrieve and Rank Web Interface uses the Document Conversion service to convert input files. See the Document Conversion documentation for information about customizing how the Retrieve and Rank Web Interface converts different types of input files into well-chunked answers.
A list of questions that pertain to the corpus and reflect the types of questions your users will ask. The list must be in text format, one question per line. In general, the more questions, the better the quality of the trained ranker, but a minimum of 50 questions is required to get started.
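Before uploading, it can help to confirm that a question file meets the format described above. The following is a minimal sketch, not part of the tool: the function names are hypothetical, and it assumes a UTF-8 text file in which blank lines are ignored and questions that differ only in letter case count as duplicates.

```python
from pathlib import Path

MIN_QUESTIONS = 50  # minimum stated in the prerequisites above


def load_questions(path):
    """Return the non-empty lines of a question file, one question per line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_question_file(path, minimum=MIN_QUESTIONS):
    """Return (ok, unique_count, duplicates) for a question file.

    ok is True when the file holds at least `minimum` distinct questions.
    Treating case-insensitive matches as duplicates is an assumption made
    for this sketch, not documented behavior of the tool.
    """
    seen = set()
    duplicates = []
    for question in load_questions(path):
        key = question.lower()
        if key in seen:
            duplicates.append(question)
        seen.add(key)
    return len(seen) >= minimum, len(seen), duplicates
```

Calling `check_question_file("questions.txt")` on a hypothetical file returns whether the minimum is met, the number of distinct questions, and any repeated lines you might want to remove first.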
If you plan to use the Retrieve and Rank Web Interface with the Cranfield data collection provided as an example for the Retrieve and Rank service, configuration changes to your Solr cluster with the
The following steps walk you through getting started with Retrieve and Rank in a typical use case for the Retrieve and Rank Web Interface.
The tool prompts you to select a Document Conversion service instance to convert your source documents for the Retrieve and Rank service. Select a Document Conversion service instance from the drop-down list and click Connect to service. See the Document Conversion service documentation for information about that service.
The tool prompts you to select a Retrieve and Rank service instance. Select one from the drop-down list and click Connect to service.
If your Retrieve and Rank service instance does not already have a cluster associated with it, the tool prompts you to create a cluster by supplying a name and a size. Provide a name for the cluster and select the cluster size from the drop-down menu. If you are unsure what cluster size to use, click the help link below the drop-down menu or see Sizing your Retrieve and Rank cluster.
Note: Cluster pricing is based on the size of the cluster. See your Bluemix dashboard for details.
When you have finished selecting your cluster options, click Create. The tool starts creating the cluster as a background task.
After you have created a cluster, or if you already had a cluster in your Retrieve and Rank service instance, the tool displays the Clusters page. You can select an existing cluster (including the one you created in the previous step) or create a new cluster.
If you select an existing cluster that already has a collection associated with it, you can select that collection for use with the tool. Otherwise, when the cluster has been created and initialized, you can create a new collection. For the purposes of this exercise, we assume that you want to create a new collection. To do so, click Create a new collection.
The tool prompts you for a collection name. Enter a name and click Create. The tool creates the collection.
When the collection is ready, click → (the right-pointing arrow button) to enter the collection.
Click Return to Tasks to see what the next task is. For the purposes of this exercise, we assume that the next task is uploading documents to your new collection.
Note: The tool is task-based, providing you with tasks that will most efficiently advance the capabilities of the trained ranker. If you are ever uncertain what to do next, click the Tasks tab and perform one or more of the tasks listed there.
The tool displays an Upload documents panel. Click Import, then add the documents you want to upload to your collection by dragging them into the Import documents dialog box or by clicking in the dialog box to select them. Documents can be in any combination of HTML, PDF, and Microsoft Word formats.
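If you are gathering documents from a mixed folder, a quick pre-check can separate files in the accepted formats from the rest. This is a minimal sketch with a hypothetical helper name; the extension list is an assumption about how HTML, PDF, and Microsoft Word files are usually named, not a list taken from the tool.

```python
from pathlib import Path

# Assumed file extensions for HTML, PDF, and Microsoft Word documents.
SUPPORTED_EXTENSIONS = {".html", ".htm", ".pdf", ".doc", ".docx"}


def split_by_support(paths):
    """Split file paths into (supported, unsupported) lists of file names."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported
```

The check looks only at file names, so a mislabeled file would still pass. Treat it as a convenience before importing, not as validation of file contents.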
The dialog box includes a checkbox for Split my documents up into individual answers for me, which is selected by default. For the purposes of this exercise, we recommend leaving it selected. When this option is enabled, the Document Conversion service splits the documents into individual answers before passing them to the Retrieve and Rank service for storage and indexing.
The Import documents dialog box shows you the progress of the upload. During the upload process, the tool uses the Document Conversion service to convert your source files to the JSON format used by the Retrieve and Rank service.
As the tool uploads the selected documents, it displays their names on the page behind the dialog box. When the tool has finished uploading the selected documents, the Import documents dialog box tells you how many documents it uploaded. You can either upload more documents, up to the limits specified in Sizing your Retrieve and Rank cluster, or click Finish. For the purposes of this exercise, click Import in the dialog box. The tool opens the Add questions dialog box. The tool takes questions in the form of a text file with one question per line.
Click Start in the Active Learning dialog box to start rating the first 50 questions selected by the tool.
When you have rated the answers for a question, click Submit Ratings. Alternatively, you can click Do not include this question in Watson's training, I can't rate these, or Add another answer for a particular question. These alternatives are shown and discussed in the tutorial.
Click Return to Tasks to continue with the next step. Additional tasks typically include rating new sets of questions, which the tool categorizes as a background task. The tool also enables you to view answers you have already provided to various questions and to review the documents in the collection.
When you have answered enough questions to satisfy the tool's training requirements, the Tasks tab displays a Train a ranker dialog box. Click Train in the dialog box to begin training a ranker with the information you have provided.
Note: Each ranker has an associated price, as does training a ranker. See your Bluemix dashboard for details.
The tool displays the progress of the ranker's training. Training can take anywhere from a few minutes to over an hour, depending on the size and complexity of the training set. You can either wait for the ranker to finish being trained or perform other tasks on the Tasks tab.
When the ranker's training is complete, a new task asks you to review the results. Click on the Performance tab. The tool shows you the accuracy of the new ranker compared to the accuracy of a base Retrieve (Solr) search and any previous rankers you have trained. Click on the ranker's bar chart to see more details about it.
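To make the comparison concrete, the sketch below computes one simple accuracy measure: the fraction of questions whose first returned answer was rated as correct. This metric is an assumption chosen for illustration; the source does not state how the Performance tab calculates accuracy, and the function name and data shapes are hypothetical.

```python
def top1_accuracy(results, relevant):
    """Fraction of questions whose first returned answer is rated correct.

    results:  dict mapping each question to its answer IDs in ranked order
    relevant: dict mapping each question to the set of answer IDs rated correct

    Both structures are hypothetical; the tool's own metric may differ.
    """
    if not results:
        return 0.0
    hits = sum(
        1
        for question, answers in results.items()
        if answers and answers[0] in relevant.get(question, set())
    )
    return hits / len(results)
```

Running the same function over the answers from the base Retrieve (Solr) search and from a trained ranker, against the same ratings, gives two numbers that can be compared in the way the bar chart does.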
If you want to test your new ranker, click the Content tab, then click Try Out Watson to ask questions and receive answers from different rankers and the untrained Retrieve service.
It is exceptionally rare for the first trained ranker to be accurate enough to become the production ranker. To improve future versions of the ranker, return to the Tasks tab to answer more questions. You can also add more questions to the question list and more documents to the collection.
Iterate over these steps until you have a ranker with the accuracy your application needs. You can then continue to query the ranker by using either the tool or a Retrieve and Rank application.
Note: The Retrieve and Rank Web Interface uses a
30. An application that uses the same ranker needs to use the same setting to ensure the consistency of ranked results. See Preparing training data for information about the
Delete ranker. Even after the tool deletes a ranker, it retains a record of the ranker's efficiency so you can continue to compare new versions against previous versions.