Using the Retrieve and Rank Web Interface

You can use the Retrieve and Rank Web Interface to create a cluster and collection, upload documents and questions, and train and test rankers. Your feedback about the tool helps shape future versions of the tool and other Watson tooling.

Tool prerequisites

Prerequisites for using the Retrieve and Rank Web Interface include:

  • A Bluemix account.

  • A set of documents in any combination of HTML, PDF, or Microsoft Word formats. When users of your application ask questions, the Retrieve and Rank service searches these documents and ranks the answers it finds in them.

    Note: The Retrieve and Rank Web Interface uses the Document Conversion service to convert input files. See the Document Conversion documentation for information about customizing how the Retrieve and Rank Web Interface converts different types of input files into well-chunked answers.

  • A list of questions that pertain to the corpus and reflect the types of questions your users will ask. The list must be in text format, one question per line. In general, the more questions, the better the quality of the trained ranker, but a minimum of 50 questions is required to get started.

  • If you plan to use the Retrieve and Rank Web Interface with the Cranfield data collection that is provided as an example for the Retrieve and Rank service, a Solr cluster configured with the cranfield-solr-config.zip file.
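Because the question list is a plain text file with one question per line and the tool requires at least 50 questions, it can be worth checking the file before you import it. The following Python code is a minimal sketch of such a check, not part of the tool: parse_questions and check_question_count are hypothetical names, and dropping blank lines and exact duplicates is an assumption about what a clean list looks like.

```python
# Minimum number of questions the Web Interface requires to get started.
MIN_QUESTIONS = 50


def parse_questions(text: str) -> list[str]:
    """Split questions-file text into a list, one question per line.

    Surrounding whitespace is stripped, blank lines are skipped, and exact
    duplicates are dropped (an assumption) while keeping the original order.
    """
    seen: set[str] = set()
    questions: list[str] = []
    for line in text.splitlines():
        question = line.strip()
        if question and question not in seen:
            seen.add(question)
            questions.append(question)
    return questions


def check_question_count(questions: list[str], minimum: int = MIN_QUESTIONS) -> None:
    """Raise ValueError if there are fewer questions than the tool requires."""
    if len(questions) < minimum:
        raise ValueError(
            f"found {len(questions)} questions; at least {minimum} are required"
        )
```

For example, check_question_count(parse_questions(pathlib.Path("questions.txt").read_text(encoding="utf-8"))) raises ValueError when the file (a hypothetical name) holds fewer than 50 distinct questions.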

Stepping through the Retrieve and Rank Web Interface

The following steps walk you through getting started with Retrieve and Rank in a typical use case for the Retrieve and Rank Web Interface.

Logging in and setting up services

  1. Go to https://watson-retrieve-and-rank.ng.bluemix.net/login.

  2. The tool prompts you to select a Document Conversion service instance to convert your source documents for the Retrieve and Rank service. Select a Document Conversion service instance from the drop-down list and click Connect to service. See the Document Conversion service documentation for information about that service.

  3. The tool prompts you to select a Retrieve and Rank service instance. Select one from the drop-down list and click Connect to service.

  4. If your Retrieve and Rank service instance does not already have a cluster associated with it, the tool prompts you to create a cluster by supplying a name and a size. Provide a name for the cluster and select the cluster size from the drop-down menu. If you are unsure what cluster size to use, click the help link below the drop-down menu or see Sizing your Retrieve and Rank cluster.

    Note: Cluster pricing is based on the size of the cluster. See your Bluemix dashboard for details.

  5. When you have finished selecting your cluster options, click Create. The tool starts creating the cluster as a background task.

  6. After you have created a cluster, or if you already had a cluster in your Retrieve and Rank service instance, the tool displays the Clusters page. You can select an existing cluster (including the one you created in the previous step) or create a new cluster.

  7. If you select an existing cluster that already has a collection associated with it, you can select that collection for use with the tool. Otherwise, when the cluster has been created and initialized, you can create a new collection. For the purposes of this exercise we assume that you want to create a new collection. To do so, click Create a new collection.

  8. The tool prompts you for a collection name. Enter a name and click Create. The tool creates the collection.

  9. When the collection is ready, click the right-pointing arrow button to enter the collection.

Taking the tutorial

  1. The tool prompts you to take a short tutorial to familiarize yourself with the use of the tool. If this is your first time using the tool, we strongly recommend that you take the tutorial; it will help you navigate easily through the remaining steps to use the tool and train a ranker quickly and efficiently.
  2. Follow the steps of the tutorial as directed. The tutorial is self-guided and is therefore not detailed in this documentation.
  3. After you complete the tutorial, click Return to Tasks to see what the next task is. For the purposes of this exercise we assume that the next task is uploading documents to your new collection.

Note: The tool is task-based, providing you with tasks that will most efficiently advance the capabilities of the trained ranker. If you are ever uncertain what to do next, click the Tasks tab and perform one or more of the tasks listed there.

Uploading documents

  1. The tool displays an Upload documents panel. Click Import, then add the documents you want to upload to your collection, either by dragging them into the Import documents dialog box or by clicking in the dialog box and selecting them. Documents can be in any combination of HTML, PDF, and Microsoft Word formats.

    The dialog box includes a Split my documents up into individual answers for me checkbox, which is selected by default. For the purposes of this exercise, we recommend leaving it selected. When this option is enabled, the Document Conversion service splits the documents into individual answers before passing them to the Retrieve and Rank service for storage and indexing.

  2. The Import documents dialog box shows you the progress of the upload. During the upload process, the tool uses the Document Conversion service to convert your source files to the JSON format used by the Retrieve and Rank service.

  3. As the tool uploads the selected documents, it displays their names on the page behind the dialog box. When the upload finishes, the Import documents dialog box tells you how many documents were uploaded. You can either upload more documents, up to the limits specified in Sizing your Retrieve and Rank cluster, or click Finish. For the purposes of this exercise, click Finish.
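Before you drag a folder of files into the Import documents dialog box, you might want to set aside files that are not HTML, PDF, or Word documents. The following Python sketch partitions a list of paths by file extension. The extension set, and in particular whether the older .doc format is accepted alongside .docx, is an assumption; check the Document Conversion documentation for the exact list of supported types.

```python
from pathlib import Path

# Assumed extensions for the HTML, PDF, and Microsoft Word formats the tool
# accepts; verify against the Document Conversion documentation.
SUPPORTED_EXTENSIONS = {".html", ".htm", ".pdf", ".doc", ".docx"}


def split_by_support(paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition file paths into (importable, skipped) by extension."""
    importable: list[str] = []
    skipped: list[str] = []
    for p in paths:
        # Compare extensions case-insensitively so "Report.PDF" is accepted.
        if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS:
            importable.append(p)
        else:
            skipped.append(p)
    return importable, skipped
```

Files without an extension end up in the skipped list, since Path("name").suffix is an empty string.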

Importing questions

  1. The tool displays a badge on the Tasks tab to indicate that you have another task to perform. Click the tab. The tool displays the Upload representative questions dialog box.
  2. Click Import in the dialog box. The tool opens the Add questions dialog box, where you select your questions file. The tool accepts questions as a text file with one question per line.
  3. The tool imports the questions and displays them under the heading All Questions on the Content tab. The tool analyzes the questions against your uploaded documents and starts determining which questions need to be answered first for the best results.
  4. While the tool analyzes and orders the question list, it displays a badge on the Tasks tab. Click the tab. The tool displays the Active Learning dialog box.

Answering questions

  1. Click Start in the Active Learning dialog box to start rating the first 50 questions selected by the tool.
  2. The tool displays the first question followed by four possible answers selected from the uploaded documents. Rate each answer on a scale of one to four stars, as demonstrated in the tutorial. When you have finished, click Submit Ratings. Alternatively, you can click Do not include this question in Watson's training, I can't rate these, or Add another answer for a particular question. These alternatives are shown and discussed in the tutorial.
  3. After you finish rating the answers for the first question and submit or otherwise dispose of it, the tool immediately shows you the next question and its set of answers. Continue rating answers for each question in turn. The tool displays your progress as a percentage in the upper-left corner of the tool panel.
  4. When you have finished rating answers for the first 50 questions, the tool displays a Task Complete acknowledgment. Click Return to Tasks to continue with the next step. Additional tasks typically include rating new sets of questions, which the tool categorizes as a background task. The tool also enables you to view answers you have already provided to various questions and to review the documents in the collection.
  5. You can review your overall progress by clicking the Performance tab toward the right-hand side of the tool's top bar. Return to the Tasks tab to continue to rate answers for questions and help the Watson service learn.
  6. As you continue to answer questions, the tool evaluates which strategies are likely to produce the best answers and presents different tasks based on those strategies. For example, for a question whose answers were all rated only one star, the tool might ask you to search for a better answer. In general, for the best results, you can check your progress after you complete a task, but then return to the Tasks tab and follow the prompts and requests presented there.
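The tool decides which tasks to present on its own. As a purely illustrative sketch of the one heuristic named in step 6, the following Python code flags questions whose answers were all rated one star. The Ratings layout (question text mapped to a list of one-to-four-star ratings) and the needs_better_answer function are hypothetical and do not reflect how the tool stores ratings internally.

```python
# Hypothetical record of your ratings: question text -> star ratings (1 to 4)
# given to the answers the tool showed for that question.
Ratings = dict[str, list[int]]


def needs_better_answer(ratings: Ratings) -> list[str]:
    """Return questions whose rated answers all received exactly one star."""
    flagged: list[str] = []
    for question, stars in ratings.items():
        if any(s < 1 or s > 4 for s in stars):
            raise ValueError(f"ratings must be 1 to 4 stars: {question!r}")
        # An empty list means nothing was rated yet, so it is not flagged.
        if stars and all(s == 1 for s in stars):
            flagged.append(question)
    return flagged
```

A question with even one answer rated above one star is not flagged, because at least one usable answer was already found for it.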

Training and testing a ranker

  1. When you have answered enough questions to satisfy the tool's training requirements, the Tasks tab displays a Train a ranker dialog box. Click Train in the dialog box to begin training a ranker with the information you have provided.

    Note: Each ranker has an associated price, as does training a ranker. See your Bluemix dashboard for details.

  2. The tool displays the progress of the ranker's training. Training can take anywhere from a few minutes to over an hour, depending on the size and complexity of the training set. You can either wait for the ranker to finish being trained or perform other tasks on the Tasks tab.

  3. When the ranker's training is complete, a new task asks you to review the results. Click the Performance tab. The tool shows you the accuracy of the new ranker compared to the accuracy of a base Retrieve (Solr) search and of any previous rankers you have trained. Click the ranker's bar chart to see more details about it.

  4. If you want to test your new ranker, click the Content tab, then click Try Out Watson to ask questions and receive answers from different rankers and the untrained Retrieve service.

  5. It is exceptionally rare for the first trained ranker to be accurate enough to become the production ranker. To improve future versions of the ranker, return to the Tasks tab to answer more questions. You can also add more questions to the question list and more documents to the collection.

  6. Iterate over these steps until you have a ranker with the accuracy your application needs. You can then continue to query the ranker by using either the tool or a Retrieve and Rank application.
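Because training can take from a few minutes to over an hour, an application or script that trains rankers outside the tool typically polls until training ends. The following Python sketch shows such a polling loop under stated assumptions: get_status is a hypothetical caller-supplied function that returns the ranker's current status, the status strings "Available" and "Failed" are assumed rather than taken from this documentation, and the two-hour default timeout is an arbitrary choice.

```python
import time
from typing import Callable

# Assumed status values; the real strings reported for a ranker may differ.
DONE = "Available"
FAILED = "Failed"


def wait_for_training(
    get_status: Callable[[], str],
    interval_s: float = 60.0,
    timeout_s: float = 2 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until training finishes, fails, or times out.

    get_status is a hypothetical function that returns the ranker's status.
    sleep is injectable so the loop can be tested without real waiting.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == DONE:
            return status
        if status == FAILED:
            raise RuntimeError("ranker training failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"training still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

Using time.monotonic for the deadline keeps the timeout correct even if the system clock changes while training is in progress.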

Note: The Retrieve and Rank Web Interface uses a rows setting of 30. An application that uses the same ranker must use the same setting to ensure that ranked results are consistent. See Preparing training data for information about the rows parameter.
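To keep an application consistent with the tool, pin the rows value in one place when you build ranked search requests. The following Python sketch builds a URL-encoded query string with rows set to 30. It assumes the request uses the standard Solr parameters q, rows, and wt plus a ranker_id parameter; build_query_params is a hypothetical helper, and the parameter names other than rows, as well as the endpoint you send them to, should be checked against the Retrieve and Rank API reference.

```python
from urllib.parse import urlencode

# The Web Interface uses rows=30; reuse the same value so the application
# reranks the same candidate set that the ranker was trained on.
ROWS = 30


def build_query_params(question: str, ranker_id: str, rows: int = ROWS) -> str:
    """Build a URL-encoded query string for a ranked search request."""
    params = {
        "q": question,
        "ranker_id": ranker_id,  # assumed parameter name
        "rows": rows,
        "wt": "json",  # ask Solr for a JSON response
    }
    return urlencode(params)
```

urlencode escapes the question text, so characters such as spaces and question marks are safe to pass directly.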

Performing other tasks

  • As described in Ranker implementation details, the number and size of rankers per cluster are limited. To clean up older, less accurate rankers, go to the Performance tab, click a ranker's bar chart, and then click Delete ranker. Even after the tool deletes a ranker, it retains a record of the ranker's accuracy so you can continue to compare new versions against previous versions.
  • At any point, you can review and manage your documents, questions, and answers by going to the Content tab.