Overview of the Retrieve and Rank service

The IBM Watson™ Retrieve and Rank service combines two information retrieval components in a single service: the power of Apache Solr and a sophisticated machine learning capability. Together, they return more relevant results by automatically reranking search results with machine learning algorithms.

How to use the service

The following image shows the process of creating and using the Retrieve and Rank service:

Figure: Process to use the Retrieve and Rank service

For a step-by-step overview of using the Retrieve and Rank service, see the Tutorial page.

Technologies

The purpose of the Retrieve and Rank service is to help you find documents that are more relevant than those that you might get with standard information retrieval techniques.

Retrieve: Retrieve is based on Apache Solr. It supports nearly all of the default Solr APIs and improves error handling and resiliency. You can start by using only the Retrieve features and add the ranking component later.

Rank: The rank component (ranker) creates a machine-learning model trained on your data. At run time, you call the ranker in your queries so that this model improves the relevance of your results, including for queries that the model has not previously seen.

The service combines several proprietary machine learning techniques, which are known as learning-to-rank algorithms. During training, the ranker chooses the best combination of algorithms based on your training data.
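The two-stage idea described above (retrieve candidates first, then reorder them with a model) can be illustrated with a small, self-contained sketch. This is not the service's API or its proprietary algorithms: the `retrieve` and `rerank` functions, the two features, and the hand-picked weights are all hypothetical stand-ins. In the real service, Solr performs the retrieval and the ranker learns its model from your training data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    doc_id: str
    # Hypothetical features: [raw term overlap, overlap density]
    features: List[float]


def retrieve(query: str, index: Dict[str, str], rows: int = 10) -> List[Candidate]:
    """Stage 1: naive keyword-overlap retrieval, standing in for a Solr query."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in index.items():
        words = text.lower().split()
        overlap = sum(1 for w in words if w in terms)
        if overlap:
            hits.append(Candidate(doc_id, [float(overlap), overlap / len(words)]))
    # Order by raw overlap only, as a simple baseline ranking.
    hits.sort(key=lambda c: c.features[0], reverse=True)
    return hits[:rows]


def rerank(candidates: List[Candidate], weights: List[float]) -> List[Candidate]:
    """Stage 2: reorder candidates by a linear score over their features.

    A trained ranker would learn its model from labeled data; here the
    weights are assumed and passed in by hand.
    """
    def score(c: Candidate) -> float:
        return sum(w * f for w, f in zip(weights, c.features))

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    index = {
        "a": "reset password reset password help long article about many "
             "account settings and other things",
        "b": "reset your password",
    }
    candidates = retrieve("reset password", index)
    print([c.doc_id for c in candidates])                      # baseline order
    print([c.doc_id for c in rerank(candidates, [0.1, 5.0])])  # reranked order
```

In this toy data, the baseline favors the long document that repeats the query terms, while the reranked order favors the short, focused one. The point is only that a second-stage model can use signals that the first-stage ordering ignores.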

Primary uses

The core users of the Retrieve and Rank service are customer-facing professionals, such as support staff, contact center agents, and field technicians. These users must quickly find relevant results in large numbers of documents:

  • Customer support: Find quick answers for customers from your growing set of answer documents

  • Field technicians: Resolve technical issues onsite

  • Professional services: Find the right people with the right skills for key engagements

Benefits

The Retrieve and Rank service can return more relevant results than standard information retrieval techniques:

  • The ranker models take advantage of rich data in your documents to provide more relevant answers to queries.

  • You benefit from new features developed by the open source community and from advanced information retrieval techniques built by the Watson algorithm teams.

  • Each Solr cluster and ranker is highly available in the Bluemix environment. The scalable IBM infrastructure removes the need for you to staff your own highly available data center.

About Apache Solr

As previously mentioned, the Retrieve part of the Retrieve and Rank service is based on Apache Solr. When you use Retrieve and Rank, you need to be knowledgeable about Solr as well as about the specifics of the Retrieve and Rank service. For example, when Solr passes an error code to the service, the service passes it to your application without modification so that standard Solr clients can correctly parse and act upon it. You therefore need to know about Solr error codes when writing error-handling routines in your Retrieve and Rank application.
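Because Solr errors reach your application unmodified, an error-handling routine can treat the response body as an ordinary Solr error. The following is a minimal sketch, assuming the response was requested as JSON and follows the usual Solr error shape (an `error` object with `msg` and `code` fields). The function names and the retry rule are illustrative assumptions, not part of the service or of any Solr client.

```python
import json
from typing import Optional, Tuple


def parse_solr_error(status: int, body: str) -> Optional[Tuple[int, str]]:
    """Return (code, message) for an error response, or None on success.

    Assumes a JSON body shaped like a typical Solr error:
    {"responseHeader": {...}, "error": {"msg": "...", "code": 400}}
    Falls back to the HTTP status and raw body if the shape differs.
    """
    if status < 400:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        # Body is not JSON (for example, an HTML error page from a proxy).
        return status, body.strip() or "unknown error"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return status, "unknown error"
    return int(error.get("code", status)), str(error.get("msg", "unknown error"))


def is_retryable(code: int) -> bool:
    """Assumed policy: retry server-side (5xx) errors, not client (4xx) errors."""
    return code >= 500


if __name__ == "__main__":
    sample = ('{"responseHeader": {"status": 400, "QTime": 1}, '
              '"error": {"msg": "undefined field foo", "code": 400}}')
    result = parse_solr_error(400, sample)
    if result is not None:
        code, message = result
        print(code, message, "retry" if is_retryable(code) else "do not retry")
```

Separating parsing from the retry decision keeps the Solr-specific knowledge in one place, so the policy can change without touching the parser.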

To learn about Solr, see the Apache Solr Resources page. The page provides links to resources including a quick start tutorial; documentation and books; and forums for discussion, advice, and problems.

About the sample application

A sample application based on the Retrieve and Rank service is discussed at Retrieve and Rank application overview and available to try at Professor Languo. You can also download the application's source files from GitHub. The application enables you to ask questions about standard English usage, with answers processed by a back-end Retrieve and Rank instance from data provided by the Stack Exchange English Language and Usage forum.

About the free Retrieve and Rank instance

The free cluster that you can create to test the Retrieve and Rank demo application is a single reduced-size unit with a maximum of 50 MB of disk storage. It does not guarantee any specific amount of RAM. The free cluster is meant only to run the demonstration application or small proof-of-concept applications. It cannot be used as a unit in a paid Retrieve and Rank cluster, and it is not intended for production use. See Sizing your Retrieve and Rank cluster for more information.

Your questions and feedback

We are always looking to improve and learn from your experience with our services. Find answers to your questions about Watson in our developer communities:

  • For programming questions, go to the Watson forums on Stack Overflow.

  • For questions and comments about Watson products and services, go to the Watson forum on dwAnswers.

  • To read posts about Watson services from IBM researchers, developers, and other experts, go to the Watson Blog.