Using the Web Services Converter

Web services often offer text analysis functions that can enrich your documents. The Web Services converter accesses web service functions via their REST API and adds the response to your document, which you can optionally post-process in your own custom converter.

About this task

The Web Services converter is used to augment data with content and analysis from an external web service while that data is being ingested. It is generalized for use with nearly any web service. All Watson™ Explorer converters (including this one) process the text to be indexed at ingestion time. When configured properly, the Web Services converter sends administrator-defined name-value pairs as CGI parameters to a REST-based web service. The response from the REST web service is then stored in its entirety in a new administrator-specified <content> element. The Web Services converter is designed to work with two custom converters that handle:

  • The pre-processing of text to prepare the name-value pairs for consumption by the web service. The output of the custom converter responsible for pre-processing should contain one <content> element for each CGI parameter to be sent to the configured web service. The name and value of each <content> element are sent as the CGI parameter name and value.
  • The post-processing of the web service response. The complete web service response is stored in a <content> element. Under most circumstances, the web service response needs to be processed and transformed into Watson Explorer Engine XML (VXML) to be useful. It is also likely that some <content> elements can be discarded at this time, such as the <content> element that contains the original web service response and any <content> element whose only purpose was to represent a CGI parameter for the web service call.
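
The data flow described by the two items above can be sketched in Python. This is only an illustration, not the converter's implementation: the document shape (a <document> element whose <content> children carry a name attribute), the sample values, and the function names build_query and store_response are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical document shape: a <document> whose <content> children carry a
# "name" attribute. The real output of your pre-processing converter may differ.
SAMPLE = """<document>
  <content name="title">Quarterly report</content>
  <content name="text">IBM opened a lab in Zurich.</content>
  <content name="sid">demo-sid</content>
</document>"""


def build_query(document_xml, names_to_send):
    """Collect the named <content> elements as (name, value) pairs and URL-encode them."""
    root = ET.fromstring(document_xml)
    pairs = [
        (c.get("name"), c.text or "")
        for c in root.iter("content")
        if c.get("name") in names_to_send
    ]
    return urlencode(pairs)


def store_response(document_xml, content_name, response_body):
    """Append the raw web service response as a new <content> element."""
    root = ET.fromstring(document_xml)
    new_content = ET.SubElement(root, "content", {"name": content_name})
    new_content.text = response_body
    return ET.tostring(root, encoding="unicode")


query = build_query(SAMPLE, {"text", "sid"})
print(query)  # text=IBM+opened+a+lab+in+Zurich.&sid=demo-sid
```

Only the <content> elements whose names are listed are encoded, which mirrors the idea of listing the parameter names in the converter configuration. A post-processing step would later read the element written by store_response, transform it, and discard what is no longer needed.
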
Note: IBM recommends complementing this out-of-the-box architecture with a caching proxy. Routing web service calls through a caching proxy can speed up refreshes and recrawls in some situations, help overcome some network failures, and might also reduce the total number of web service transactions.
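
To illustrate why a cache reduces the number of web service transactions, the following Python sketch uses an in-memory dictionary as a stand-in for a caching proxy. It is not a proxy configuration: the CachingCaller class, the fake_service function, and the example.com endpoint are hypothetical names chosen for the example.

```python
import hashlib
from urllib.parse import urlencode


class CachingCaller:
    """In-memory stand-in for a caching proxy: identical requests reach the service once."""

    def __init__(self, call_service):
        self._call_service = call_service  # hypothetical function(endpoint, params) -> str
        self._cache = {}
        self.service_calls = 0

    def fetch(self, endpoint, params):
        # Sort the parameters so the same name-value pairs always produce the same key.
        canonical = endpoint + "?" + urlencode(sorted(params.items()))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.service_calls += 1
            self._cache[key] = self._call_service(endpoint, params)
        return self._cache[key]


def fake_service(endpoint, params):
    # Placeholder for the real HTTP call; returns a canned response.
    return "analysis of: " + params["text"]


caller = CachingCaller(fake_service)
url = "https://example.com/analyze"  # placeholder endpoint
caller.fetch(url, {"text": "same document"})
caller.fetch(url, {"text": "same document"})  # served from the cache
caller.fetch(url, {"text": "changed document"})
print(caller.service_calls)  # 2
```

An unchanged document that is recrawled produces the same parameters and therefore the same key, so only new or changed documents result in a call to the web service.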

For an example of the Watson Explorer Web Services converter that is configured for use with the Watson Developer Cloud Relationship Extraction service, see the wex-web-services-converter example on GitHub.

More Watson Explorer cloud integrations can also be found on GitHub.

Implementation considerations:

  • Security - If you might crawl sensitive data, ensure at a minimum that the web service you are using can be accessed over an encrypted HTTPS connection. Additional security options in the Advanced HTTP Config section of the Web Services converter configuration allow Watson Explorer Engine to establish a secure channel, authenticate, and so on. For more detail, see the tooltip for each setting.

  • Performance - Adding the Web Services converter can significantly increase the collection's total crawl time. Factors include the size of the web service request and response, the web service processing time, and the latency and bandwidth of the networks connecting the Watson Explorer server and the web service endpoint. As mentioned earlier, a caching proxy is likely to improve crawl performance if there is any chance of duplicate web service calls during crawling or subsequent refreshes.

  • Failures happen - All distributed systems are inherently unreliable and failures inevitably occur when calling out to a web service. Carefully consider how failures should be handled at conversion time. Should the whole document fail? Is a partially indexed document without enriched metadata OK for your use cases?

  • Data Preparation - It is the responsibility of the caller to ensure that representative data is being sent to a web service. Data preparation strategies that are not demonstrated here might be required in some cases. For example, some Watson Explorer Engine converters, such as PDFtoHTML or WordtoHTML, produce HTML tags in the "snippet" <content>. These tags provide hints to the indexer, but in a snippet they appear as encoded XML. Prepare your data carefully and ensure it is clean enough for your web service.

  • Scalability - Some web services enforce limits on usage. Such limits might include metrics like calls per day, data per call, number of simultaneous calls, and so on. Carefully consider the demand that you are placing on the web service. You might need to reduce the aggressiveness of the Watson Explorer Engine crawl, reduce the number of converters that might be running simultaneously, or reduce the number of simultaneously active crawls in order to stay within the operational limits of the web service or within the limits of your license to use it.
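
One possible answer to the failure-handling and scalability questions above is to retry a failed call a limited number of times, wait longer between attempts, and fall back to indexing the document without enrichment. The Python sketch below shows that policy only; enrich_with_retry and flaky_service are hypothetical names, and the attempt count and delays are arbitrary assumptions rather than recommended values.

```python
import time


def enrich_with_retry(call_service, text, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Try the web service a few times; return None if every attempt fails.

    Returning None lets the caller index the document without enrichment.
    Raise instead if a failed call should fail the whole document.
    """
    for attempt in range(attempts):
        try:
            return call_service(text)
        except OSError:
            # Network-level errors (urllib.error.URLError is a subclass of OSError).
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return None


calls = []


def flaky_service(text):
    # Simulated service that fails twice before succeeding.
    calls.append(text)
    if len(calls) < 3:
        raise ConnectionError("simulated network failure")
    return "entities for: " + text


delays = []
result = enrich_with_retry(flaky_service, "some text", sleep=delays.append)
print(result)  # entities for: some text
print(delays)  # [1.0, 2.0]
```

The growing delay between attempts also reduces pressure on a service that enforces usage limits. Whether a None result is acceptable depends on whether a partially indexed document is good enough for your use cases.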

Adding the Web Services Converter

Procedure

  1. Navigate to the Configuration tab for the search collection that is indexing the documents that you want to augment with content from an external web service.
  2. Select the Converting sub-tab.
  3. Click the Add a New Converter button. The Add a New Converter dialog displays.
  4. Choose Web Services from the list and click Add.
  5. Click OK.

What to do next

After the Web Services Converter is added, you need to configure it.
  1. In the Web Service endpoint URL text box, supply the endpoint URL for your external web service.
  2. In the Contents to send as name/value pairs text area, enter the name of each CGI parameter that you want to send to the web service, one name per line.
  3. Click OK.

After successfully completing these tasks, you must create two custom converters. The first handles the pre-processing of text. See Creating the Custom Converter for Pre-Processing Text.