Using the AlchemyLanguage Converter

The AlchemyLanguage converter can be used during any document ingestion (push or crawl) to enrich indexed documents with additional metadata. This metadata can be used for a variety of purposes in search applications or in other use cases that require a 360-degree view of an entity. The AlchemyLanguage API is a set of web services that enable advanced text analysis, including natural language processing (NLP), sentiment analysis, entity extraction, and concept identification.

About this task

The AlchemyLanguage API converter is intended to make it easier for Engine administrators to augment indexed content using AlchemyLanguage web services. This converter currently integrates six of the AlchemyLanguage functions: Entity Extraction, Sentiment Analysis, Keyword Extraction, Concept Tagging, Relation Extraction, and Taxonomy Classification. The full AlchemyLanguage API reference is available at https://www.ibm.com/watson/developercloud/alchemy-language/api/v1/. An augmented index can be useful in a variety of situations. For example, you could autoclassify documents using the AlchemyLanguage Taxonomy Classification or Concept Tagging functions. The resulting taxonomy might be used for faceting, "find related" searches, or display, or it might be combined with other analytics.

Implementation considerations:

  • Privacy and Security - The AlchemyLanguage converter requires a web services call, which is made over an HTTPS connection. If this level of security is not sufficient for your data, consider using other capabilities available in the analytical components of Watson™ Explorer, which run completely on premises.

  • Crawl performance - Because the converter makes web services calls to the AlchemyAPI cloud, document conversion will be slower than with converters that do not call out to web services. We anticipate that conversion performance will be comparable to using a local text analytics engine (for example, the Content Analytics Annotation converter), but this is highly dependent on the network, data, and other factors.

  • Failures will happen - All distributed systems are inherently unreliable and failures will inevitably occur when attempting to call out to the AlchemyLanguage API. Carefully consider how failures should be handled at conversion time. Should the whole document fail? Is a partially indexed document without enriched metadata from the AlchemyLanguage API OK?

  • Managing development costs - Every document and every AlchemyLanguage function used counts toward your daily and monthly AlchemyAPI transaction limits. We've created a simple test server that can return a canned API response. This test server can be used during development to help manage development time and costs. Keep in mind that the current converter makes one API call per enabled function, so a document processed with three functions consumes three transactions.

  • Data Preparation - It is the responsibility of the caller to ensure that representative data is being sent to the AlchemyLanguage APIs. Additional data preparation may be required in some cases. For example, some Watson Explorer Engine converters, such as PDFtoHTML or WordtoHTML, retain a subset of HTML tags. These tags provide hints to the indexer, but in a content snippet they become encoded XML. This means <span> is sent as &lt;span&gt; and counts as 12 characters toward the AlchemyAPI string length limit, leaving 12 fewer characters for text that is representative of the document you're analyzing. Choose content carefully and be sure it is clean enough for analysis.

  • Caching Proxy - Given the considerations listed here, the use of a caching proxy is always recommended in a Production environment. Using a caching proxy can speed up refreshes and recrawls in some situations, overcome some network failures, and in some cases may also allow you to reduce the total number of required AlchemyAPI transactions.
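
The Data Preparation and Managing development costs considerations both come down to simple arithmetic, which the following minimal Python sketch illustrates. It is not part of the converter. The helper names (encoded_length, strip_tags, transactions_needed) are hypothetical, and the tag stripper is deliberately crude; real content may need a proper HTML parser.

```python
import re
from html import escape


def encoded_length(snippet: str) -> int:
    """Length of the snippet after < > & are XML-encoded (for example < becomes &lt;)."""
    return len(escape(snippet, quote=False))


def strip_tags(snippet: str) -> str:
    """Crude cleaner: drop anything that looks like a tag, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", snippet)
    return " ".join(text.split())


def transactions_needed(documents: int, enabled_functions: int) -> int:
    """One API call per enabled function for every document."""
    return documents * enabled_functions


if __name__ == "__main__":
    raw = "<span>Hello</span> <b>world</b>"
    print(encoded_length("<span>"))        # 6 raw characters grow once encoded
    print(strip_tags(raw))                 # text only, tags removed
    print(transactions_needed(10000, 3))   # budget check before a crawl
```

Cleaning the snippet before it is sent keeps more of the string length limit available for representative text, and estimating transactions up front shows whether a crawl fits within your daily and monthly limits.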

Adding the AlchemyLanguage Converter

Procedure

  1. Navigate to the Configuration tab for the search collection that will be indexing the documents you want to enrich with AlchemyAPI text analysis.
  2. Select the Converting sub-tab.
  3. Click the Add a New Converter button. The Add a New Converter dialog displays.
  4. Choose AlchemyLanguage Converter from the list and click Add.
  5. Click OK.

What to do next

After the AlchemyLanguage Converter is added, you need to configure it.

The AlchemyLanguage converter requires a valid API key. See IBM Bluemix for more information. Each AlchemyLanguage function must be enabled and configured individually. This allows you to, for example, use one set of content in one function and different content in another function.

The following fields must be configured:

AlchemyAPI base URL - REST endpoints for all API requests will be built from this starting point.

AlchemyAPI API Key - This is the key, provided to your organization by AlchemyAPI, an IBM company, that grants access to the AlchemyAPI services.

Default timeout - This sets the default amount of time, in seconds, to wait for a response from the API call. This value can be overridden for individual services in their specific configuration.

Proxy server (host:port) - This sets a proxy server through which all AlchemyAPI calls will be directed. It is recommended that a local caching proxy be used to improve performance and reduce the total number of AlchemyAPI calls.

Proxy username and password - Provide the credentials used to authenticate to the proxy server configured above.

Enable debug contents - If debug contents are enabled and any enabled AlchemyAPI service returns an error, content containing the error message is added to the document. This is useful for debugging, but you may not want it in your final crawl.
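
To show how the fields above relate to one another, here is a minimal Python sketch of a client that builds a request from a base URL and API key, routes it through an authenticated proxy, applies a timeout, and optionally records an error as content. It is an illustration only, not the converter's implementation. The endpoint path, the form parameter names (apikey, text, outputMode), the example.invalid host, and the content name alchemy-error are all assumptions; confirm the real values against the AlchemyLanguage API reference.

```python
from urllib.parse import quote, urlencode
from urllib.request import ProxyHandler, Request, build_opener


def build_request(base_url, endpoint, api_key, text):
    """Build a POST request; every endpoint URL starts from the base URL."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    # Parameter names are assumed for illustration.
    body = urlencode({"apikey": api_key, "text": text, "outputMode": "json"})
    return Request(url, data=body.encode("utf-8"), method="POST")


def proxy_url(host_port, username=None, password=None):
    """Turn 'host:port' plus optional credentials into a proxy URL."""
    if username is None:
        return "http://" + host_port
    user = quote(username, safe="")
    secret = quote(password or "", safe="")
    return "http://%s:%s@%s" % (user, secret, host_port)


def build_proxy_opener(host_port, username=None, password=None):
    """Opener that directs all calls through the configured proxy."""
    url = proxy_url(host_port, username, password)
    return build_opener(ProxyHandler({"http": url, "https": url}))


def call_service(request, send, timeout, debug_contents=False):
    """Call one service; on failure, keep the document and optionally record the error.

    send is any callable(request, timeout) -> bytes, for example:
        lambda r, t: opener.open(r, timeout=t).read()
    """
    try:
        return {"result": send(request, timeout)}
    except OSError as error:  # URLError and socket timeouts are OSError subclasses
        if debug_contents:
            return {"alchemy-error": str(error)}  # hypothetical content name
        return {}
```

Passing the network call in as send keeps the failure policy separate from the transport, so the same logic can be exercised against a test server or a stub that returns a canned response.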

The results of the AlchemyLanguage API request will be stored as XML nodes within the VSE document. Click the Test it button to view the specific output of the converter. One or more VSE content elements are added to a document for each AlchemyLanguage service enabled. Consider post-processing these elements in a subsequent custom converter if you would like to modify the data indexed.
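
As an illustration of the kind of post-processing a subsequent converter might apply, the following Python sketch removes duplicate values, compared case-insensitively, from one named set of content elements. The XML shape used here (a document element holding content elements with a name attribute) and the content name alchemy-entity are assumptions made for the example; use the Test it button to see the actual output of the converter before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical converter output, for illustration only.
SAMPLE = """<document url="http://example.com/a">
  <content name="title">Quarterly report</content>
  <content name="alchemy-entity">IBM</content>
  <content name="alchemy-entity">ibm</content>
  <content name="alchemy-entity">Watson</content>
</document>"""


def dedupe_contents(document_xml, content_name):
    """Drop repeated values (ignoring case) among contents with the given name."""
    root = ET.fromstring(document_xml)
    seen = set()
    # Iterate over a copy so elements can be removed safely.
    for element in list(root.findall("content")):
        if element.get("name") != content_name:
            continue
        key = (element.text or "").strip().lower()
        if key in seen:
            root.remove(element)
        else:
            seen.add(key)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dedupe_contents(SAMPLE, "alchemy-entity"))
```

Contents with other names, such as the title in the sample, pass through unchanged.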