Exporting a UIMA pipeline for domain adaptive search

You can export a UIMA pipeline for domain adaptive search to generate queries according to query context and domain knowledge. Based on rules that you specify in the UIMA pipeline, the Watson Explorer Content Analytics search processes can alter the original query terms, generate suggested queries, and group results.

Before you begin

Before you can export a UIMA pipeline for domain adaptive search to Watson Explorer Content Analytics, you must configure a Watson Explorer Content Analytics server connection file.

About this task

After you develop and export your UIMA pipeline for domain adaptive search, you can apply it to a result group when you configure search quality management options for an enterprise search collection in the Watson Explorer Content Analytics administration console. At runtime, if the pipeline creates an annotation for the query text, any domain adaptive search rules that are defined for that annotation are applied. For example, you might specify domain adaptive search rules so that a query for IBM® also runs a group query for “Big Blue” OR “International Business Machines”, and so that the results from the group query take priority over the results from the original query.

You can configure dynamic rules by using feature values as the query to run or suggest. For example, you might want to run the query IBM OR AIX if the original query is IBM, and run the query Microsoft OR Windows if the original query is Microsoft. To do so, you create a dictionary with the entries (surface:IBM, OS:AIX) and (surface:MS, OS:WINDOWS). You then create a parsing rule that, when the text contains a surface string such as IBM, concatenates the covered text and the OS value of the dictionary entry with OR and saves the result as a feature. When you configure the domain adaptive search parameters for the corresponding annotation, you select this feature as the search query to run.
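The dynamic rule can be pictured as a lookup followed by a string concatenation. The following Python sketch is only an illustration, not Content Analytics Studio code: the `annotate` function, the `search_query` feature name, the whitespace tokenization, and the exact-case matching are assumptions made for the example. It uses the two dictionary entries exactly as they are listed above.

```python
# Hypothetical sketch of the dictionary lookup and parsing rule described above.
# Real pipelines are built in Content Analytics Studio; this only mimics the logic.

# Dictionary entries from the example: surface form -> value of the OS feature.
DICTIONARY = {
    "IBM": "AIX",
    "MS": "WINDOWS",
}


def annotate(query_text: str) -> list[dict]:
    """Return one annotation per dictionary surface form found in the query.

    Assumptions: tokens are split on whitespace and matched with exact case.
    Each annotation carries a hypothetical 'search_query' feature that joins
    the covered text and the OS value with OR.
    """
    annotations = []
    for token in query_text.split():
        os_value = DICTIONARY.get(token)
        if os_value is not None:
            annotations.append({
                "covered_text": token,
                "os": os_value,
                "search_query": f"{token} OR {os_value}",
            })
    return annotations


for query in ["IBM", "MS", "Oracle"]:
    print(query, "->", [a["search_query"] for a in annotate(query)])
```

A query with no matching surface form, such as Oracle in this sketch, produces no annotation, so no domain adaptive search rule would be applied to it.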

Tip: Because a UIMA pipeline that you use for domain adaptive search runs every time that a query is submitted, an annotator that takes a long time to process the query text slows down search responses. Before you export your pipeline to Watson Explorer Content Analytics, verify the performance of the pipeline in Content Analytics Studio.

You can specify rules to display a suggested query instead of, or in addition to, running a modified search query. For example, for the original query cut wood, which can have multiple contexts, you might configure the system to run queries for jigsaw and chisel and to display the query suggestions how to create a dog house and buy firewood.
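To make the difference between queries that run and queries that are only suggested concrete, the following hypothetical Python sketch models a rule as a plain data structure. The `DomainRule` class and the `apply_rules` function are invented for illustration. The sketch assumes that the original query still runs alongside the rule's queries and that a rule matches on the exact query text; in the product, matching is driven by the annotations that your pipeline creates.

```python
from dataclasses import dataclass, field


@dataclass
class DomainRule:
    """Hypothetical model of one domain adaptive search rule (illustration only)."""
    trigger: str  # original query text that activates the rule (exact match assumed)
    run_queries: list[str] = field(default_factory=list)  # queries that are run
    suggestions: list[str] = field(default_factory=list)  # queries that are only displayed


def apply_rules(query: str, rules: list[DomainRule]) -> tuple[list[str], list[str]]:
    """Return (queries to run, queries to suggest) for the original query.

    Assumption: the original query is always run, and each matching rule
    adds its own queries to run and its own suggestions.
    """
    to_run = [query]
    to_suggest: list[str] = []
    for rule in rules:
        if rule.trigger == query:
            to_run.extend(rule.run_queries)
            to_suggest.extend(rule.suggestions)
    return to_run, to_suggest


rules = [
    DomainRule(
        trigger="cut wood",
        run_queries=["jigsaw", "chisel"],
        suggestions=["how to create a dog house", "buy firewood"],
    )
]
run, suggest = apply_rules("cut wood", rules)
print(run)      # ['cut wood', 'jigsaw', 'chisel']
print(suggest)  # ['how to create a dog house', 'buy firewood']
```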

Procedure

To export a UIMA pipeline for domain adaptive search to Watson Explorer Content Analytics:

  1. From the Configuration/Annotators directory of your project, right-click your ANNOCONFIG pipeline configuration file, click Export, and click Content Analytics Studio > UIMA Pipeline to Watson Explorer Content Analytics for Domain Adaptive Search.
  2. Specify a name and temporary location for the PEAR file that is created on the file system before it is uploaded onto the Watson Explorer Content Analytics server. By default, the PEAR file is exported to the Content Analytics Studio workspace directory.
  3. Select the Watson Explorer Content Analytics connection file that defines the server to which you want to export the pipeline.
  4. Configure the domain adaptive search parameters for one or more UIMA types that are generated by the UIMA pipeline. Click Add to select an annotation type, and then specify the parameters for that type. For each parameter, you can specify a literal string or select a UIMA feature value. The exception is the group ID, which must be a literal string that does not contain any special characters. For each annotation type, you must set a group ID and at least one of a search query or a suggestion query.
    Tip: The UIMA pipeline can contain one or more result groups, and each group can contain one or more UIMA annotation types. To include multiple annotation types in the same result group, specify the same group ID for each of them. If the group IDs differ, a separate result group is created for each ID.
    Search query parameters

    When you specify a search query to run if the annotation is found in the original query text, you can also specify a description for the result group. When the group results are returned in the user application, a More results from the same group link is displayed, and the description of the result group is shown when you hover over the link.

    Suggestion query parameters

    When you specify a query to suggest if the annotation is found in the original query text, you can also specify the label to display for the suggestion. For example, for the original query dog, you specify pets as the suggestion label and cats OR fish as the suggestion query. The label text is displayed as a suggestion under the query entry field in the search application. If you click the link for that suggestion label, the suggestion query text is displayed in the query entry field and the query is run.

    For query suggestions, you can also specify values that are returned in a REST API response to custom applications. You can assign suggestion types, such as Suggestion A and Suggestion B, so that a custom application can distinguish between suggestions and choose which type of suggestion to use. You can also specify the origin of the suggestion to indicate which term in the original query produced a specific suggestion.

    Priority parameters

    You can specify the priority of a query within its group, and the priority of the group relative to other groups. Queries and groups with higher priorities are processed earlier, and their results are returned first. If multiple groups have the same priority, the group IDs are used as a secondary sort key in ascending order. For example, if group 1 has the ID ab, group 2 has the ID aa, and both groups have priority 1, group 2 is processed first because aa sorts before ab. The group priority value can range from -1000 to 1000. A group priority that you set for an annotation is a dynamic priority and overrides the static priority that is set in the Watson Explorer Content Analytics administration console. If you do not set a dynamic group priority, the static priority is used.

    By default, the priority of the original query is 0. If you want group query results to be returned before results from the original query, you must set a group priority that is higher than 0.

    Other parameters

    For search and suggestion queries, you can specify properties to configure the queries. For search queries, you can specify ExactHighlighting or GreedyHighlighting to configure how much text is highlighted in the results. You can specify EnableStopword or DisableStopWord to configure whether stop words are removed from the queries. To specify multiple properties, separate them with a space character. For suggestion queries, you can specify the property DisableResultGroup to disable result group options for the suggestion query.

    You can also specify whether queries from the same UIMA type are merged into a single query, and whether the UIMA type is enabled only when the original query is a plain text query that does not contain any special characters.

  5. Specify a display name to use for the text analysis engine in the Watson Explorer Content Analytics administration console.
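The ordering rules in the priority parameters of step 4 can be checked with a small sort. The following Python sketch is hypothetical: `ResultGroup`, `effective_priority`, and `processing_order` are names invented for illustration. It applies the rules stated above: a dynamic priority overrides the static one, higher priorities come first, equal priorities are ordered by group ID, and the original query has priority 0. It also assumes that a group whose priority is exactly 0 is placed after the original query, because the text says a priority higher than 0 is required to come first.

```python
from dataclasses import dataclass
from typing import Optional

ORIGINAL_QUERY = "<original query>"  # placeholder label; the original query has priority 0


@dataclass
class ResultGroup:
    """Hypothetical model of a result group and its two priority sources."""
    group_id: str
    static_priority: int  # set in the administration console
    dynamic_priority: Optional[int] = None  # set for the annotation type, if any

    def effective_priority(self) -> int:
        # A dynamic priority, when set, overrides the static priority.
        if self.dynamic_priority is not None:
            priority = self.dynamic_priority
        else:
            priority = self.static_priority
        if not -1000 <= priority <= 1000:
            raise ValueError(f"group priority {priority} is outside -1000..1000")
        return priority


def processing_order(groups: list[ResultGroup]) -> list[str]:
    """Return group IDs in processing order, with the original query at priority 0."""
    # Higher priority first; ties are broken by group ID in ascending order.
    ordered = sorted(groups, key=lambda g: (-g.effective_priority(), g.group_id))
    before = [g.group_id for g in ordered if g.effective_priority() > 0]
    after = [g.group_id for g in ordered if g.effective_priority() <= 0]
    return before + [ORIGINAL_QUERY] + after


groups = [
    ResultGroup("ab", static_priority=0, dynamic_priority=1),
    ResultGroup("aa", static_priority=1),
    ResultGroup("zz", static_priority=5, dynamic_priority=-10),
]
print(processing_order(groups))  # ['aa', 'ab', '<original query>', 'zz']
```

Groups aa and ab tie at priority 1, so aa is processed first, matching the example in the priority parameters. Group zz has a static priority of 5, but its dynamic priority of -10 places it after the original query.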
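The constraints on the step 4 parameters can be summarized as a validation check. This Python sketch is illustrative only: the `AnnotationTypeParams` class, the `validate`, `parse_properties`, and `result_groups` functions, and the annotation type and group names are hypothetical. It assumes that "no special characters" means ASCII letters and digits only, which might differ from the product's actual rule. It also reflects the tip about group IDs: annotation types that share a group ID land in the same result group.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: "no special characters" is read as ASCII letters and digits only.
GROUP_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


@dataclass
class AnnotationTypeParams:
    """Hypothetical container for the step 4 parameters of one annotation type."""
    annotation_type: str
    group_id: str
    search_query: Optional[str] = None
    suggestion_query: Optional[str] = None
    group_priority: Optional[int] = None
    properties: str = ""  # space-separated, for example "ExactHighlighting EnableStopword"


def validate(params: AnnotationTypeParams) -> list[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    if not GROUP_ID_PATTERN.fullmatch(params.group_id):
        problems.append(f"invalid group ID: {params.group_id!r}")
    if not (params.search_query or params.suggestion_query):
        problems.append("set a search query, a suggestion query, or both")
    if params.group_priority is not None and not -1000 <= params.group_priority <= 1000:
        problems.append(f"group priority {params.group_priority} is outside -1000..1000")
    return problems


def parse_properties(text: str) -> list[str]:
    """Split a space-separated property string into individual property names."""
    return text.split()


def result_groups(all_params: list[AnnotationTypeParams]) -> dict[str, list[str]]:
    """Map each group ID to the annotation types that share it."""
    groups: dict[str, list[str]] = {}
    for params in all_params:
        groups.setdefault(params.group_id, []).append(params.annotation_type)
    return groups


company = AnnotationTypeParams("Company", "vendors", search_query="IBM OR AIX",
                               properties="ExactHighlighting EnableStopword")
product = AnnotationTypeParams("Product", "vendors", suggestion_query="AIX")
broken = AnnotationTypeParams("Topic", "my-group")

print(validate(company))  # []
print(validate(broken))   # reports the invalid group ID and the missing query
print(parse_properties(company.properties))  # ['ExactHighlighting', 'EnableStopword']
print(result_groups([company, product]))     # {'vendors': ['Company', 'Product']}
```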

What to do next

After the PEAR file is installed in Watson Explorer Content Analytics, you can apply the domain adaptive search annotator to a result group on the Search Quality Management page of the Watson Explorer Content Analytics administration console.

If you modify the linguistic resources in Content Analytics Studio and want to reinstall the pipeline, you must specify a different name in the Text Analysis Engine Name field when you install the updated pipeline. To reuse the same name, you must first manually disassociate the existing version of the text analysis engine from the Watson Explorer Content Analytics collections that use it, and then delete that version of the text analysis engine from Watson Explorer Content Analytics.