Conversational search

Use conversational search with a search integration such as Elasticsearch or Milvus, or with a Custom service, to help your assistant extract an answer from the highest-ranked query results and return a text response to the user.

When you enable this feature, search results are provided to an IBM watsonx generative AI model that produces a conversational reply to a user's question.

The watsonx generative AI model is currently hosted only in the Dallas and Frankfurt regions. By default, assistants in all regions except Frankfurt use the model from the Dallas region. Assistants in the Frankfurt region use the model hosted in the Frankfurt region.

Before you begin

You must configure a search integration before you can enable the conversational search feature. For more information about configuring the Elasticsearch integration, see Elasticsearch search integration setup.

Enabling conversational search

You can enable conversational search so that your assistant gives accurate responses to customer queries. You can also enable citations by defining a citation title; the citations list the source content from which the assistant pulled its responses. The citation title appears between the conversational response and the citations.

To enable conversational search, do the following steps:

  1. Go to the Search integration window.

  2. Set the Conversational search toggle to On.

  3. Choose the type of conversational search based on how much conversational context the assistant should use.

    • Single-turn conversational search

    For questions that require only the current input to retrieve search results and generate answers, choose Single-turn.

    • Conversational search using entire conversation

    For context-dependent questions, which might take previous inputs into account, choose Entire conversation.

    Entire conversation uses the whole session to continue the conversation. It might bring back subjects that are no longer in the scope of the conversation.

  4. In the Define the text for the citation title field, type a title, for example, How do we know?.

    The Define the text for the citation title field is enabled only when the Conversational search toggle is set to On.

    The web chat integration does not support the citation title feature.

  5. Configure the values for Retrieval confidence threshold, Generated response length, and Response confidence threshold in the Search configuration section.

    The retrieval confidence threshold determines the minimum level of confidence that is required for the model to retrieve information from a knowledge base or database. It helps ensure that the assistant fetches only relevant information, improving the accuracy and reliability of its responses. For more information, see Retrieval confidence score.

    The generated response length in a conversational search refers to the maximum number of characters or words that a model is allowed to generate in response to a user's query. For more details, see Tuning the generated response length.

    The response confidence threshold sets the minimum level of certainty that is required for a model to generate a response. This setting ensures that the assistant provides accurate and reliable answers by responding only when it has a high degree of confidence in its output. For more information, see Response confidence score.

  6. You can configure the number of citations that are displayed in the Citations section. For more details on citations, see Citations.

  7. Click Save.
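The two confidence thresholds in step 5 can be sketched as a simple gating function. This is an illustrative sketch only: the result fields, fallback strings, and the stand-in for the generative model are all hypothetical, not the product's internal code.

```python
# Illustrative sketch of how the retrieval and response confidence thresholds
# could gate an answer. All names, fields, and messages are placeholders.

def generate_reply(results):
    # Stand-in for the watsonx generative AI model call; returns a reply
    # and a response confidence score (here derived from the inputs).
    text = "Answer based on: " + ", ".join(r["title"] for r in results)
    return text, max(r["retrieval_confidence"] for r in results)

def answer(query_results, retrieval_threshold=0.5, response_threshold=0.5):
    # Keep only search results whose retrieval confidence meets the threshold.
    relevant = [r for r in query_results
                if r["retrieval_confidence"] >= retrieval_threshold]
    if not relevant:
        return "No search results"  # nothing relevant enough to retrieve

    reply, confidence = generate_reply(relevant)
    if confidence < response_threshold:
        return "Low response confidence"  # suppress an uncertain answer
    return reply
```

Raising either threshold trades coverage for reliability: fewer questions get a generated answer, but the answers that do come back are grounded in higher-confidence results.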

Tuning the generated response length in conversational search

The generated response length feature in your assistant customizes response lengths to best meet your needs.

You can choose from three response lengths: concise, moderate, and verbose. This feature adjusts the length of the responses that your assistant gives to better fit your needs in conversational search. The default setting is moderate, but you can change it as needed:

  • Concise: Responses are shorter and to the point, which is ideal for straightforward queries.
  • Moderate: Responses balance detail and conciseness, making them suitable for most general inquiries.
  • Verbose: Responses provide more detailed and comprehensive information, suitable for complex queries or when a thorough explanation is needed.

The response-length feature affects the average length of responses that watsonx Assistant generates. Although it aims to match the specified length, actual responses vary because of the complexity of user input and the inherent limitations of the large language model (LLM).
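One way to picture the setting is as a preset that caps how much text the model may generate. The mapping below is purely hypothetical; the actual limits that the product applies are not documented.

```python
# Hypothetical sketch: a response-length setting mapped to a generation limit.
# The token values are placeholders, not documented product behavior.

LENGTH_PRESETS = {
    "concise": 128,   # shorter, to-the-point replies
    "moderate": 256,  # default balance of detail and brevity
    "verbose": 512,   # detailed, comprehensive replies
}

def max_generated_tokens(setting="moderate"):
    # Fall back to the default for an unknown setting.
    return LENGTH_PRESETS.get(setting, LENGTH_PRESETS["moderate"])
```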

Configuring your assistant to use conversational search

After you enable Conversational search, you must configure the Search routing setting to route your assistant responses to conversational search when no action matches the user response. For more information about the Search routing configuration, see Configuring the search routing when no action matches. To configure your assistant to route to conversational search for specific topics or actions, you can add search as a step in a new or existing action.

When your assistant receives no search results from Elasticsearch in response to a user query or when its connection to Elasticsearch fails, your assistant responds to the user with a failure message. You can configure the failure messages for no search results and a failed connection in the Search integration settings.
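The two failure paths described above can be sketched as follows. The function shape and message strings are placeholders standing in for the configurable failure messages in the Search integration settings, not the product's actual API.

```python
# Illustrative sketch of the two failure paths: no search results, and a
# failed connection to Elasticsearch. Messages are hypothetical placeholders.

def respond(search_fn, query,
            no_results_msg="I searched my knowledge base, but found nothing relevant.",
            connect_error_msg="I could not connect to search. Try again later."):
    try:
        results = search_fn(query)
    except ConnectionError:
        return connect_error_msg  # connection to Elasticsearch failed
    if not results:
        return no_results_msg     # query returned no search results
    return results[0]             # pass the top result on to conversational search
```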

Testing conversational search

You can test conversational search in the actions preview, on the preview page, or by using the preview link.

In this example, the user asks, Tell me about a custom extension. When conversational search is Off, search results are pulled from your knowledge base and the answer is returned as a list of cards that are relevant to custom extensions.

ConversationalSearchToggleOff

When conversational search is On, the same search results are pulled from your knowledge base and passed to an IBM watsonx generative AI model. The model produces a conversational reply to the user's question, in the form of a text response about custom extensions.

ConversationalSearchToggleOn

Debugging the failures in conversational search

If your calls to conversational search fail, you can debug the problem by viewing detailed information about what is sent to and returned from the underlying API.

For more information, see Debugging failures for conversational search.

Streaming response for conversational search

Streaming response for conversational search uses watsonx.ai capabilities to provide continuous, real-time responses in your assistant. By default, streaming response is disabled in the web chat and assistant preview panels.

By enabling streaming response, you can reduce the user's wait time for a response.

To enable streaming response, do the following steps:

  1. Go to Home > Preview > Customize web chat.
  2. Click the Styles tab.
  3. Set the Streaming toggle button to On.
  4. Click Save and exit.
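The reason streaming shortens the perceived wait can be illustrated with a small sketch: partial text is surfaced as each chunk arrives instead of after the full reply is generated. The generator below simulates a streaming model; it is not the watsonx.ai streaming API.

```python
# Sketch of consuming a streamed reply chunk by chunk. In a real client,
# each partial string would be rendered immediately. The chunk source here
# is a simulated, in-memory list.

def stream_reply(chunks):
    partial = ""
    for chunk in chunks:
        partial += chunk
        yield partial  # render this partial text as soon as it arrives

pieces = ["Custom extensions ", "let you call ", "external APIs."]
partials = list(stream_reply(pieces))
```

With streaming off, the user sees nothing until the equivalent of the final element of `partials`; with streaming on, each intermediate string is displayed as it is produced.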