A multi-tiered support system in an organization can use a Large Language Model-powered conversational assistant, or chatbot, alongside human agents, offering efficient and comprehensive assistance to end-users.
The architecture for conversation with agent assist is shown in the diagram above. Major steps in the architecture flow are:
Enterprise documents such as product manuals, frequently asked question documents, offering materials, prospectuses, resolved support tickets, and others are ingested into an instance of IBM watsonx Discovery and prepared for semantic searching.
Users submit requests, issues, or questions through an interface on the organization's website, a dedicated app, or other platforms. This interaction is facilitated by IBM watsonx Assistant, which acts as the primary interface for chat-based interactions.
For requests that require data retrieval from the organization's documents or knowledge base, IBM watsonx Discovery is called to search for and retrieve the passages of information most relevant to the user's request.
watsonx Assistant then submits the user's request and the relevant information retrieved from watsonx Discovery to a large language model (LLM) hosted on watsonx.ai.
The LLM synthesizes the user's request, the supplied passages, and its own embedded knowledge to generate a human-like response, which is passed back to watsonx Assistant and, potentially after formatting and other processing, presented to the user.
If the user is not satisfied with the generated response (for example, their request is nuanced, complex, or requires specific knowledge), they may elect to have watsonx Assistant escalate the call to a human agent. Similarly, interactions may be automatically escalated if the LLM's response is detected to be low-confidence, or potentially offensive. Users can opt to interact with a human representative at any juncture. watsonx Assistant smoothly transitions the interaction to a human agent via the enterprise's contact center management system.
A human agent, with full access to the watsonx Assistant chat history, assists the user with resolving their request, issue, or question.
Post-resolution, the system, through watsonx Assistant, can solicit user feedback. This feedback assists in refining future interactions by analyzing frequently missed or escalated queries and enabling the organization to tune the LLM hosted on watsonx.ai and/or tweak watsonx Discovery's search parameters to enhance performance.
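The retrieval, generation, and escalation steps above can be sketched in a few lines of Python. This is a minimal illustrative stand-in, not the watsonx APIs: `semantic_search` mimics watsonx Discovery with naive keyword overlap, `generate_answer` mimics the watsonx.ai LLM call, and the confidence floor models the automatic low-confidence escalation.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    score: float  # relevance score assigned by the search step

def semantic_search(query: str, documents: list[str], top_k: int = 2) -> list[Passage]:
    """Stand-in for watsonx Discovery: rank documents by keyword overlap."""
    terms = set(query.lower().split())
    scored = [Passage(d, len(terms & set(d.lower().split()))) for d in documents]
    return sorted(scored, key=lambda p: p.score, reverse=True)[:top_k]

def generate_answer(query: str, passages: list[Passage]) -> tuple[str, float]:
    """Stand-in for the watsonx.ai LLM call: returns (response, confidence)."""
    context = "\n".join(p.text for p in passages)
    confidence = min(1.0, sum(p.score for p in passages) / 5.0)
    return f"Based on our documents:\n{context}", confidence

def handle_request(query: str, documents: list[str], confidence_floor: float = 0.4) -> str:
    passages = semantic_search(query, documents)            # retrieve relevant passages
    answer, confidence = generate_answer(query, passages)   # LLM synthesizes a response
    if confidence < confidence_floor:
        return "ESCALATE_TO_HUMAN_AGENT"                    # low-confidence auto-escalation
    return answer
```

In the real architecture, the retrieval and generation calls are service invocations and escalation hands the chat transcript to the contact center system; the control flow, however, follows this shape.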
The mapping of the IBM watsonx family of products to the conceptual architecture is shown in the diagram below. watsonx Assistant provides the interaction capabilities of the Virtual Assistant component, while watsonx Discovery, an add-on to watsonx Assistant, provides document ingestion and semantic searching capabilities. The watsonx.ai model development and hosting environment is used to select, tune, test, and deploy the large language model.
Some clients do not have watsonx.ai available in their local region, or may have security concerns or regulatory requirements that prevent them from using the watsonx.ai SaaS solution. For these clients, we offer watsonx.ai as a set of containerized services that can be deployed on Red Hat OpenShift running within the clients' data centers, within a virtual private cloud (VPC) on a cloud-service provider's infrastructure, or in another location.
Many factors go into choosing a model that will work well for your project.
The model's license may restrict how it can be used. For example, a model's license may prevent it from being used as part of a commercial application.
The data set used to train the model has a direct impact on how well the model works for a specific application and significantly affects the risk that the model will generate nonsensical, offensive, or simply unwanted responses. Similarly, models trained on copyrighted or private data may expose their users to legal liability. IBM provides full training data transparency and indemnification against legal claims arising from its models.
The size of the model, how many parameters it is trained with, and the size of its context window (how long a passage of text the model can accept) affect model performance, resource requirements, and throughput. While it's tempting to go with a "bigger is better" philosophy and choose a 20 billion parameter model, the resource requirements and improvement (if any) in accuracy may not justify it. Recent studies have shown that smaller models can significantly outperform larger ones for some solutions.
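To see why parameter count drives resource requirements, consider a rough back-of-the-envelope estimate (generic arithmetic, not vendor specifications): the memory needed just to hold a model's weights scales linearly with the parameter count and the bytes used per parameter.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold model weights.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for 8-bit quantized.
    """
    return num_params * bytes_per_param / 1024**3

# A 20-billion-parameter model in 16-bit precision needs roughly:
print(round(weight_memory_gb(20e9, 2)))  # about 37 GB for weights alone
# Quantizing the same model to 8 bits halves that:
print(round(weight_memory_gb(20e9, 1)))  # about 19 GB
```

Actual serving footprints are higher once activations and the key-value cache for the context window are included, which is why a smaller model that meets the accuracy bar is often the better operational choice.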
Any fine-tuning applied to a model can affect its suitability for a task. For example, IBM offers two versions of the Granite model: one tuned for general chat applications, and another tuned to follow instructions.
Other considerations when choosing a model include:
Selection of model parameters, e.g., the model temperature, to balance the creation of human-like text and factual responses. Setting the model temperature to a low value will generate consistent but potentially uninteresting or overly terse responses, while setting the temperature to a high value will introduce more variety into the responses but will add unpredictability in the response length and content.
Selection and implementation of model guardrails to guard against ineffective or offensive results.
The language of the client data and user prompts must also be taken into account. The majority of LLMs are trained on English language text and can often translate between English and other languages with varying levels of expertise. Applications requiring multi-lingual or localized language support may require the use of multiple models trained in each of the supported languages, or implementation of a translation step to translate multi-lingual inputs into English or another 'base' language.
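A minimal sketch of what the temperature parameter actually does during sampling (this is the generic softmax-with-temperature calculation, not a specific watsonx API): lower temperatures sharpen the token probability distribution toward the most likely token, producing consistent output, while higher temperatures flatten it, producing more varied output.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw token scores into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.2)  # low temperature: near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # high temperature: more varied
# The top token dominates at low temperature and loses dominance at high temperature:
print(round(cold[0], 3), round(hot[0], 3))
```

This is why a chat assistant that must stay factual is typically run at a low temperature, with guardrails catching the residual bad outputs.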