Agentic AI systems can autonomously plan and perform tasks on behalf of a user or another system, making them capable of handling a far greater range of tasks, and far more complex tasks, than simple chatbots or information retrieval applications.
Overview

Agentic AI systems bring together the versatility and flexibility of large language models (LLMs) and the precision of traditional programming models. They can autonomously plan and perform tasks on behalf of a user or another system, and they solve complex problems by breaking them down into a series of smaller tasks, using available tools to interact with external systems or to perform computational tasks.

These capabilities make agentic AI systems capable of handling a far greater range of tasks, and far more complex tasks, than LLMs alone. For example, if you were to prompt an LLM to recommend which car to buy, the model would dutifully generate a list of recommendations based on the data available at the time the model was trained. An agentic AI solution, on the other hand, could prompt you for additional details on how you intend to use the vehicle (pleasure, commuting to work, hauling heavy loads), and let you know there is a manufacturer's rebate available until the end of the month.

Conceptual Architecture

An agentic AI system comprises the following components (a brief code sketch of how they fit together follows the list):

  • An Agent Orchestration component manages and coordinates the actions of a set of Agents. The Agent Orchestration component may use an LLM to break down complex tasks and dynamically generate workflows to solve them, or it may rely solely on statically defined workflows built using technologies such as Business Process Model and Notation (BPMN), Business Process Execution Language (BPEL), or other workflow technologies.

  • One or more Agents, pieces of software that can self-determine and execute actions to meet specified goals. Agents typically use an LLM to dynamically generate plans to complete tasks. Agents can also use Tools to interact with external systems (e.g., an enterprise application API), search knowledge stores (e.g., query Wikipedia), or carry out calculations (e.g., mathematical operations) that cannot be done accurately or effectively using an LLM alone.

  • Tools, which interact with enterprise and external sources and systems to retrieve information and to update systems of record.
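
The following is a minimal, illustrative sketch of how these three components relate to one another. The class names and the statically defined plan are assumptions made for illustration; they do not represent any specific framework or product API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Tool:
    """Wraps a callable that reaches an external system (API, knowledge store, calculator)."""
    name: str
    description: str
    run: Callable[[str], str]


@dataclass
class Agent:
    """A goal-directed worker that uses its Tools to complete a task."""
    name: str
    tools: Dict[str, Tool] = field(default_factory=dict)

    def handle(self, task: str) -> str:
        # A real agent would use an LLM to pick the right tool and arguments;
        # here every registered tool is simply applied to the raw task.
        return "; ".join(tool.run(task) for tool in self.tools.values())


class Orchestrator:
    """Breaks a request into steps and routes each step to a registered Agent."""

    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        # 'plan' is a statically defined workflow of (agent name, task) pairs;
        # an LLM-backed orchestrator would generate this list dynamically.
        return [self.agents[name].handle(task) for name, task in plan]


# Example: a two-step, statically defined plan routed across two agents.
orchestrator = Orchestrator()
orchestrator.register(Agent("Weather", {"api": Tool("api", "weather lookup", lambda q: "-1°C")}))
orchestrator.register(Agent("Calendar", {"clock": Tool("clock", "date lookup", lambda q: "2023-11-09")}))
print(orchestrator.run([("Weather", "Winnipeg"), ("Calendar", "today")]))
```

In a production system the plan would typically be generated by an LLM or defined in a workflow engine rather than hard-coded as shown here.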

 

Agents have their own conceptual architecture, illustrated in the figure below.

Agents consist of the following core components:

  • The Input component is one or more sources of input that trigger the agent to take action. Commonly, this is a natural language query or task from a user, but it could also be a system event, such as the creation of a file, a message on a Kafka topic, or a structured API call.

  • The Execution component coordinates the agent's activities to carry out the required task. Commonly, the first task executed by the Execution component is to (i) marshal a list of the tools and resources available to the agent, and (ii) invoke the Planning and Reflection component to generate an activity plan to carry out the task. The Execution component then executes the generated plan, invoking tools and resources as necessary to collect information or alter the agent's external environment, and may periodically re-invoke the Planning and Reflection component to adapt the activity plan depending on tool responses or failures.

  • The Planning and Reflection component, commonly an LLM, enables the agent to create step-by-step action plans to accomplish a task in response to its inputs, to reflect on the results of actions, and to adapt its plans in response.

  • The Tool Integration component enables the agent to use 'tools' to call APIs and access resources to complete actions and gather information to contribute to the completion of the overall task.

  • The Memory component manages short-term, in-task context, as well as long-term knowledge that enables the agent both to maintain context across task invocations (e.g., "Reverse the last purchase order") and to provide a foundation for analysis of past actions and optimization of future actions.

Additional components, not shown in the figure, can be added to provide operational agent management, performance monitoring, and security controls such as identity propagation and data leakage prevention.
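
The sketch below illustrates, in simplified form, how the Input, Execution, Planning and Reflection, Tool Integration, and Memory components interact. The Planner class is a stand-in for an LLM call, and all names and behaviours are assumptions made for illustration only.

```python
from typing import Callable, Dict, List


class Memory:
    """Short-term (in-task) context plus a simple long-term log of past actions."""
    def __init__(self) -> None:
        self.short_term: List[str] = []
        self.long_term: List[str] = []

    def remember(self, entry: str) -> None:
        self.short_term.append(entry)
        self.long_term.append(entry)


class Planner:
    """Stand-in for the LLM-backed Planning and Reflection component."""
    def plan(self, task: str, tools: List[str]) -> List[str]:
        # A real implementation would prompt an LLM with the task, the tool
        # list, and relevant memory; here a canned one-step plan is returned.
        return [f"use:{tools[0]}:{task}"] if tools else []

    def reflect(self, step: str, result: str) -> List[str]:
        # Inspect a result and optionally return replacement steps (empty = carry on).
        return []


class SimpleAgent:
    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools            # Tool Integration
        self.planner = Planner()      # Planning and Reflection
        self.memory = Memory()        # Memory

    def handle(self, task: str) -> str:
        """Execution component: plan, invoke tools, reflect, and adapt."""
        steps = self.planner.plan(task, list(self.tools))        # initial plan
        output = ""
        while steps:
            step = steps.pop(0)
            _, tool_name, payload = step.split(":", 2)
            output = self.tools[tool_name](payload)              # invoke a tool
            self.memory.remember(f"{step} -> {output}")
            steps = self.planner.reflect(step, output) + steps   # adapt the plan
        return output


# Input component: a natural-language task triggers the agent.
agent = SimpleAgent({"weather_api": lambda city: f"Current temperature in {city}: -1°C"})
print(agent.handle("Winnipeg"))
```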

Conceptual Walkthrough

The diagram below illustrates the flow of control and information through the conceptual architecture.

 

  1. A user submits a query to a generative AI application (for example, a chatbot, or a query interface within an enterprise application).

  2. The generative AI application passes the user's query to the Agent Orchestrator in the form of either the raw query (e.g., when the AI application is a chat interface) or the triggering of a pre-defined workflow (e.g., the initiation of a purchase requisition). A raw query is assumed for this walkthrough.

  3. The Orchestrator uses a tuned LLM to break the user query down into a series of actions, or steps, necessary to arrive at an answer. For example, to answer the query "What is the current temperature in Winnipeg, Manitoba, Canada? How does that compare to the historical average for this time of year?" the LLM may respond with the following conceptual list of actions (a code sketch of the full flow follows the walkthrough):

    • Look up the current temperature for Winnipeg using the Weather agent
    • Look up the current date using the Calendar agent
    • Look up the average temperature in Winnipeg on this date using the Search agent
    • Find the difference between current temperature and the historical average using the Calculator agent
    • Formulate a natural language response using the Language agent

  4. The Orchestrator then invokes the appropriate agent for each action in the list. Continuing with the example from Step 3:

    • The Orchestrator invokes the Weather agent to retrieve the current temperature for Winnipeg, -1°C.
    • The Orchestrator invokes the Calendar agent to get the current date, November 9, 2023.
    • The Orchestrator uses the Search agent to find the normal temperature in Winnipeg on November 9, 1.4°C.
    • The Orchestrator invokes the Calculator agent to find the difference between the two temperatures, -1 - 1.4 = -2.4
    • The Orchestrator uses the Language agent to formulate a response to the initial query using the gathered data.
       
  5. When an agent is invoked it may, like the Orchestrator, use an LLM to plan its actions. Continuing with the example, the Weather agent would receive the request "What is the current temperature in Winnipeg?", for which it would generate the following plan:

    • Look up in which country Winnipeg is located
    • Look up the authoritative national weather service for Winnipeg's country
    • Use the Weather API to query the weather service for the current temperature in Winnipeg.

    The agent would then look up the country in which Winnipeg is located (Canada) using either an LLM or an external service, use that value to look up the national weather service for Canada (Environment Canada), and use the Weather API to get the current temperature for Winnipeg.
       
  6. The resulting response is then passed back to the generative AI application; in our example: "The current temperature in Winnipeg is -1°C. That is 2.4°C cooler than the historical norm of 1.4°C."

  7. The formulated response is passed back to the user.
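
The walkthrough above can be expressed as a short, statically coded plan. The sketch below stubs each agent with the example values from the text; in a real system the Orchestrator's LLM would generate the step list and each agent would call live services.

```python
from datetime import date


def weather_agent(city: str) -> float:
    return -1.0                        # step 4: current temperature (stubbed)


def calendar_agent() -> date:
    return date(2023, 11, 9)           # step 4: current date (stubbed)


def search_agent(city: str, on: date) -> float:
    return 1.4                         # step 4: historical average (stubbed)


def calculator_agent(current: float, average: float) -> float:
    return current - average           # step 4: difference


def language_agent(city: str, current: float, average: float, diff: float) -> str:
    return (f"The current temperature in {city} is {current:g}°C. That is "
            f"{abs(diff):.1f}°C cooler than the historical norm of {average:g}°C.")


# The Orchestrator executes the plan generated in step 3.
city = "Winnipeg"
current = weather_agent(city)
today = calendar_agent()
average = search_agent(city, today)
diff = calculator_agent(current, average)
print(language_agent(city, current, average, diff))   # the step 6 response
```
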
IBM Product Architecture

The diagram above illustrates the mapping of IBM products to the agentic AI architecture.

watsonx Orchestrate is an 'all-in-one' agentic AI solution that combines:

  • publication and management of tools (called skills in watsonx Orchestrate);
  • composition of skills into complex, multi-step processes using declarative workflows; and
  • pre-built, domain-specific agents for horizontal business areas such as HR and Purchasing.

The watsonx.ai Agent Builder is a low-code / no-code tool that enables developers to build agents, and define and manage tools using pre-built flows.

Architecture Decisions and Considerations

Orchestration Strategy

Agent orchestration can be implemented using a variety of approaches. A centralized orchestration approach uses a single master orchestration component to manage the actions of all the other agents in the system. Having a single point of configuration and management makes the overall system simple to manage and control, and easy to troubleshoot. The downside is that a single point of control can become a bottleneck and lead to scalability challenges as request volumes and/or the number of agents increase.

A decentralized orchestration approach implements a shared task queue from which agents pull tasks and to which they post results, routing multi-part tasks amongst themselves, much like a blackboard system. Decentralized orchestration solutions are highly robust and fault tolerant, but they are difficult to design and troubleshoot as the systems grow larger and gain capabilities.
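
The sketch below illustrates the decentralized pattern as a shared, in-memory task queue. The task format and agent behaviours are assumptions made for illustration; a production implementation would typically use a durable message broker and add retries, timeouts, and failure handling.

```python
from collections import deque

tasks = deque([                                        # the shared 'blackboard'
    {"type": "get_temperature", "city": "Winnipeg"},
    {"type": "compare", "average": 1.4},
])
results: dict = {}                                     # results posted by agents


def weather_agent(task: dict) -> bool:
    """Handles temperature lookups; declines everything else."""
    if task["type"] != "get_temperature":
        return False
    results["current"] = -1.0                          # stubbed API call
    return True


def calculator_agent(task: dict) -> bool:
    """Handles comparisons, but only once its input is available."""
    if task["type"] != "compare" or "current" not in results:
        return False
    results["difference"] = round(results["current"] - task["average"], 1)
    return True


agents = [weather_agent, calculator_agent]

# Each iteration, any agent may claim the next task; unclaimed or not-yet-ready
# tasks are re-queued. No central orchestrator decides the order of execution.
while tasks:
    task = tasks.popleft()
    if not any(agent(task) for agent in agents):
        tasks.append(task)

print(results)   # {'current': -1.0, 'difference': -2.4}
```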

Finally, a hierarchical orchestration approach combines elements of the centralized and decentralized approaches. In hierarchical orchestration, a master orchestrator is used to coordinate the actions of high-level agents that in turn can invoke other agents to complete complex tasks. This retains much of the ease of management and control of a centralized approach but reduces the potential for the central control component to become a bottleneck at high request volumes and/or large numbers of agents.
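
The sketch below illustrates the hierarchical pattern: a master orchestrator routes high-level tasks to domain orchestrators, each of which coordinates its own worker agents. The routing rule and the domain names are assumptions made for illustration.

```python
from typing import Callable, Dict, List


class DomainOrchestrator:
    """Coordinates a small set of worker agents within a single domain."""
    def __init__(self, agents: Dict[str, Callable[[str], str]]) -> None:
        self.agents = agents

    def handle(self, task: str) -> List[str]:
        # A real implementation would plan with an LLM; here every agent runs.
        return [agent(task) for agent in self.agents.values()]


class MasterOrchestrator:
    """Routes high-level tasks to the appropriate domain orchestrator."""
    def __init__(self, domains: Dict[str, DomainOrchestrator]) -> None:
        self.domains = domains

    def handle(self, domain: str, task: str) -> List[str]:
        return self.domains[domain].handle(task)


hr = DomainOrchestrator({"leave": lambda t: f"Leave balance checked for: {t}"})
finance = DomainOrchestrator({"expense": lambda t: f"Expense recorded for: {t}"})
master = MasterOrchestrator({"hr": hr, "finance": finance})
print(master.handle("hr", "request two days of vacation"))
```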

Agent Granularity

Granularity of an AI agent refers to the complexity of the tasks the agent can perform. A high-granularity agent may be capable of performing many tasks or a small number of tasks in great detail, whereas a low-granularity agent may only be capable of accomplishing a small number or even just a single task to a low level of detail. To make this clearer, consider a customer service agent. A low-granularity agent may be able to only answer simple questions about a product (e.g., "Does it come in black?"), whereas a high-granularity agent may be able to check local inventories and arrange to deliver the product to the customer's home.

Designers of agentic solutions must decide how granular to make the individual agents within the system, that is, whether to use a small number of high-granularity agents or a larger number of low-granularity agents. The broad capabilities of a high-granularity agent come at the cost of greater computing resource requirements and longer task completion times. Low-granularity agents are less capable, but their narrow focus means they require fewer computing resources and will generally complete tasks much faster.

While the 'right' level of granularity is still unknown, early experience suggests creating low-granularity agents aligned to focussed business processes, e.g., Purchase_Order_Processing_Agent, produces a good balance between resource requirements, processing speed, and solution complexity. The low-granularity agents can then be incorporated into static workflows, or invoked by high-granularity agents as part of a larger process.
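
The sketch below illustrates this guidance: two narrowly focussed, low-granularity agents are composed into a static procure-to-pay workflow, while a single high-granularity agent that would own the entire process is left as a placeholder. The agent and workflow names are hypothetical.

```python
def purchase_order_processing_agent(order: dict) -> dict:
    """Low granularity: validates and records a purchase order, nothing more."""
    order["status"] = "recorded"
    return order


def invoice_matching_agent(order: dict) -> dict:
    """Low granularity: matches the recorded order against an invoice."""
    order["invoice_matched"] = True
    return order


def procure_to_pay_workflow(order: dict) -> dict:
    """Static workflow that composes the low-granularity agents in a fixed order."""
    return invoice_matching_agent(purchase_order_processing_agent(order))


def procurement_agent(order: dict) -> dict:
    """High granularity: a single agent that would plan and execute the entire
    procure-to-pay process itself (more capable, but heavier and slower)."""
    raise NotImplementedError("placeholder for a broad, LLM-planned agent")


print(procure_to_pay_workflow({"id": "PO-1001", "amount": 250.0}))
```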

Static vs Dynamic Workflows

Designers of agentic AI solutions must strike a balance between agents following pre-defined, static processes and workflows, and having workflows dynamically generated in response to user prompts. While there is no right or wrong answer, architects are advised to take the following recommendations and considerations into account (a brief sketch contrasting the two approaches follows the list):

  • Static workflows should be used for business processes made up of multiple complex steps that cross knowledge domains (e.g., legal and accounting), or that are subject to regulatory oversight. Using static workflows in these instances provides architects with several benefits:

    • Static workflows are (relatively) simple to instrument, monitor, and audit, and the workflows themselves can be used as evidence of regulatory compliance. Dynamically generated workflows are more difficult to monitor as they execute, and individual process executions must be reconstructed from individual agent logs. Dynamic workflows also have the potential to vary the sequence of tasks, which further complicates audit and compliance monitoring.

    • Having well-defined 'hand-offs' between areas of expertise provides clear decoupling of responsibility and makes it easy to ensure that passed information is complete and correct. While the same can be accomplished with a dynamically generated workflow, it requires more attention in design and implementation.

  • Dynamic workflows should be used for 'single-step' activities or functions that are performed close together in time, that do not cross knowledge domains, and whose execution is not subject to regulatory oversight or controls.
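
The sketch below contrasts the two approaches: a static workflow with a fixed, auditable sequence of steps, and a dynamic workflow whose steps are chosen at run time by a planner (an LLM in practice, stubbed here). Step and function names are assumptions made for illustration.

```python
from typing import Callable, Dict, List

steps: Dict[str, Callable[[dict], dict]] = {
    "validate_request": lambda ctx: {**ctx, "valid": True},
    "legal_review":     lambda ctx: {**ctx, "legal_ok": True},
    "record_decision":  lambda ctx: {**ctx, "recorded": True},
}

# Static: the sequence is fixed at design time, so it is simple to instrument,
# monitor, and audit, and the definition itself documents the process.
STATIC_WORKFLOW: List[str] = ["validate_request", "legal_review", "record_decision"]


def run_static(ctx: dict) -> dict:
    for name in STATIC_WORKFLOW:
        ctx = steps[name](ctx)
    return ctx


# Dynamic: a planner chooses the steps at run time (stubbed here in place of
# an LLM), so the sequence can vary between executions and must be logged.
def plan_with_llm(request: str) -> List[str]:
    return ["validate_request", "record_decision"]


def run_dynamic(request: str, ctx: dict) -> dict:
    for name in plan_with_llm(request):
        ctx = steps[name](ctx)
    return ctx


print(run_static({"request": "new vendor contract"}))
print(run_dynamic("reset my password", {"request": "reset my password"}))
```
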
Next steps

Talk to our experts about how you can accelerate your adoption of generative AI.

Contributors

Chris Kirby, Monika Aggarwal


Updated: February 21, 2025