Overview

IBM® Network Intelligence transforms networks into intelligent systems by ingesting and processing network data at scale to generate meaningful observations. Network Intelligence then leverages AI agents that use the knowledge and tools at their disposal to process those observations and generate actionable insights.

Reason for Development

Network Intelligence is an AI-powered networking solution that is built on the advanced watsonx AI platform. Network Intelligence is currently available as SaaS and delivers network insights through multi-model analysis, agent-based automation, and adaptable data management to boost operational efficiency. Its distributed analytics architecture provides scalable and efficient data ingestion while keeping the system design streamlined.

Critical networks face persistent challenges due to increasing complexity across multiple domains, manual lifecycle operations, diverse metrics data, and overlapping generations of technologies.

The situation has now reached a breaking point: networks in this state cannot support the demands of cloud transformation and application modernization, let alone the growing need for cost-efficiency and value generation.

Solution

Figure 1. Network Intelligence architecture
A figure of Network Intelligence high-level architecture

The architecture for Network Intelligence is made up of two high-level components:

Analytics: which deals with the ingestion and processing of network data at scale to generate meaningful observations.

Automation: where AI agents process observations in an autonomous fashion, and use the knowledge and tools at their disposal to generate actionable insights.

The Analytics component consists of distributed nodes for data ingestion and anomaly detection, a centralized node for multi-variant analysis, and a unified API gateway for external integrations. Its real-time processing pipeline enables efficient data handling and insight generation. The diagram gives a functional view of the architecture and the data flow: data from external systems passes through adapters into the distributed nodes, where it is ingested and preprocessed. Anomalies are detected and clustered, then forwarded to the central node for deeper analysis. The resulting observations are delivered through APIs and UI components for decision-making and automation, in both assistant and autonomous modes.

Figure 2. Analytics architecture
A figure of the Analytics architecture

Network Intelligence supports metrics data that are organized in triplets (device, component, and indicator). Since each device can include multiple components (such as interfaces in a router), and each component can have several health or status indicators, the model captures information at a fine-grained level. A single distributed node can process over one million indicators every five minutes, and this capacity can be expanded horizontally by adding more nodes.
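As an illustration of the triplet model, the following minimal Python sketch keys each sample by (device, component, indicator) and groups samples by device and component. The class and function names are hypothetical and are not part of the Network Intelligence API. The `nodes_needed` helper assumes, for sizing purposes only, a per-node capacity of exactly one million indicators per five-minute cycle (roughly 3,300 indicators per second); the documented figure is "over one million", so the estimate is conservative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorKey:
    """Hypothetical key for one metric: the (device, component, indicator) triplet."""
    device: str
    component: str
    indicator: str


@dataclass
class Sample:
    key: IndicatorKey
    timestamp: int  # epoch seconds
    value: float


def group_by_device(samples):
    """Build a nested mapping: device -> component -> indicator -> latest value."""
    grouped = {}
    for s in samples:
        components = grouped.setdefault(s.key.device, {})
        components.setdefault(s.key.component, {})[s.key.indicator] = s.value
    return grouped


def nodes_needed(total_indicators, per_node_capacity=1_000_000):
    """Distributed nodes required per cycle (ceiling division), under the assumed capacity."""
    return -(-total_indicators // per_node_capacity)


samples = [
    Sample(IndicatorKey("router-1", "eth0", "in_errors"), 1_700_000_000, 12.0),
    Sample(IndicatorKey("router-1", "eth0", "utilization"), 1_700_000_000, 0.83),
    Sample(IndicatorKey("router-1", "eth1", "utilization"), 1_700_000_000, 0.10),
]
by_device = group_by_device(samples)
```

Because the key is a frozen dataclass, it is hashable and can be used directly as a dictionary key when indicators need to be tracked individually.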

During ingestion, the system standardizes incoming data and, where available, extracts metadata that describes device topology and vendor-specific component ontologies. Indicators pass through a sequence of models that act as filters, allowing only noteworthy patterns to reach later stages for deeper analysis. Independent of the main Anomaly Detection (AD) flow, indicators are rebaselined and categorized, which allows the AD pipeline to select the most appropriate models for different indicator types. When anomalies are detected, they are compared with other active observations and merged based on similarity and temporal proximity.
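The merge step can be sketched as follows. This is only an illustration under stated assumptions: the actual similarity measure and time window are not documented here, so the sketch uses Jaccard similarity over the sets of affected indicators and a fixed maximum gap in seconds. All names and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    indicators: set
    first_seen: int  # epoch seconds
    last_seen: int


def jaccard(a, b):
    """Overlap between two indicator sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_anomaly(active, indicators, timestamp, min_similarity=0.5, max_gap_s=600):
    """Merge a new anomaly into a similar, recent observation, or open a new one.

    The similarity metric and both thresholds are assumptions for illustration.
    """
    for obs in active:
        close_in_time = abs(timestamp - obs.last_seen) <= max_gap_s
        if close_in_time and jaccard(obs.indicators, indicators) >= min_similarity:
            obs.indicators |= indicators
            obs.last_seen = max(obs.last_seen, timestamp)
            return obs
    new_obs = Observation(set(indicators), timestamp, timestamp)
    active.append(new_obs)
    return new_obs
```

An anomaly that shares most of its indicators with an active observation and arrives within the window extends that observation; anything else starts a new one.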

For the Automation component, AI agents act immediately when new observations arise. They correlate the data with domain knowledge, analyze it against metrics and descriptions, and apply retrieval-augmented generation (RAG) to propose potential root causes and remediation actions. These candidates are validated against external system data, which is retrieved with function calls, and are then scored for plausibility. The system then presents users with ranked causes and recommended remediation steps as actionable insights.
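The validate, score, and rank sequence can be pictured with a short sketch. The functions passed in as `fetch_evidence` and `score` are hypothetical stand-ins for the function calls to external systems and for the plausibility scoring; how Network Intelligence actually implements them is not described here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cause: str
    remediation: str
    score: float = 0.0


def rank_candidates(candidates, fetch_evidence, score):
    """Validate each candidate against external data, score it, and sort best first.

    fetch_evidence(candidate) -> evidence   (stand-in for a function call)
    score(candidate, evidence) -> float     (stand-in for plausibility scoring)
    """
    for cand in candidates:
        evidence = fetch_evidence(cand)
        cand.score = score(cand, evidence)
    return sorted(candidates, key=lambda c: c.score, reverse=True)
```

In use, the caller supplies the two callables, so the same ranking logic works with any evidence source or scoring scheme.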

Underlying these agents are large language models (LLMs), reinforced for production through scaffolding. This includes protective guardrails for user interactions and the use of curated, standardized prompts to ensure reliable, consistent outcomes.

Figure 3. Automation architecture
A figure of Automation architecture

Observations are assigned in real time to an agent from an agent pool and added to that agent's queue. Each agent can handle tens of observations per hour, and the observation processing capacity can be increased by adding more agents. Each Network Operations Center (NOC) agent is itself a multi-agent system, consisting of a reasoning agent, a RAG agent, and a function-calling agent.
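A minimal sketch of pool-based assignment and capacity planning follows. The assignment policy (shortest queue first) is an assumption, since the actual policy is not stated, and the per-agent rate is a parameter because the text gives only "tens of observations per hour". The class and function names are hypothetical.

```python
import math
from collections import deque


class AgentPool:
    """Hypothetical pool in which each agent owns a FIFO queue of observations."""

    def __init__(self, n_agents):
        self.queues = [deque() for _ in range(n_agents)]

    def assign(self, observation_id):
        """Add the observation to the shortest queue (assumed policy); return the agent index."""
        idx = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        self.queues[idx].append(observation_id)
        return idx


def agents_needed(observations_per_hour, per_agent_per_hour):
    """Agents required to keep up with the incoming observation rate."""
    return math.ceil(observations_per_hour / per_agent_per_hour)
```

For example, at an assumed 30 observations per agent per hour, an inflow of 100 observations per hour calls for `agents_needed(100, 30)`, which is 4 agents.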

The Reasoning Agent explicitly implements the ReAct (reason and act) pattern on the LLM by using chain-of-thought reasoning, and dispatches certain tasks to the other available agents.
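The ReAct pattern alternates between a reasoning step and an action whose result is fed back as an observation. The sketch below shows the control loop only. The `llm_step` callable is a scripted stand-in for the LLM, and the `rag` and `tool` entries are stub agents; none of these names come from the product.

```python
def react_loop(llm_step, agents, question, max_steps=5):
    """Run thought -> action -> observation cycles until the model finishes.

    llm_step(transcript) must return (thought, action, argument), where action
    is either "finish" or the name of an agent to dispatch to.
    """
    transcript = [("question", question)]
    for _ in range(max_steps):
        thought, action, arg = llm_step(transcript)
        transcript.append(("thought", thought))
        if action == "finish":
            return arg, transcript
        result = agents[action](arg)  # dispatch the task to the named agent
        transcript.append(("action", f"{action}({arg!r})"))
        transcript.append(("observation", result))
    return None, transcript  # step budget exhausted without a final answer


def scripted_llm(transcript):
    """Stand-in for an LLM: chooses the next step from the observations so far."""
    seen = [text for kind, text in transcript if kind == "observation"]
    if not seen:
        return ("Look up documentation first.", "rag", "interface errors")
    if len(seen) == 1:
        return ("Check live device state.", "tool", "get_interface_status")
    return ("Enough evidence.", "finish", f"{seen[0]} | {seen[1]}")


stub_agents = {
    "rag": lambda query: f"doc snippet for {query}",
    "tool": lambda name: f"result of {name}",
}
answer, transcript = react_loop(scripted_llm, stub_agents, "Why is eth0 degraded?")
```

The `max_steps` bound is a common safeguard in such loops so that a model that never emits a final answer cannot run indefinitely.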

The RAG Agent can access the knowledge base documents that are uploaded to Network Intelligence through the Grounding documents section in the UI. The embedding model chunks and processes the documents to generate vector embeddings, which are stored in the corresponding vector DB and then used to search for relevant documents. The agent includes a preprocessing step in which it generates synthetic queries to retrieve all relevant information, using both keyword-based and chunk-based searches.
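The combination of synthetic queries with keyword-based and chunk-based search can be illustrated with a toy example. This sketch makes several assumptions: a bag-of-words count vector stands in for the real embedding model, an in-memory list stands in for the vector DB, and the synthetic queries are supplied by hand rather than generated by an LLM. All function names are hypothetical.

```python
import math
from collections import Counter


def chunk(text, size=8):
    """Split a document into fixed-size word chunks (assumed chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks, queries, top_k=2):
    """Union of keyword hits and top-k vector hits across all synthetic queries."""
    hits = set()
    for query in queries:
        q_terms = set(query.lower().split())
        q_vec = embed(query)
        scores = [cosine(q_vec, embed(c)) for c in chunks]
        # Keyword search: every query term must appear in the chunk.
        hits.update(i for i, c in enumerate(chunks) if q_terms <= set(c.lower().split()))
        # Vector search: best-scoring chunks with non-zero similarity.
        ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
        hits.update(i for i in ranked[:top_k] if scores[i] > 0)
    return [chunks[i] for i in sorted(hits)]
```

Issuing several reworded queries widens recall: a chunk missed by one phrasing can still be found by another, and the union of both search types is passed on as context.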

The Tool-calling Agent has access to tools in the form of API function calls to external systems. Tools are onboarded through the Network Intelligence admin portal by specifying an OpenAPI specification, an endpoint, and credentials. The tool-calling agent is also instructed to follow the ReAct pattern to generate multi-step plans with a separate function call at each step; these calls often depend on the results of previous calls. The OpenAPI specifications define APIs that are either exposed directly by the external systems or implemented through Pliant or Concert workflows. Current integrations with external systems include ServiceNow with the Pliant workflows.
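A multi-step plan in which later calls depend on earlier results can be sketched as follows. The plan format, the `$name` reference convention, and the two tools (`find_device`, `open_ticket`) are hypothetical and used only for illustration; in the product, tools are defined by OpenAPI specifications rather than local Python functions.

```python
def resolve(value, results):
    """Replace a "$step_id" reference with the stored result of that earlier step."""
    if isinstance(value, str) and value.startswith("$"):
        return results[value[1:]]
    return value


def run_plan(tools, plan):
    """Execute steps in order, feeding earlier results into later calls."""
    results = {}
    for step in plan:
        kwargs = {name: resolve(v, results) for name, v in step["args"].items()}
        results[step["id"]] = tools[step["tool"]](**kwargs)
    return results


# Hypothetical tools standing in for API operations on external systems.
def find_device(hostname):
    return {"hostname": hostname, "site": "lab-1"}


def open_ticket(device, summary):
    return f"ticket for {device['hostname']} at {device['site']}: {summary}"


TOOLS = {"find_device": find_device, "open_ticket": open_ticket}
PLAN = [
    {"id": "device", "tool": "find_device", "args": {"hostname": "router-7"}},
    {"id": "ticket", "tool": "open_ticket",
     "args": {"device": "$device", "summary": "interface flapping"}},
]
results = run_plan(TOOLS, PLAN)
```

The second step cannot run until the first has returned, which is the kind of dependency between calls that the text describes.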

Other Agents

The answer agent is responsible for providing the final answer to the user, combining all the actions that were performed. This step can be handled by separate models for different languages. The links agent is responsible for converting parts of the response into a structured text tag, which the UI can interpret as an internal UI link. This makes navigation in the UI easier for users.

You can view the analysis of the agents in assistant mode through a chat interface and in autonomous mode through progress reports. You can access the chat interface and progress reports in the Network Intelligence UI and in integrated ITSM systems such as ServiceNow.

Key features include:
  1. Network-Centric Models: predictive AI and network foundation models to transform network lifecycles.
  2. Network Semantic Contexts: interprets data across various domains and vendors, with semantic understanding of networks to ground the models.
  3. AI Assistants and Agents: goes beyond traditional rule-based systems with goals and guardrails automation, enabling scale and autonomy. It seamlessly integrates with existing systems and workflows to enhance operational efficiency.
  4. Data Collection and Ingestion: Network Intelligence includes a robust automated data pipeline to collect and process data in real time or ingest data from existing data lakes, regardless of data cleanliness, ensuring flexibility and adaptability in diverse environments.

Value

The value proposition focuses on improving network and infrastructure resilience and performance to support app modernization and cloud adoption.

Suggested improvements include:
  • 4x improvement in resilience through incident prevention and resolution.
  • 10x operational efficiency, enabling human teams to focus on value creation instead of maintenance.
  • App-centric infrastructure right-sized to applications, for CapEx efficiency improvements.