Architecture

The Content-Aware Storage (CAS) architecture is designed to let large language models (LLMs) work efficiently with large volumes of unstructured data, improving the insights and recommendations that applications can derive from that data.

Figure: Architecture diagram showing how the CAS service works end to end

The IBM Fusion CAS service uses IBM Storage Scale Container Native Storage Access (CNSA) to remote mount a Storage Scale cluster. This cluster can be based on Storage Scale software or on a Storage Scale System 3500 or 6000. PDF documents stored on the remote IBM Storage Scale cluster are automatically processed by CAS using the NVIDIA NeMo Retriever extraction microservices, which extract text, tables, and charts¹. The parsed content and metadata are normalized and stored, along with vector embeddings, in the vector database managed by CAS.
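The ingest flow described above (extract, normalize, embed, store) can be illustrated with a minimal sketch. It assumes the extraction step has already returned a list of `(element_type, raw_text)` pairs; that shape, the `toy_embed` hashing function, and the `InMemoryVectorStore` class are all hypothetical stand-ins, not the NeMo Retriever output format, the embedding model, or the vector database that CAS actually uses.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field

DIM = 64  # toy embedding size (assumption); a real embedding model defines its own


@dataclass
class Record:
    doc_path: str
    chunk_id: int
    text: str
    metadata: dict
    vector: list


@dataclass
class InMemoryVectorStore:
    # Hypothetical stand-in for the vector database managed by CAS
    records: list = field(default_factory=list)

    def add(self, record: Record) -> None:
        self.records.append(record)


def normalize(text: str) -> str:
    # Collapse whitespace runs left over from extraction
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, max_words: int = 50) -> list:
    # Split normalized text into fixed-size word windows
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> list:
    # Feature hashing of lowercase tokens, then L2 normalization (not a real model)
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(doc_path: str, elements: list, store: InMemoryVectorStore) -> int:
    """elements: (element_type, raw_text) pairs standing in for extraction output."""
    count = 0
    for element_type, raw in elements:
        for piece in chunk(normalize(raw)):
            # Store the text, its metadata, and its embedding together
            store.add(Record(doc_path, count, piece, {"type": element_type}, toy_embed(piece)))
            count += 1
    return count
```

Keeping the element type (text, table, chart) as metadata on each stored chunk is what lets later stages filter or weight results by where the content came from.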

IBM Storage Scale watch folders are used to detect changed data in the remote IBM Storage Scale file system. A detected change triggers CAS to ingest only the changed content.
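The benefit of event-driven change detection is that only files reported as changed are reprocessed. The sketch below coalesces a batch of file events into the PDF paths that need re-ingest. The event dictionary shape is a simplified assumption, and the event names are modeled on inotify-style names; which watch-folder events CAS actually consumes, and their real message format, are not described here.

```python
# Simplified, hypothetical event shape: {"event": <inotify-style name>, "path": <file path>}
CHANGE_EVENTS = {"IN_CLOSE_WRITE", "IN_MOVED_TO"}  # content was written or moved in


def changed_pdfs(events: list) -> list:
    """Coalesce a batch of events into the PDF paths that need re-ingest,
    ordered by each path's most recent change event."""
    last_seen = {}
    for seq, ev in enumerate(events):
        if ev["event"] in CHANGE_EVENTS and ev["path"].lower().endswith(".pdf"):
            last_seen[ev["path"]] = seq  # a later event for the same file overwrites
    return sorted(last_seen, key=last_seen.get)


def process_batch(events: list, ingest_fn) -> list:
    # Only changed files reach the ingest step; unchanged files are never reprocessed
    paths = changed_pdfs(events)
    for path in paths:
        ingest_fn(path)
    return paths
```

Coalescing matters because a single save can produce several events for the same file; deduplicating them avoids ingesting one document multiple times per batch.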

Optionally, Active File Management (AFM) can be used to bring PDF documents stored in external S3 data sources into an AFM-S3 fileset so that CAS can ingest them for use in RAG applications. CAS automatically scans the AFM-S3 fileset to determine which data needs to be ingested.
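A scan-based approach, unlike event-driven detection, compares the current contents of a directory tree against a record of what was already ingested. The following sketch assumes the fileset can be walked as an ordinary directory and uses `(size, mtime_ns)` as a change fingerprint; the `catalog` dictionary and `scan_for_ingest` function are hypothetical, and the actual criteria CAS uses during its AFM-S3 scan are not documented here.

```python
import os


def scan_for_ingest(root: str, catalog: dict) -> list:
    """Return PDFs under root that are new or changed since the previous scan.

    catalog maps path -> (size, mtime_ns) from earlier scans and is updated in place.
    """
    pending = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue  # only PDF documents are candidates for ingest
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            fingerprint = (st.st_size, st.st_mtime_ns)
            if catalog.get(path) != fingerprint:
                pending.append(path)        # new file, or size/mtime changed
                catalog[path] = fingerprint
    return sorted(pending)
```

A second scan with no changes returns an empty list, so repeated scans stay cheap in terms of ingest work even though every file's metadata is still examined.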

The CAS service provides a search API that supports semantic search, BM25 keyword search, and hybrid search (semantic + BM25 keyword).
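To make the three search methods concrete, the sketch below scores documents with Okapi BM25 (common defaults `k1=1.5`, `b=0.75`), scores them semantically with cosine similarity over precomputed vectors, and fuses the two rankings for hybrid mode. How CAS combines semantic and BM25 results is not stated here; reciprocal rank fusion (RRF, `k=60`) is swapped in as one common technique, and all function names are illustrative.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    # Okapi BM25 over whitespace-tokenized, lowercased documents
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores


def cosine(u: list, v: list) -> float:
    dot = sum(a * c for a, c in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(scores: list) -> list:
    # Document indices ordered by descending score (stable for ties)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings: list, k: int = 60) -> list:
    # Reciprocal rank fusion: sum 1 / (k + rank) across rankings
    fused = Counter()
    for ranking in rankings:
        for pos, doc in enumerate(ranking):
            fused[doc] += 1.0 / (k + pos + 1)
    return [doc for doc, _ in sorted(fused.items(), key=lambda item: -item[1])]


def search(query, query_vec, docs, doc_vecs, mode="hybrid", top_k=3):
    keyword = rank(bm25_scores(query, docs))
    semantic = rank([cosine(query_vec, v) for v in doc_vecs])
    if mode == "keyword":
        ordered = keyword
    elif mode == "semantic":
        ordered = semantic
    elif mode == "hybrid":
        ordered = rrf([keyword, semantic])
    else:
        raise ValueError(f"unknown search mode: {mode}")
    return ordered[:top_k]
```

Rank-based fusion avoids having to put BM25 scores and cosine similarities on a common scale, which is one reason it is a popular choice for hybrid search.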

Customer applications, such as a chatbot, can send natural language queries to the CAS search API. When CAS receives a request, it creates a vector representation of the query and queries the vector database by using the search method specified in the API request (semantic, keyword, or hybrid). The top-K results are sent to a CAS aggregator component, which uses additional metadata to refine the results. The refined results are then passed to a reranking function, which reorders them further before they are returned through the search API.
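The retrieve, aggregate, and rerank stages can be sketched as a small pipeline. The metadata logic inside the CAS aggregator and the reranking model it uses are not described here, so both are hypothetical stand-ins: `aggregate` keeps the best-scoring chunk per document, and `overlap_rerank` reorders by query-term overlap where a real system would typically use a learned reranker. The `vector_search(query, top_k)` callable represents the embedding and top-K retrieval step.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_path: str
    chunk_text: str
    score: float


def aggregate(hits: list, max_per_doc: int = 1) -> list:
    """Hypothetical aggregator: keep at most max_per_doc best-scoring chunks per document."""
    kept, per_doc = [], {}
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if per_doc.get(hit.doc_path, 0) < max_per_doc:
            kept.append(hit)
            per_doc[hit.doc_path] = per_doc.get(hit.doc_path, 0) + 1
    return kept


def overlap_rerank(query: str, hits: list) -> list:
    """Toy reranker: fraction of query terms that appear in the chunk text."""
    terms = set(query.lower().split())

    def relevance(hit: Hit) -> float:
        words = set(hit.chunk_text.lower().split())
        return len(terms & words) / len(terms) if terms else 0.0

    return sorted(hits, key=relevance, reverse=True)


def answer_search(query: str, vector_search, top_k: int = 10, top_n: int = 3) -> list:
    hits = vector_search(query, top_k)                    # embed query, retrieve top-K
    return overlap_rerank(query, aggregate(hits))[:top_n]  # refine, then rerank
```

Separating aggregation from reranking keeps the expensive reranking step limited to a small, deduplicated candidate set.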

In a customer-provided RAG chatbot, the results returned by the CAS search API can be combined with the original prompt or question and sent to an LLM, which generates the answer shown to the end user.
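The step of combining search results with the user's question can be sketched as simple prompt assembly. The prompt template, the `(source, text)` result shape, and the `max_chars` context budget are illustrative assumptions; sending the prompt to an LLM is left to whichever client the application already uses.

```python
def build_rag_prompt(question: str, results: list, max_chars: int = 2000) -> str:
    """Combine retrieved passages with the user's question.

    results: list of (source, text) pairs, for example taken from a search response.
    """
    lines = [
        "Answer the question using only the context below. Cite sources by number.",
        "",
        "Context:",
    ]
    used = 0
    for i, (source, text) in enumerate(results, start=1):
        entry = f"[{i}] ({source}) {text}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        lines.append(entry)
        used += len(entry)
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)
```

Numbering each passage and keeping its source name lets the generated answer refer back to the documents it was drawn from.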

Note: CAS uses natural language processing (NLP) and AI technologies to extract semantic meaning from unstructured text data only.
Important: For information about resource requirements such as vCPU, memory, storage, and GPU, see System requirements.