Architecture

The Content-Aware Storage (CAS) architecture is designed to connect large language models (LLMs) with large volumes of unstructured data, improving the ability to derive insights and generate recommendations from that data.

Figure: Architecture diagram showing how the CAS service works end to end

The IBM Fusion CAS service uses the IBM Fusion Global Data Platform service to remote mount an IBM Storage Scale cluster. This cluster can be based on Storage Scale software or on Storage Scale Server 3500 or 6000. CAS automatically processes files that are stored on the remote IBM Storage Scale cluster into parsed content, normalizes their metadata, and stores that metadata along with vector embeddings in its managed vector database. Document processing is provided by either IBM's Docling Multimodal services or NVIDIA NeMo Retriever Extraction microservices.
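To make the ingest flow concrete, the following is a minimal sketch of turning one file into stored records: split parsed text into chunks, normalize metadata, and attach an embedding to each chunk. It is not the CAS implementation. The chunk size, the metadata rules, the ChunkRecord structure, and the hashing-based toy_embed function are all hypothetical stand-ins; a real deployment would use the Docling or NeMo Retriever output and a real embedding model.

```python
import hashlib
import math
from dataclasses import dataclass, field
from datetime import datetime, timezone

DIM = 16  # hypothetical embedding size for the toy embedder


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hashing bag-of-words embedding, L2-normalized.

    Hypothetical stand-in for a real embedding model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def normalize_metadata(raw: dict) -> dict:
    """Hypothetical rules: trim and lowercase keys, convert epoch mtime to ISO 8601 UTC."""
    meta = {k.strip().lower(): v for k, v in raw.items()}
    if "mtime" in meta:
        meta["mtime"] = datetime.fromtimestamp(meta["mtime"], tz=timezone.utc).isoformat()
    return meta


@dataclass
class ChunkRecord:
    path: str
    chunk_id: int
    text: str
    metadata: dict
    embedding: list[float] = field(repr=False)


def process_file(path: str, text: str, raw_meta: dict, chunk_words: int = 50) -> list[ChunkRecord]:
    """Split parsed text into fixed-size word chunks and build one record per chunk."""
    words = text.split()
    meta = normalize_metadata(raw_meta)
    records = []
    for i in range(0, len(words), chunk_words):
        chunk = " ".join(words[i:i + chunk_words])
        records.append(ChunkRecord(path, i // chunk_words, chunk, dict(meta), toy_embed(chunk)))
    return records
```

Each record carries its own copy of the normalized metadata so that it can be filtered on at query time independently of the other chunks from the same file.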

CAS uses IBM Storage Scale watch folders to detect changed data in the remote IBM Storage Scale file system. Each detected change triggers CAS to ingest only the changed content.
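The idea of ingesting only changed content can be sketched as coalescing a batch of change events into two sets: files to re-ingest and files to remove from the index. This is an illustrative sketch, not the CAS event handler. The event dictionaries with "path" and "event" keys are a hypothetical simplified shape; the event names follow the inotify-style names that watch folders report, but which events CAS actually subscribes to is an assumption.

```python
# Hypothetical mapping of inotify-style event names to index actions.
REINGEST_EVENTS = {"IN_CLOSE_WRITE", "IN_MOVED_TO"}
DELETE_EVENTS = {"IN_DELETE", "IN_MOVED_FROM"}


def plan_ingest(events):
    """Coalesce change events (in arrival order) into (reingest, delete) path lists.

    Only content-affecting events are tracked, so a later IN_OPEN or IN_ACCESS
    does not hide an earlier write. The last content-affecting event per path wins.
    """
    latest = {}
    for ev in events:
        kind = ev["event"]
        if kind in REINGEST_EVENTS or kind in DELETE_EVENTS:
            latest[ev["path"]] = kind
    reingest = sorted(p for p, k in latest.items() if k in REINGEST_EVENTS)
    delete = sorted(p for p, k in latest.items() if k in DELETE_EVENTS)
    return reingest, delete
```

Coalescing per path means that a file written several times in one batch is parsed and embedded once, which is the main saving over re-scanning the whole file system.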

Files stored in external S3 data sources are ingested through Active File Management (AFM) running in Global Data Platform, so no external IBM Storage Scale file system is required. CAS automatically scans the AFM-S3 fileset to determine which data needs to be ingested.
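One way to picture the automatic scan is a comparison of the fileset's current contents against a manifest recorded at the previous ingest. The sketch below assumes the AFM-S3 fileset is visible as an ordinary directory tree and uses file size plus modification time as a change signature; both the manifest format and the signature choice are assumptions for illustration, not how CAS is documented to track state.

```python
import os


def scan_fileset(root, manifest):
    """Compare a directory tree against a prior manifest.

    manifest maps relative path -> (size, mtime_ns) recorded at the last ingest.
    Returns (new_or_changed, removed, current_manifest).
    """
    current = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            current[rel] = (st.st_size, st.st_mtime_ns)
    # A file needs ingest if it is new or its signature differs from the manifest.
    changed = sorted(p for p, sig in current.items() if manifest.get(p) != sig)
    # A file in the manifest but no longer present should be dropped from the index.
    removed = sorted(p for p in manifest if p not in current)
    return changed, removed, current
```

The returned current manifest is saved and passed into the next scan, so each run only reports the difference since the last one.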

The CAS service provides a search API that combines semantic and keyword search to return the most relevant results. NVIDIA reranking is an optional search component that can be added to further improve accuracy.
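A common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF), sketched below. RRF is used here only as a well-known example of hybrid result fusion; the documentation does not state which fusion method CAS uses, and the constant k=60 is the conventional default from the RRF literature, not a CAS parameter.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked highly by both semantic and keyword search rise to the top.
    Ties are broken by document ID for deterministic output.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic = ["d1", "d2", "d3"]
keyword = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([semantic, keyword]))  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, it avoids having to calibrate cosine similarities against keyword scores, which live on different scales.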

Customer applications, such as a chatbot, can send natural language queries to the CAS search API. When CAS receives a request, it creates a vector representation of the query and runs it against the vector database by using the search method specified in the API request (semantic, keyword, or hybrid). In the NVIDIA Multimodal pipeline, the top 100 results are sent to a CAS aggregator component, which applies additional keyword matching and, optionally, vendor reranking to produce the requested number of top results. These results are then passed to a reranking function, which further refines them before they are returned through the search API.
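The staged flow described above (retrieve a candidate set by vector similarity, refine it with keyword scoring, optionally rerank, then truncate to the requested count) can be sketched as follows. This is a simplified illustration under stated assumptions: the corpus is an in-memory list of (doc_id, text, vector) tuples, the query vector is supplied by the caller rather than produced by a CAS embedding service, the keyword score is plain term overlap, and the rerank argument is a hypothetical callable standing in for a vendor reranker.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def staged_search(query_text, query_vec, corpus, top_k=5, candidates=100,
                  keyword_weight=0.3, rerank=None):
    """Return the top_k doc IDs from corpus, a list of (doc_id, text, vector)."""
    # Stage 1: vector retrieval keeps only the best `candidates` documents.
    pool = sorted(corpus, key=lambda d: cosine(query_vec, d[2]), reverse=True)[:candidates]

    # Stage 2: aggregator blends vector similarity with keyword term overlap.
    q_terms = set(query_text.lower().split())

    def blended(doc):
        overlap = len(q_terms & set(doc[1].lower().split())) / len(q_terms) if q_terms else 0.0
        return (1 - keyword_weight) * cosine(query_vec, doc[2]) + keyword_weight * overlap

    ranked = sorted(pool, key=blended, reverse=True)

    # Stage 3: optional reranker scores (query, text) pairs; higher is better.
    if rerank is not None:
        ranked = sorted(ranked, key=lambda d: rerank(query_text, d[1]), reverse=True)
    return [d[0] for d in ranked[:top_k]]
```

Keeping the expensive stages (keyword blending and reranking) to a bounded candidate pool is what makes the pipeline practical: the vector index does the broad recall, and the later stages only reorder a small set.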

In a customer-provided retrieval-augmented generation (RAG) client, the CAS search API can be used to retrieve relevant search results, which are then combined with the original prompt or question and sent to a large language model (LLM) to generate a response for the end user. IBM provides a sample RAG client in the form of a chatbot that customers can use to explore CAS capabilities or as a foundation for developing their own chatbot or RAG-based solution.
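The step of combining retrieved results with the user's question can be sketched as a prompt-assembly function. The result dictionaries with "source" and "text" keys are a hypothetical shape, not the documented CAS search API response, and the prompt wording and character budget are illustrative choices. The function only builds the prompt; sending it to an LLM is left to whichever model client the RAG application uses.

```python
def build_rag_prompt(question, results, max_chars=2000):
    """Combine retrieved passages with the user's question into one LLM prompt.

    Passages are numbered so the model can cite them, and are added in rank
    order until the hypothetical max_chars context budget would be exceeded.
    """
    context_lines = []
    used = 0
    for i, r in enumerate(results, start=1):
        line = f"[{i}] ({r['source']}) {r['text']}"
        if used + len(line) > max_chars:
            break
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Stopping at the budget rather than truncating a passage mid-sentence keeps each cited passage intact, at the cost of dropping lower-ranked results when the context window is tight.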

Note: CAS uses natural language processing (NLP) and AI technologies to extract semantic meaning from unstructured text data only.
Important: For details about resource usage, such as vCPU, memory, storage, and GPU, see System requirements.