Architecture
The Content-Aware Storage (CAS) architecture is designed to let LLMs work smoothly with large volumes of unstructured data, improving the insights and suggestions they can produce.

The IBM Fusion CAS service uses the IBM Fusion Global Data Platform service to remote mount a Storage Scale cluster. This cluster can be Storage Scale software based, or Storage Scale Server 3500 or 6000 based. PDF documents stored on the remote IBM Storage Scale cluster are automatically processed by CAS using the NVIDIA NeMo Retriever Extraction microservices, which extract text, tables, and charts. The parsed content and metadata are normalized and stored along with vector embeddings in the vector database managed by CAS.
IBM Storage Scale watch folders detect changed data in the remote IBM Storage Scale file system, which triggers CAS to ingest only the changed content.
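The "ingest only what changed" behavior can be illustrated with a minimal sketch. It assumes a hypothetical FileEvent record with "created", "modified", and "deleted" kinds; this is not the actual Storage Scale watch folder event format, and plan_ingest is an illustrative helper, not a CAS function. The sketch collapses a batch of events so that each PDF is handled once, according to its most recent event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEvent:
    """Hypothetical change notification; not the real watch folder schema."""
    path: str
    kind: str  # assumed values: "created", "modified", "deleted"


def plan_ingest(events: list[FileEvent]) -> tuple[list[str], list[str]]:
    """Collapse a batch of events into (paths to ingest, paths to remove).

    Only PDF files are considered, and the last event seen for a path wins.
    """
    latest: dict[str, str] = {}
    for event in events:
        if not event.path.lower().endswith(".pdf"):
            continue  # non-PDF content is ignored in this sketch
        latest[event.path] = event.kind

    to_ingest = sorted(p for p, k in latest.items() if k in ("created", "modified"))
    to_remove = sorted(p for p, k in latest.items() if k == "deleted")
    return to_ingest, to_remove
```

For example, a file that is created and then modified within the same batch is ingested once, while unchanged files never appear in the plan.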
Optionally, Active File Management (AFM) can be used so that CAS can ingest PDF documents stored in external S3 data sources, making them available to retrieval-augmented generation (RAG) applications. CAS automatically scans the AFM-S3 fileset to determine which data needs to be ingested.
The CAS service provides a search API that combines semantic and keyword search to return the most accurate results. NVIDIA reranking is an optional search component that can be added to improve accuracy further.
Customer applications, such as a chat bot, can send natural language searches to the CAS search API. On receiving a request, CAS creates a vector representation of the natural language search and queries the vector database using the search method specified in the API call (semantic, keyword, or hybrid). The top 100 results are sent to a CAS aggregator component, which applies additional keyword reranking and, optionally, vendor reranking to reduce them to the requested number of results. The optimized results are passed to a reranking function, which performs an additional optimization of the results before returning them to the search API.
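To make the aggregation step more concrete, the following sketch merges a semantic ranking and a keyword ranking into one list. The source does not describe how the CAS aggregator scores results, so this uses reciprocal rank fusion (RRF), a common general-purpose technique, purely as a stand-in. The function name, the constant k=60, and the document IDs are illustrative assumptions.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 10
) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) in every list where it appears.
    This is an assumed stand-in, not the documented CAS aggregator.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

    # Highest score first; ties are broken by document ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


semantic_hits = ["d1", "d2", "d3"]  # hypothetical vector search results
keyword_hits = ["d3", "d1", "d4"]   # hypothetical keyword search results
top_results = reciprocal_rank_fusion([semantic_hits, keyword_hits], top_n=3)
# top_results == ["d1", "d3", "d2"]
```

Documents that rank well in both lists rise to the top, which is the general intent of a hybrid search; an optional reranker would then reorder this shortlist.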
In a customer-provided RAG chat bot, the search results returned by the CAS search API can be combined with the original prompt or question and sent to an LLM to generate the answer that is presented to the end user.
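The way a chat bot combines search results with the original question can be sketched as follows. The prompt wording, the build_rag_prompt helper, and the max_passages parameter are assumptions made for illustration; the actual prompt format is up to the customer application and the LLM being used.

```python
def build_rag_prompt(question: str, passages: list[str], max_passages: int = 5) -> str:
    """Combine retrieved passages with the user's question into one prompt.

    Passages are assumed to be plain text already returned by a search call,
    ordered from most to least relevant.
    """
    selected = passages[:max_passages]
    context = "\n\n".join(
        f"[{index}] {text.strip()}" for index, text in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The resulting string would be sent to the LLM of choice. Numbering the passages lets the model, or the application, refer back to the specific source that supports an answer.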