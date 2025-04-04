A Business Intelligence platform typically ingests structured data from files or streams. The data passes through malware detection and then a standardisation (or conform) process, which corrects the data where needed and may transform it, e.g. converting values to the correct units, before it is stored in the database.
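
The scan-then-conform flow described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the `Reading` record, the `TO_METRES` conversion table, the CSV input format and the placeholder `passes_malware_scan` check are all hypothetical names introduced for the example.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical conversion factors to a canonical unit (metres).
TO_METRES = {"m": 1.0, "km": 1000.0, "cm": 0.01}


@dataclass
class Reading:
    sensor: str
    value: float
    unit: str


def passes_malware_scan(payload: bytes) -> bool:
    """Placeholder for a real scanner; it only rejects an illustrative marker."""
    return b"EICAR" not in payload


def conform(raw: dict) -> Reading:
    """Correct simple defects (whitespace, letter case) and convert to metres."""
    unit = str(raw["unit"]).strip().lower()
    if unit not in TO_METRES:
        raise ValueError(f"unsupported unit: {unit!r}")
    value = float(raw["value"]) * TO_METRES[unit]
    return Reading(sensor=str(raw["sensor"]).strip(), value=value, unit="m")


def ingest(payload: bytes) -> list[Reading]:
    """Scan the payload, then parse it as CSV and conform every row."""
    if not passes_malware_scan(payload):
        raise ValueError("payload rejected by malware scan")
    rows = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    return [conform(row) for row in rows]


if __name__ == "__main__":
    for reading in ingest(b"sensor,value,unit\ns1,2.5,KM\ns2,200,cm\n"):
        print(reading)
```

Keeping the conform step as a separate function means the correction and unit rules can change without touching the scan or the parsing.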

While the Business Intelligence data pipeline is retained, other types of data can also be ingested, including documents, images and time-series data such as video and audio. However, existing ingest pipelines cannot easily be reworked to extract all the elements from these data types. In addition, traditional stores cannot efficiently hold content, metadata and the relationships between data while also supporting multi-dimensional search over the extracted elements. At the same time, Business Intelligence structured data remains well suited to typical relational databases. The additional data types therefore require different ingestion mechanisms that support content extraction, together with different storage approaches, as shown below.

Documents, video and audio files are containers holding rich content, as well as metadata and relationships that can be detected or derived. This means we need to build pipelines that can process these data types and extract value that the multi-agent LLMs can use in a meaningful way. For each data type, we need to detect and derive content. Detection means understanding the content and extracting key elements, e.g. text, shapes/objects and values. Deriving means further processing a detected element, or a combination of elements, to gain insight, e.g. sentiment or object recognition. Additionally, we need to extract metadata. Metadata has many roles: it may describe the context of the data, e.g. where it came from, its security, genre and format; it may describe technical aspects of the data, e.g. bit pattern, position and sample rate; and it may describe other elements of the data set, or be created specifically to define or control further data or elements of the data it relates to.
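
One way to picture the distinction between detected elements, derived elements and metadata is a small data model. The sketch below is illustrative only: the `Asset` and `Element` classes, the helper functions and the toy word lists are assumptions made for this example, and the sentiment rule is a deliberately naive stand-in for a real model.

```python
from dataclasses import dataclass, field

# Toy word lists, hypothetical and for illustration only.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "faulty"}


@dataclass
class Element:
    kind: str    # e.g. "text", "object", "value", "sentiment"
    value: str
    # Indices of the elements this one was derived from (empty if detected).
    derived_from: list[int] = field(default_factory=list)


@dataclass
class Asset:
    uri: str
    # Context and technical metadata, e.g. source, security, format, sample rate.
    metadata: dict[str, str] = field(default_factory=dict)
    elements: list[Element] = field(default_factory=list)


def detect_text(asset: Asset, text: str) -> int:
    """Record a detected text element and return its index."""
    asset.elements.append(Element(kind="text", value=text))
    return len(asset.elements) - 1


def derive_sentiment(asset: Asset, source_index: int) -> int:
    """Derive a sentiment element from a detected text element."""
    words = asset.elements[source_index].value.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    asset.elements.append(
        Element(kind="sentiment", value=label, derived_from=[source_index])
    )
    return len(asset.elements) - 1


if __name__ == "__main__":
    asset = Asset(uri="report.pdf", metadata={"source": "upload", "format": "pdf"})
    text_index = detect_text(asset, "The service was great")
    sentiment_index = derive_sentiment(asset, text_index)
    print(asset.elements[sentiment_index])
```

Recording `derived_from` on each derived element keeps the relationship between an insight and the detected content it came from, which is the kind of link the storage layer would need to hold and search.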