• 1 reply
  • Latest Post - ‏2013-05-30T20:09:03Z by Matt Taylor
Matt Taylor

How to represent Folder-level metadata

2013-05-30T17:19:42Z

Content Analytics lets us attach metadata facets to a document, or to text fragments within a document.  Is there any way to also represent metadata fields that apply to groups of documents within a collection?  A subfolder containing multiple documents would be one such group, but there could be other file groupings where shared metadata makes sense.

For example, say that a collection consists of letters from clients.  Every letter is assigned a Client Name facet, derived from its own text.  Some, but not all, letters also have a Client Address facet.  Is it possible for the Client Address facet to be shared among all the letters with the same Client Name? 

I am considering a custom annotator, but I am unsure whether an annotator sees only one document at a time or can also analyze relationships among multiple documents in the collection.  Any advice is appreciated.
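
To make the Client Address example concrete, here is a minimal sketch of the sharing I have in mind, written as two passes over the whole collection. It is plain Java, not the Content Analytics or UIMA API: the class name FacetPropagation, the method propagate, and the idea of modeling each document as a map of facet names to values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: each document is modeled as a map of
// facet name -> facet value. Nothing here is a real product API.
final class FacetPropagation {
    static final String GROUP_FACET = "Client Name";
    static final String SHARED_FACET = "Client Address";

    static List<Map<String, String>> propagate(List<Map<String, String>> docs) {
        // Pass 1: remember the first address seen for each client name.
        Map<String, String> addressByClient = new HashMap<>();
        for (Map<String, String> doc : docs) {
            String client = doc.get(GROUP_FACET);
            String address = doc.get(SHARED_FACET);
            if (client != null && address != null) {
                addressByClient.putIfAbsent(client, address);
            }
        }

        // Pass 2: copy each document, filling in the address where it is missing.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            Map<String, String> copy = new HashMap<>(doc);
            String client = copy.get(GROUP_FACET);
            if (client != null && copy.get(SHARED_FACET) == null) {
                String shared = addressByClient.get(client);
                if (shared != null) {
                    copy.put(SHARED_FACET, shared);
                }
            }
            result.add(copy);
        }
        return result;
    }
}
```

The first pass is the part that needs to see more than one document, which is why I doubt a per-document annotator alone is enough. The sketch keeps the first address found for each client; how to handle conflicting addresses for the same client is left open.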

  • Matt Taylor

    Re: How to represent Folder-level metadata


    What I'm after may be considered a custom UIMA Collection Processing Engine (CPE) as defined at

    A Collection Processing Engine (CPE) is a component that the UIMA developer creates by specifying a CPE descriptor. A CPE descriptor points to a series of UIMA components, including a Collection Reader, CAS Initializer, Analysis Engine(s), and CAS Consumers. These components organized in a particular flow define a collection analysis job that acquires documents from a source collection, initializes CASs with document content, performs document analysis, and then produces collection level results (for example, search engine index, database, and so on). The CPM is the execution engine for a CPE.

    Does IBM Content Analytics allow user-defined CPEs to be integrated with its collections?