1 reply Latest Post - ‏2012-03-29T16:37:19Z by TigerTrix
9 Posts

Pinned topic Item Type recommended for performance? Document or Resource Item?

‏2012-03-29T14:58:00Z |
I am continuing my tests on CM-CMIS and CM EE (8.4.3). While reading the documentation for both tools, a question came up about the recommended base item type:

The CM EE documentation says that the "Document Model" item type is the more appropriate one "because of the performance enhancements explicitly built into IBM Content Manager specifically for the document model". Link:
On the other hand, the CMIS documentation recommends "item types with resource item type classification" for better performance. Link:
Since internally both are CM EE item types, which is better for performance? Document or Resource Item?
  • TigerTrix
    38 Posts

    Re: Item Type recommended for performance? Document or Resource Item?

    ‏2012-03-29T16:37:19Z  in response to HenriqueLira
    "Resource" wins hands down over "Document" in performance. I prefer to rename these in my own thinking:
    - "resource" = normal document optimized for a single binary. Think of a Word document you drag & drop into a folder: it is just a document with one binary (the Word doc itself) and some metadata like filename, size, and maybe some custom attributes like "customer name".
    - "document" = compound document model supporting multiple binary attachments. Think of something that would have multiple separate binary entities, like a page with multiple attachments, or a document with annotations, notes, history, and all sorts of stuff that you want to track under one item instance, under the same ACL rights, and viewed with a special viewer that knows how to display them all to the user. A custom viewer is needed to understand the composition of a compound document and know how to display and work with it.
    - "item" = metadata-only document. Think of a folder or saved search.

    There is overhead to supporting multiple attachments. The compound model separates the binary content into separate entities called "parts" (or content elements). This level of abstraction has a measurable cost, and how well the API is used can make that cost better or worse. CM-CMIS of course does its very best to use it as optimally as possible on top of CM. But even in the best case, there is a cost for the level of abstraction. The other way to look at it is that a "resource" is like a compound document, but optimized for a single binary content element by merging that one part into the root of the document as one instance. In fact, a "resource" is virtually identical to one individual custom part, just without the document instance.

    A common mistake is to use "document" (meaning "compound document") when you are not actually using any compound aspect and are only storing a normal document. Storing a normal document as a compound document takes 2 components (a root document with metadata and a content element part) plus a relationship stored between them. The same document stored as a "resource" takes only 1 component: the combined root document metadata + binary. The separation also affects some important metadata, such as file size, MIME type, and original file name (if you aren't using an optional universal name attribute on the root of the document), because it lives with the part rather than the root. Obtaining this information, for example when opening a folder and listing the file size of all the results, has a 3X-22X overhead depending on the number of content element parts. It is more like 3X if you are just overusing compound documents for normal single-binary documents.
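
    The component counts described above can be written down as a small model. This is only an illustrative sketch, not the CM API: the function name and dictionary keys are made up, and the assumption that a compound document with several binaries stores one part and one relationship per binary is an extrapolation from the single-binary case in the post.

```python
def stored_pieces(classification: str, binaries: int = 1) -> dict:
    """Count what is stored for one item (illustrative model, not the CM API)."""
    if classification == "item":
        # Metadata only: a single root component and no binary.
        return {"components": 1, "relationships": 0}
    if classification == "resource":
        # Root metadata and the one binary are merged into a single component.
        if binaries != 1:
            raise ValueError("a resource holds exactly one binary")
        return {"components": 1, "relationships": 0}
    if classification == "document":
        # Assumption: one root plus one part per binary, each linked to the root.
        return {"components": 1 + binaries, "relationships": binaries}
    raise ValueError(f"unknown classification: {classification}")


print(stored_pieces("resource"))  # {'components': 1, 'relationships': 0}
print(stored_pieces("document"))  # {'components': 2, 'relationships': 1}
```

    The second line is the "overused" case: a normal single-binary document stored as a compound document.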

    Optimizing - CM API Options
    There is a new CM API option (with some drawbacks that usually aren't a problem) (in CM I think) that can optimize the metadata retrieval for things like folder browsing, when you need original file name, file size, and MIME type just to show folder or search results. The new option brings the overhead, compared to a "resource", down from 3X-22X to only 2X-3X, and more like 2X if you are just overusing compound documents for normal documents. So it gets significantly better.
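
    To get a feel for what those multipliers mean, here is a rough back-of-the-envelope sketch. The multipliers are the ones quoted in this post; the 100 ms baseline for listing a folder of "resource" items is a made-up number, and the names are hypothetical.

```python
# Overhead ranges quoted in the post, relative to a "resource" listing (1X).
OVERHEAD = {
    "resource": (1, 1),
    "document": (3, 22),              # compound document, no optimization
    "document_with_option": (2, 3),   # compound document with the new CM API option
}


def listing_cost_range(kind: str, baseline_ms: float) -> tuple:
    """Return the (low, high) estimated listing time for the given kind."""
    low, high = OVERHEAD[kind]
    return baseline_ms * low, baseline_ms * high


# Hypothetical baseline: a folder of resources lists in 100 ms.
for kind in OVERHEAD:
    print(kind, listing_cost_range(kind, 100))
```

    With that assumed baseline, the unoptimized compound case lands somewhere between 300 and 2200 ms, and the optimized one between 200 and 300 ms.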

    CMIS & Compound Documents
    CMIS as a standard doesn't support compound documents. The standard does (optionally) support "renditions", which are things like thumbnails, but not compound documents. CM-CMIS supports compound document types if you have them. But it has to assume that you intend to access them for only one of their binary parts, or that you are just overusing compound documents for a single binary, or that you are using compound documents where there really is only one "base content" and the rest of the compound parts are auxiliary aspects like "annotations", "notelogs", or things that aren't essential to understand elsewhere. CM-CMIS uses a primary part selection algorithm to figure out the best possible single part to return if multiple exist, and portrays that as the single binary visible through CMIS. CM-CMIS is implemented with expert understanding of the CM API.
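
    The post does not describe how the primary part selection actually works, so the following is only one conceivable heuristic and should not be read as CM-CMIS's real algorithm. It assumes each part is a plain dictionary with a hypothetical "kind" key, prefers parts that are not annotations or notelogs, and falls back to whatever exists.

```python
from typing import Optional

# Auxiliary aspects named in the post; the "kind" labels themselves are hypothetical.
AUXILIARY_KINDS = {"annotation", "notelog"}


def pick_primary_part(parts: list) -> Optional[dict]:
    """Pick the one part to expose as the single CMIS binary, or None if there are no parts."""
    base_parts = [p for p in parts if p["kind"] not in AUXILIARY_KINDS]
    # Assumption: if only auxiliary parts exist, returning one of them beats returning nothing.
    candidates = base_parts or parts
    return candidates[0] if candidates else None


parts = [
    {"kind": "annotation", "name": "review-notes"},
    {"kind": "base", "name": "contract.pdf"},
    {"kind": "notelog", "name": "history"},
]
print(pick_primary_part(parts)["name"])  # contract.pdf
```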

    Optimizing - CMIS Options
    CM-CMIS understands the performance overhead of compound documents and offers some config options that let you accept some limitations in exchange for even better performance. It does so by default. It can often work with only the root document instances and skip retrieving the content element part information, which is faster, especially on multi-item results like folder children & search. But this results in file size 0 and MIME type NULL in some cases. It only does this when there is a file name in the root component of the document; otherwise it is forced to always obtain the parts. There is also a config option that forces it to (almost) always retrieve file size & MIME type. There are still a few cases where it doesn't. In normal application usage, you shouldn't notice the cases where it never retrieves them.
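
    The default behavior described above boils down to a small decision, sketched here under stated assumptions. The dictionaries, key names, and the force_size_and_mime flag are all hypothetical stand-ins, not real CM-CMIS configuration names, and the sketch ignores the few remaining cases where size and MIME type are never retrieved.

```python
def list_entry(root: dict, parts: list, force_size_and_mime: bool = False) -> dict:
    """Build one folder-listing entry for a compound document (illustrative only)."""
    file_name = root.get("file_name")
    if file_name is not None and not force_size_and_mime:
        # Fast path: root only. Size and MIME type are not known without the part.
        return {"name": file_name, "size": 0, "mime_type": None}
    if not parts:
        # Nothing to read from, so there is no size or MIME type to report.
        return {"name": file_name, "size": 0, "mime_type": None}
    # Slow path: the content element part has to be read as well.
    part = parts[0]
    return {
        "name": file_name or part["file_name"],
        "size": part["size"],
        "mime_type": part["mime_type"],
    }


root = {"file_name": "report.pdf"}
parts = [{"file_name": "report.pdf", "size": 1024, "mime_type": "application/pdf"}]
print(list_entry(root, parts))
# {'name': 'report.pdf', 'size': 0, 'mime_type': None}
print(list_entry(root, parts, force_size_and_mime=True))
# {'name': 'report.pdf', 'size': 1024, 'mime_type': 'application/pdf'}
```

    A root without a file name always takes the slow path, matching the "forced to always obtain the parts" case.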

    CM-CMIS doesn't support the CM API option optimization above yet, because we are still working through the cases where it doesn't work; it would need configuration options and error handling. But this is likely to come in a subsequent release (no guarantees in this informal forum). Once this is supported, I would recommend using the option and not fearing compound documents. But I still wouldn't use them unless you have a functional use for the compound aspect; otherwise you are just giving up performance you could have had by picking "resource".

    Optimizing - CM Server
    As you mentioned, CM EE does refer to the compound document model being optimized. This is true, but not in the comparison to "resource" that you are thinking. There are functional needs for compound documents. Comparing to other possible compound document implementations, such as if you were to model your own in CM using other kinds of relationships like links & reference attributes, the built-in model is more optimal. But again, it is compared to modeling your own compound document model. Note that CM-CMIS doesn't use it for its draft/private working copy support, or any of its example best practices types.

    Functional Decision - Advanced ECM Applications
    The real decision comes down to what you are planning to store. Is it normal documents, like uploading or dragging & dropping in PDF files, TXT files, Word docs, etc., and using them pretty much just like the files on your file system, except with some extra metadata? Or do you have an application & viewer that will really use compound document relationships? CM8's advanced ECM client applications like eClient & WEBi support an advanced viewer that understands compound documents and can work with multiple content elements. Still, some people use such applications to upload normal documents. You should most definitely match the item type to what you are storing and working with. If the same item type must serve both compound and non-compound uses, you have to use compound; otherwise you can just use 2 item types and let users pick the right one for the right situation.

    For example, you should (almost) never make a folder type a compound document. A folder should usually just be metadata-only, or "Item" classification. Making a folder a compound document will often make folder retrieval check for content element parts that would almost never exist for a folder, but technically can because there are a few possible uses for it. Btw, CM-CMIS will optimize folders from compound documents by never checking for parts for them.

    Limited Data Model Tolerance in Some Applications
    Some ECM applications do not support the full CM data model for all classifications, metadata-only (item), normal document (resource), and compound document (document). For example, eClient supports only the compound document model. But WEBi, CM-CMIS, CM-Quickr, and Document Manager all support all 3. If you have applications that support only one, you might need to pick the type that works in all cases, or use different item types in each.
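
    Picking "the type that works in all cases" is just a set intersection over the support list above. A minimal sketch, using only the applications and classifications named in this post; the table and function names are hypothetical.

```python
ALL_CLASSIFICATIONS = {"item", "resource", "document"}

# Support as described in the post: eClient is compound-only, the rest handle all 3.
SUPPORTED = {
    "eClient": {"document"},
    "WEBi": ALL_CLASSIFICATIONS,
    "CM-CMIS": ALL_CLASSIFICATIONS,
    "CM-Quickr": ALL_CLASSIFICATIONS,
    "Document Manager": ALL_CLASSIFICATIONS,
}


def common_classifications(apps: list) -> set:
    """Return the classifications every listed application can work with."""
    result = set(ALL_CLASSIFICATIONS)
    for app in apps:
        result &= SUPPORTED[app]
    return result


print(common_classifications(["eClient", "CM-CMIS"]))  # {'document'}
```

    As soon as eClient is in the mix, only the compound "document" classification is left, which is the situation where using separate item types per application may be worth considering.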

    Btw, there are some implications to creating extra item types because some features perform proportionally to the number of item types that exist in the system. For example, a system with 64 item types will take longer than an equal system with only 5. So there is a balance.

    Both Perform, Just One Can Optimize More
    Regardless, the CM server team does consider the CM compound document model to perform and scale to the expectations and high standards of IBM ECM. I don't mean to scare you away from using compound documents. The whole point is that, even though both perform & scale to the standards of IBM ECM, you could be doing better if you pick the one that fits your situation.