Latest Post - 2012-02-15T17:00:50Z by TigerTrix
SystemAdmin

CM-CMIS: Duplicated (alternative?) document paths problem

2011-12-12T15:15:23Z
I've created a few nested folders and a few documents using ObjectService's createFolder and createDocument methods, for example:

cmis:path = "/folder_a/folder_b/folder_c/some_document.pdf"
cmis:objectId = "$d!1011_A1001001A11L01B00006E72767v1"

When I use NavigationService's getObjectParents, the retrieved parent folder has a different cmis:path property value:

cmis:path = "/$type/ClbFolder/folder_c{1012_A1001001A11L01B00006C83036v1}"
cmis:objectId = "$f!1012_A1001001A11L01B00006C83036v1"

So the retrieved document path is:

cmis:path = "/$type/ClbFolder/folder_c{1012_A1001001A11L01B00006C83036v1}/some_document.pdf"

But when I use ObjectService's getObjectByPath method, both paths retrieve the same document?!

"/$type/ClbFolder/folder_c{1012_A1001001A11L01B00006C83036v1}/some_document.pdf"
"/folder_a/folder_b/folder_c/some_document.pdf"

I would like to use getObjectParents to retrieve a document's relative path based on its document ID and a reference base folder. But this getObjectParents behaviour prevents me from using a single service call to get the right absolute document path ("/folder_a/folder_b/folder_c/some_document.pdf"). Instead, I have to call getObjectParents recursively to obtain each parent folder ID and path segment (which means multiple service calls).
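The recursive approach described above can be sketched as a loop over parent lookups. This is a minimal illustration under stated assumptions, not IBM CMIS or OpenCMIS code: `repo` is a hypothetical in-memory stand-in for the repository, each lookup represents one getObjectParents call, and every item is assumed to have exactly one parent (no multi-filing).

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the repository: object ID -> (name, parent ID).
# In a real client, each lookup in the loop would be one getObjectParents call.
Repo = Dict[str, Tuple[str, Optional[str]]]


def relative_path(repo: Repo, object_id: str, base_folder_id: str) -> Tuple[str, int]:
    """Walk up single-parent links from object_id until base_folder_id is reached.

    Returns the name-only path relative to the base folder and the number of
    parent lookups (service calls) the walk needed.
    """
    segments = []
    calls = 0
    current = object_id
    while current != base_folder_id:
        name, parent = repo[current]
        calls += 1  # one getObjectParents call per level
        if parent is None:
            raise ValueError(f"{object_id} is not filed under {base_folder_id}")
        segments.append(name)
        current = parent
    return "/".join(reversed(segments)), calls


repo: Repo = {
    "root": ("", None),
    "A": ("folder_a", "root"),
    "B": ("folder_b", "A"),
    "C": ("folder_c", "B"),
    "D": ("some_document.pdf", "C"),
}
print(relative_path(repo, "D", "A"))  # ('folder_b/folder_c/some_document.pdf', 3)
```

The call count grows with folder depth, which is the cost the question is trying to avoid.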
Updated on 2012-02-15T17:00:50Z by TigerTrix
  • TigerTrix
    2011-12-12T23:28:06Z
    VARIED PATHS
    IBM CMIS will always give a valid, working path to any document or folder. However, it does not guarantee to return the path you expect when there are multiple possible paths to the same item. When you query or retrieve by ID, IBM CMIS does not know which path context you have in mind and can only guess which of several valid working paths to return. It first attempts to preserve the expected path through a "handshake" between the application and its service requests, at least while browsing. The idea is that if you start browsing from root and call get children, the children are at least returned in the context of the parent path: the child paths returned from a get children/descendants call include the parent folder path that was used to specify the folder. For example, if you call get children on a folder specified by the path /a/b/c, the child paths are generated as /a/b/c/f1.txt and so on. But if you call get children on a folder specified by the ID 1009_A019239012v1, IBM CMIS generates a valid working path for the folder and generates the children relative to it, such as /$type/MyFolderType/c{1009_A019239012v1}/f1.txt. However, I believe the standard CMIS get children URL is generated from the folder ID, so the "handshake" only holds for one level. In contrast, this "handshake" usually works for the full path while browsing with IBM Content Manager Services for Lotus Quickr. Similarly, if you then retrieve the child f1.txt by ID rather than by path, or find it through a query, IBM CMIS generates a type path directly to f1.txt, such as /$type/MyDocType/f1{1011_A1230941238v1}.txt.
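    The "handshake" can be modeled roughly as follows. This is a hypothetical sketch of the behavior described above, not the IBM implementation: `FolderRef`, its fields, and `child_path` are invented names, and the child segment is shown name-only for simplicity (whether it gets an embedded ID depends on the fast path settings).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FolderRef:
    """How the caller addressed the parent folder (hypothetical model)."""
    object_id: str
    type_name: str                      # e.g. "MyFolderType"
    name: str                           # e.g. "c"
    request_path: Optional[str] = None  # set only when the caller used a path


def folder_context_path(ref: FolderRef) -> str:
    # Addressed by path: the service can reuse the caller's own path.
    if ref.request_path is not None:
        return ref.request_path.rstrip("/")
    # Addressed by ID: no path context, so fall back to a virtual $type path
    # with the folder's ID embedded in the last segment.
    return f"/$type/{ref.type_name}/{ref.name}{{{ref.object_id}}}"


def child_path(parent: FolderRef, child_name: str) -> str:
    """Child paths are generated relative to however the parent was addressed."""
    return f"{folder_context_path(parent)}/{child_name}"


by_path = FolderRef("1009_A019239012v1", "MyFolderType", "c", request_path="/a/b/c")
by_id = FolderRef("1009_A019239012v1", "MyFolderType", "c")
print(child_path(by_path, "f1.txt"))  # /a/b/c/f1.txt
print(child_path(by_id, "f1.txt"))    # /$type/MyFolderType/c{1009_A019239012v1}/f1.txt
```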

    ID-EMBEDDED "FAST" PATH
    Additionally, whether the path includes the embedded ID (a "fast path", so called because it is faster to resolve) depends on two things: whether fast paths are enabled (they are on by default and can be disabled to generate name-only paths), and whether the item is in a relative path that can support name-only paths. There are two separate configuration settings, fastFolderAccessPaths and fastDocumentAccessPaths, so you can control the folder levels within a path separately from the document itself. Name-only paths are also supported only for CMIS-optimized item types. But your biggest limitation will be that the "handshake" is not maintained by generated descendant URLs. Whenever you are in a type-based virtual path, IBM CMIS will not generate name-only segments, because they are not sufficiently unique there. See the fast path configuration settings for more information.
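    The two segment shapes seen in this thread, name{ID} for folders and base{ID}.ext for documents, can be split apart with a small parser. This is a sketch whose segment grammar is only inferred from the examples here (it is not a documented format), and it does not handle names that legitimately contain braces.

```python
import re
from typing import List, Optional, Tuple

# Assumed segment shapes, inferred from examples in this thread:
#   name{ID}      e.g. folder_c{1012_A1001001A11L01B00006C83036v1}
#   base{ID}.ext  e.g. f1{1011_A1230941238v1}.txt
_FAST_SEGMENT = re.compile(r"(?P<base>[^{}]*)\{(?P<id>[^{}]+)\}(?P<ext>[^{}]*)")


def parse_segment(segment: str) -> Tuple[str, Optional[str]]:
    """Return (display name, embedded ID or None) for one path segment."""
    m = _FAST_SEGMENT.fullmatch(segment)
    if m is None:
        return segment, None  # name-only segment
    return m.group("base") + m.group("ext"), m.group("id")


def parse_path(path: str) -> List[Tuple[str, Optional[str]]]:
    return [parse_segment(s) for s in path.split("/") if s]


def name_only(path: str) -> str:
    """Drop embedded IDs, keeping only the display names."""
    return "/" + "/".join(name for name, _ in parse_path(path))


print(parse_segment("f1{1011_A1230941238v1}.txt"))  # ('f1.txt', '1011_A1230941238v1')
```

Stripping the IDs from a /$type/... path still leaves a virtual type path, not the folder path the document is actually filed in, so parsing alone cannot recover "/folder_a/folder_b/folder_c/some_document.pdf".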

    GENERATING PATHS FROM ID
    Generating a path from an ID (without the application passing the expected parent path) is extremely expensive and strongly discouraged. Because multi-filing allows multiple parents, you would have to attempt a brute-force walk up the tree, handling cycles and multiple parents, until you reach root, if there is even an accessible path between the document and root. There is no guarantee that you currently have access to all folders between the item and root, or that the item is even filed relative to root, although the walk should eventually reach root through the virtual $type hierarchy if it finds an unfiled path. For this reason, a built-in solution that generates paths this way is deliberately not supported, because it is not expected to perform or scale well. It is better to get the "handshake" to work and not rely on generated paths being the original path the user saw. You could do this walk yourself, but I think you would be burdening the system with a lot more processing per request.
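    A brute-force walk of this kind might look like the sketch below. It assumes a hypothetical in-memory parent graph (`names`, `parents`) in place of repeated get parents calls, treats folders missing from `names` as inaccessible, and does not model the fallback through the virtual $type hierarchy. Breadth-first search is one reasonable choice here: the visited set stops cycles, and it returns a shortest accessible path, which is not necessarily the path the user originally saw.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def find_path_to_root(
    object_id: str,
    names: Dict[str, str],          # object ID -> name (accessible items only)
    parents: Dict[str, List[str]],  # object ID -> parent IDs (several if multi-filed)
    root_id: str,
) -> Optional[str]:
    """Breadth-first walk up the parent graph.

    Returns the first name-only path found from root to object_id,
    or None if no accessible path exists.
    """
    # Each entry: (current ID, names collected so far, starting from the item)
    queue = deque([(object_id, [names[object_id]])])
    visited: Set[str] = {object_id}
    while queue:
        current, trail = queue.popleft()
        for parent in parents.get(current, []):
            if parent == root_id:
                return "/" + "/".join(reversed(trail))
            if parent in visited or parent not in names:
                continue  # cycle, or a folder the user cannot access
            visited.add(parent)
            queue.append((parent, trail + [names[parent]]))
    return None


names = {"a": "folder_a", "b": "folder_b", "c": "folder_c", "d": "doc.pdf"}  # "x" is inaccessible
parents = {"d": ["x", "c"], "c": ["b"], "b": ["a", "c"], "a": ["root"], "x": ["root"]}
print(find_path_to_root("d", names, parents, "root"))  # /folder_a/folder_b/folder_c/doc.pdf
```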

    Beware that a getParents / parent ID optimization exists: the parent ID of a document found by ID or by query is a special value that defers resolving the actual parent ID until the value is used, so the cost of calculating it is only incurred when you actually use it. You will notice that it essentially just prefixes the child's ID with a special marker, so that retrieving by this value later tells the service to get the parent of the child with that ID. However, this is not always the case, for example when the parent folder is already known from a get descendants call. I also believe this is configurable, so you can choose to incur the performance cost of generating the real parent ID in all cases, although that is not recommended. But to browse up, you can work with the deferred parent ID value without a problem, as long as you don't expect the parent ID to be a specific value before you retrieve by it.
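    The deferred parent ID idea can be illustrated with a small sketch. The `parentOf!` prefix and both function names are hypothetical; the real marker used by IBM CMIS is not given in this thread.

```python
from typing import Callable, List

# Hypothetical marker: the real prefix used by IBM CMIS is not documented here.
DEFERRED_PREFIX = "parentOf!"


def deferred_parent_id(child_id: str) -> str:
    """Cheap placeholder handed out instead of computing the real parent ID."""
    return DEFERRED_PREFIX + child_id


def resolve_folder_id(folder_id: str, lookup_parent: Callable[[str], str]) -> str:
    """Return a real folder ID, paying for the parent lookup only when needed."""
    if folder_id.startswith(DEFERRED_PREFIX):
        child_id = folder_id[len(DEFERRED_PREFIX):]
        return lookup_parent(child_id)  # the expensive step, deferred until now
    return folder_id  # already a real ID


calls: List[str] = []


def expensive_lookup(child_id: str) -> str:
    calls.append(child_id)
    return "$f!1012_A1001001A11L01B00006C83036v1"


pid = deferred_parent_id("$d!1011_A1001001A11L01B00006E72767v1")
print(len(calls))                           # 0: nothing resolved yet
print(resolve_folder_id(pid, expensive_lookup))  # $f!1012_A1001001A11L01B00006C83036v1
print(len(calls))                           # 1
```

Comparing the deferred value directly with a known folder ID would fail even when it points at the right folder, which is why it should only be used to retrieve by.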

    RETRIEVE BY PATH
    Retrieving by paths is a whole different story. You can retrieve by any of the possible multiple paths to a document or folder, whether virtually filed in the $type hierarchy or by any number of possible real paths that an item might be filed in, and by both name-only paths and any combination of fast paths with embedded IDs fully or partially.

    IBM CMIS v1.0.0.1 and earlier support retrieval by name-only path through an optimized multi-step query algorithm for CMIS-optimized item types (basically, item types with one consistent name attribute that you can configure: ICM$NAME is recommended, which is optional for all item types from CM v8.4.3.1 and available on hierarchical item types in CM v8.4.3.0; otherwise the clbContent.clbLabel extension when CMIS is installed on older versions of CM, or any attribute you configure as the ideal name attribute). It issues queries in an optimized way that can resolve a path without any embedded IDs. You should not be able to beat this implementation of resolving name-only paths for CM. This is the same solution that has been in use for IBM Content Manager Services for Lotus Quickr for a few years; for example, it is how CM works with Windows Explorer today and supports paths. There are also advanced configuration settings for fine-tuning the retrieve-by-path algorithm.
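    A toy resolver shows how a path can mix name-only and ID-embedded segments and still reach the same item. This is a sketch under stated assumptions, not IBM's multi-step query algorithm: `children` stands in for queries on the ideal name attribute, the segment syntax is inferred from this thread, and virtual /$type/... paths are not modeled.

```python
import re
from typing import Dict, Optional

# Hypothetical model: folder ID -> {child name -> child ID}. A real server would
# answer each name-only step with a query on the configured ideal name attribute.
Children = Dict[str, Dict[str, str]]
_ID_IN_SEGMENT = re.compile(r"[^{}]*\{(?P<id>[^{}]+)\}[^{}]*")


def resolve_path(path: str, children: Children, root_id: str) -> Optional[str]:
    """Resolve a path whose segments may be name-only or carry an embedded ID."""
    current = root_id
    for segment in (s for s in path.split("/") if s):
        m = _ID_IN_SEGMENT.fullmatch(segment)
        if m:
            # Fast path: the embedded ID identifies the item directly
            # (this sketch does not verify that it is filed under `current`).
            current = m.group("id")
            continue
        child = children.get(current, {}).get(segment)
        if child is None:
            return None  # no item with that name under the current folder
        current = child
    return current


children: Children = {
    "root": {"folder_a": "A"},
    "A": {"folder_b": "B"},
    "B": {"folder_c": "C"},
    "C": {"some_document.pdf": "D"},
}
print(resolve_path("/folder_a/folder_b/folder_c/some_document.pdf", children, "root"))  # D
print(resolve_path("/folder_c{C}/some_document.pdf", children, "root"))                 # D
```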

    FUTURE IDEAS / YOUR REQUIREMENT
    However, all this still doesn't address your request to actually generate a path given only an ID in an optimal way. This is only to explain the behavior you are currently seeing.

    If you must get a normalized path relative to root, without the $type-based path approach, without "handshake" solutions, and with pure name-only paths, there are prototype solutions in development that appear to solve the problem completely and very efficiently. But nothing has been officially announced: there is no guarantee, no commitment, no timeline, and no confirmation of any kind that I can give in this forum at this time. However, there is hope. So what I would like to know is how important this is to you, and whether you can work around this limitation in how paths are currently handled.

    By the way, if you are an active IBM customer or partner, this is a requirement that you should mention to your IBM representative to submit as a real product requirement if you want to affect the possible timeline of such a solution.
  • SystemAdmin
    2011-12-14T07:25:16Z
    Thank you for your extensive answer. Let me explain our business scenario. My task is to integrate our application with content management systems using CMIS. Our clients use IBM CM, EMC Documentum, or Microsoft SharePoint. We are moving documents from our application into the CMS, where they are stored in folders based on certain rules. We want to keep (persist) a durable reference in our application to each document stored in the CMS. The obvious solution is to keep the document's path relative to some configurable folder. Keeping the object IDs generated by CMIS would perform better, but it is not acceptable due to the maintenance cost. (One requirement is that document references remain unchanged when moving from test to production. Another is the ability to switch between CMSs without updating document references.) I'm prepared to pay a higher price in performance to have a system that is easier to maintain.

    The other CMSs I have tested over CMIS (Documentum, SharePoint) use “normalized” paths. I would like to use the same approach, independent of the CMS vendor.

    So the main problem is that, when creating a document, ObjectService’s createDocument returns an object ID, but I need to keep the document's relative path. The number of create/modify transactions won’t be high, so there is headroom for calculating the relative path.
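    Storing references relative to a configurable base folder could look like this minimal sketch using Python's `pathlib.PurePosixPath`; `to_reference` and `from_reference` are hypothetical helper names. It only works when the CMS returns a name-only absolute path: a /$type/... path is not under the base folder, so `relative_to` raises `ValueError`, which is exactly the situation described in this thread.

```python
from pathlib import PurePosixPath


def to_reference(absolute_path: str, base_folder: str) -> str:
    """Absolute name-only CMIS path -> reference relative to the base folder.

    Raises ValueError if the path is not under base_folder.
    """
    return PurePosixPath(absolute_path).relative_to(base_folder).as_posix()


def from_reference(reference: str, base_folder: str) -> str:
    """Stored reference -> absolute path, e.g. to pass to getObjectByPath."""
    return (PurePosixPath(base_folder) / reference).as_posix()


ref = to_reference("/folder_a/folder_b/folder_c/some_document.pdf", "/folder_a")
print(ref)                               # folder_b/folder_c/some_document.pdf
print(from_reference(ref, "/folder_a"))  # /folder_a/folder_b/folder_c/some_document.pdf
```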

    As you suggested, there are two solutions: 1) use NavigationService’s getObjectParents for each path segment, or 2) use the prototype CMIS producer. Which would you prefer?
  • TigerTrix
    2011-12-14T23:19:37Z
    Aside from submitting feature requirements for future releases, you will have to call get parents yourself and handle any complexity that could arise from cycles, multi-filing, unfiled items, user permissions, etc. For a built-in solution, you will have to submit a requirement through your IBM representative, or else leave it as a suggestion for the development team through this forum. The prototype solution I referred to is not available to the public. It would be a candidate for a future release or fix pack if there is demand for it, if it proves successful, and depending on other factors. So your feedback here is valuable. To take a more active role in formally requesting that solutions to your problems be built in, you would need to use the formal requirements process.

    Informally, though, this thread serves as feedback and will have an influence.
  • SystemAdmin
    2012-02-11T07:46:33Z
    I've disabled fast paths, and now there is no composite folder problem. In the configuration file:

    c:\Program Files\IBM\WebSphere\AppServer\profiles\service_name\installedApps\EXP-IBMCMSNode01Cell\cmcmis.ear\cmcmis.war\WEB-INF\classes\cmpathservice.properties

    I changed these two keys:

    fastFolderAccessPaths = false
    fastDocumentAccessPaths = false

    For now, there is no performance loss.
  • SystemAdmin
    2012-02-11T15:52:24Z
    It works in the development environment, but doesn't work in the production environment.

    fastFolderAccessPaths = false
    fastFolderTreeCopyAccessPaths = false
    fastDocumentAccessPaths = false
    findDocumentWithoutFastAccessPath = true
    findFolderWithoutFastAccessPath = true
    # same result with UP
    findWithoutFastAccessPath_Algorithm = DOWN
    findWithoutFastAccessPath_DocTypes = EXTENDED
    findWithoutFastAccessPath_MaxLevelsPerQuery = 2
    findWithoutFastAccessPath_SeparateWildcardStep = true
    findWithoutFastAccessPath_StepTimeout_s = 120
    findWithoutFastAccessPath_UnscopedAssumesFolder = false
  • TigerTrix
    2012-02-13T19:26:45Z
    1. Fast paths can only be disabled for item types that have the configurable ideal case name attribute. This is the attribute configured in the sortByIdealCaseNameAttr setting, which is usually ICM$NAME or clbContent.clbLabel, depending on whether you first installed with CM 8.4.3.1 or an earlier CM version.

    2. Name-only paths can only be resolved for item types that have the configurable ideal case name attribute (the same criterion that determines whether fast paths are used).

    So your production system probably has non-ideal item types, unlike your development system. For non-ideal item types to work perfectly without a fast path, no solution is available in a current release yet. For now, if fast paths are a problem, you will need to use optimized item types that use the same configured ideal name attribute, and make sure the configuration settings reference it. Even then, paths can still vary depending on how items are accessed, and they are not fully regenerated beyond ensuring a valid working path.
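    Putting points 1 and 2 together with the earlier remark about $type paths, the decision of whether a segment comes back name-only or ID-embedded might be summarized as below. This is a hedged reading of this thread, not documented IBM behavior; `render_segment` is a hypothetical name, and placing the ID before the file extension is inferred from the f1{1011_A1230941238v1}.txt example.

```python
from typing import Set


def render_segment(
    name: str,
    object_id: str,
    fast_paths_enabled: bool,
    type_attributes: Set[str],
    ideal_name_attr: str,
    in_type_path: bool = False,
) -> str:
    """Sketch: is a path segment returned name-only or with an embedded ID?"""
    name_only_possible = (
        not fast_paths_enabled                  # fastFolder/DocumentAccessPaths = false
        and ideal_name_attr in type_attributes  # item type carries the ideal name attribute
        and not in_type_path                    # $type virtual paths are not unique by name
    )
    if name_only_possible:
        return name
    base, dot, ext = name.rpartition(".")
    if not dot:  # no extension: append the ID to the whole name
        return f"{name}{{{object_id}}}"
    return f"{base}{{{object_id}}}.{ext}"  # ID goes before the extension


# Fast paths disabled, but the type only has clbTitle, not the configured clbLabel:
print(render_segment("f1.txt", "1011_A1230941238v1", False,
                     {"clbContent.clbTitle"}, "clbContent.clbLabel"))  # f1{1011_A1230941238v1}.txt
```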
  • SystemAdmin
    2012-02-15T09:07:59Z
    Both systems have the same document type configuration. The new document type has the clbContent and clbVersion attribute groups, so the clbContent.clbTitle attribute is present. The new document type is optimized by DocumentEnableTool.

    Both systems have the same configuration option:
    sortByNameIdealAttribute = clbContent.clbLabel

    How do I make an item type that has the configurable ideal case name attribute?
  • TigerTrix
    2012-02-15T17:00:50Z
    #1. Based on your previous comment, it still sounds like there may be a difference between the two systems that you have not noticed. This is all about whether both systems have the same name attribute, which is either ICM$NAME or clbContent.clbLabel. Then it matters which one you select in the configuration settings as the ideal name attribute. You only confirmed that both have clbContent.clbTitle, which is a completely different attribute from clbContent.clbLabel. Please go into the CM System Administration Client and compare the item types on each system. Double-check whether both actually have "clbLabel" in the clbContent attribute group on this item type.

    Fresh installations with CM V8.4.3.1 or later omit the "clbLabel" attribute and replace it with an independent ICM$NAME attribute, which must then be configured accordingly in the configuration settings.

    #2. Double-check that fastFolderAccessPaths and fastDocumentAccessPaths are set the same on both systems. When returning paths, whether a fast path is returned depends on (1) this setting and (2) whether the configured ideal name attribute was actually found on the item type.

    #3. Check that both systems use folders created with the same folder type, namely the one set as folderTypeGlobalDefault. Resolving name-only paths for folders works for folders of the global default type.

    #4. Suppose you find that both systems are clbLabel-based and not ICM$NAME-based. Another difference could be whether the data in question is existing data created by another application or data created through the CMIS mid-tier. When you add the clbContent group or ICM$NAME to an existing item type, existing data is not migrated, but it remains usable through a compatibility mode. Only data that actually has the name value filled in one of the ideal name attribute locations can be found by a name-only path rather than a fast path. The resolver builds a set of queries based on (1) the configured ideal case folder type (probably ClbFolder unless you changed it), (2) all document types that have the configured ideal case name attribute, and (3) the actual values stored in the ideal case name attribute. The actual set of queries and the algorithm are more complicated, but these are the main factors. If the data is there, the queries should find it.

    #5. Have the documents been checked in at least once? A document created only as a PWC (private working copy) or draft exists only as a draft, and only the creator of that draft can see it. Others can see it once it is checked in.
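    For check #2, comparing the path settings of both systems can be automated with a small script. This is a sketch that assumes you have copied each system's cmpathservice.properties into a string; the parser is deliberately simplified (whole-line # or ! comments, key = value only, no continuation lines or escapes), so it is not a full replacement for java.util.Properties.

```python
from typing import Dict, Optional, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified .properties reader: whole-line '#'/'!' comments, key = value."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_settings(dev: str, prod: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (dev value, prod value)} for every key whose value differs."""
    a, b = parse_properties(dev), parse_properties(prod)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


dev = "fastFolderAccessPaths = false\nfastDocumentAccessPaths = false\n"
prod = "fastFolderAccessPaths = false\nfastDocumentAccessPaths = true\n"
print(diff_settings(dev, prod))  # {'fastDocumentAccessPaths': ('false', 'true')}
```

Note that, as with standard .properties files, a # in the middle of a line is kept as part of the value rather than treated as a comment.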