A CMIS API library for Python, Part 2: Build real world ECM tools with Python and cmislib

Create a sample application

In Part 2 of this series on CMIS and Python, build an xcopy-like data population and migration tool using the Python cmislib library. The tool not only xcopies local file systems to any CMIS repository but is also aware of JPG Exif data and preserves it during the copy if possible. Walk through the source code and learn how to use the tool on the command line. Also, you can download the fully functional source.


Jay A. Brown (jay.brown@us.ibm.com), Senior Engineer, IBM

Jay Brown is a senior engineer in the Software Group at IBM. He has worked in software development for twenty-one years for Shearson Lehman, General Electric, and FileNet, the last eleven of those building and designing ECM systems for FileNet and IBM. Among his contributions to IBM are the design and construction of the code generators for both of the P8 4.x Content Engine (C.E.) APIs (Java and .NET), two years on the P8 Architecture team, and the design and construction of the CMIS servers. Mr. Brown is currently the development lead for the CMIS servers.

25 March 2009


Combine Python and CMIS


Python and CMIS work better together. For an overview of the OASIS Content Management Interoperability Services (CMIS) specification and cmislib, refer to Part 1 of this series (see the link in Resources).


This is the second article in this series and has three main sections:

  1. Python and CMIS: This is an introduction to and discussion of why Python is a natural fit when you write CMIS-related tools.
  2. Code walkthrough: Review the code with an explanation of how it all fits together so you can easily extend it for other types of metadata and sources.
  3. Running the tool: Explore runtime aspects of using the tool as well as setting up the dependencies. If you are just interested in downloading the tool and using it without an explanation of why it came to be or how it works, then jump to the section Running the tool.

Frequently used acronyms

  • API: Application program interface
  • ECM: Enterprise content management
  • IDE: Integrated development environment
  • OASIS: Organization for the Advancement of Structured Information Standards
  • PDF: Portable Document Format
  • REST: Representational State Transfer
  • URL: Uniform Resource Locator
  • XML: Extensible Markup Language

Learning a new language by tool building

A month before I started this article I was looking for a small project that I could use to teach myself Python. For me, when it comes to learning a new language, I can read a textbook from cover to cover, but a week later I've forgotten it all. I never get a real world feel for how the language and its associated tools work for me. If I don't take a language and do something useful with it, then the concepts just don't solidify in my memory. It also happened that I recently worked with Jeff Potts to iron out some interoperability issues between his new CMIS Python library (cmislib) and the IBM CMIS technology preview servers. This got me thinking about tooling for long-lived systems.

Tools for long-lived systems

First let me clarify what I mean by long-lived systems with an example that I've been close to for many years. Once deployed, ECM systems have the potential to be running in a company or department for a very long time. They can be big, complex, and expensive to replace. All of these are good reasons to leave them alone when they are working fine as they are. Because of this, these systems can often survive multiple consolidations and acquisitions over time. So when I talk about CMIS repositories, by definition, I also often talk about these systems. Long-lived systems accumulate some very mature user- and administrator-created tools. I think that Python scripts with cmislib have the potential to live on in this way. I hope that once you have seen a powerful yet simple example of what these two technologies can do together, you will agree.

Why standard interpreted languages

When it comes to tooling for these systems, there is something comforting about running interpreted scripts. I guess for me it’s a sense of control. If you have a tool written in Java™ or C++ languages and you need to fix or change something, you can't always easily do it, even if you have the source. How many times have you tried to re-compile something you yourself wrote a couple years back, only to discover that to build it you needed some special build environment, libraries, or settings that you've long since lost track of? Sure, most developers unquestioningly back up their source code, but backing up a build environment is a lot more difficult and customarily gets done only in more strictly maintained environments. This is generally not the domain of tools, in spite of their importance. With scripts, the build environment can consist of nothing more than a common text editor and the off-the-shelf runtime. (Note: I don't recommend doing significant Python programming without a good Python IDE like Eclipse with Pydev.)

That feeling of control I mentioned is even more profound when I know I can take my script and move it to another type of operating system later and it will behave the same way. If I’m confident I can do this, then I'm a lot more likely to spend time writing a better tool, since I know I won't have to write it again later for another platform. As long as I can get my hands on an interpreter for a given platform, I'm safe. And as long as the language has lots of popular support like Python does, availability is not a problem. These days I jump around a lot between Microsoft® Windows® and Linux® when I do development, and I don't even know what I will be using two years from now. It's just something to think about, and the reason why I have recently come to a solid appreciation of Python.

Choosing the problem to solve

One of the tools that I needed as I worked to harden a CMIS server within IBM was a good repository population tool. Of course, data in a CMIS repository is more than just the document payload. It is also a potentially rich set of metadata related to that document. One common type of document with metadata that developers deal with regularly is the JPG image. What I like about JPG files for testing is that they usually have a rich set of interesting metadata present in their headers, which means I don't have to write extra code to fabricate values. This Exchangeable image file format (Exif) data will be familiar to those of you who dabble in digital photography. If you are not already familiar with it, I suggest you start with the Wikipedia article on the subject (see Resources for a link).

Tool requirements

The tool you are going to create needs to do the following:

  1. Copy the hierarchy of files in a local file system to any specified CMIS compliant repository and preserve the filename of the file as the cmis:name of the new CMIS document.
  2. If the tool encounters JPG type files during the xcopy, make an effort to also duplicate all the Exif data associated with the image into the repository assuming the repository contains compatible property definitions. This is the really interesting part of the tool. The xcopy functionality alone, although pretty useful in itself, was a little too boring to warrant its own article (although you can use this tool as just a plain old file system to CMIS xcopy if that is the only reason you need it).

Walk through the code

Now you are ready to look at the parameters and code for the tool.

Define inputs

Let's now define how this tool will be used. First, I model this after the venerable xcopy so this will be a command line tool. The parameters will be:

  • -s Source directory for the copy.
  • -f File filter (for example, *.doc, *.jpg, *.*, and so on).
  • -t Target path. The full path of the target directory for the copy operation (for example: /pictures/Fiji_August_2010/ ). The tool assumes that the target path exists and creates all child folders automatically.
  • serviceURL The full URL of the XML service document for the target CMIS repository (for example: http://localhost:9080/cmis/service ).
  • targetClassName Optional class type to specify the subclass of cmis:document that you will create for the new documents.

    For example, the content management system for a photographer might have a class named CmisJpg. That CmisJpg class contains property definitions for some of the common Exif values that she might query during a search through her catalog of images. If not supplied, the tool will create all documents as type cmis:document.

  • debug Debug mode (optional).

    When equal to true, debug mode only attempts to recreate the target directory structure, but won't copy any documents.

    If omitted, this defaults to false (copy all data).

The code

I will now step through some major sections of the code describing how they work as I go. Note that, to simplify the article, I won't cover all of the code in this level of detail, but the parts that I skip should be self explanatory. To get the entire file (with comments), see Download. Here is a high level list of the items I will touch on.

  • Read the six runtime parameters into the tool.
  • Initialize the cmislib library and get a repository object that you will use as the root of all your communications with the CMIS repository.
  • Verify that the target folder and target class definition are valid and exist on the target repository.
  • Review basic xcopy logic.
  • Read in Exif header data from JPG files.
  • Use cmislib to create a document with metadata.
  • Fit the Exif data into the properties of the target class dynamically based on types.

Step 1: Parsing the parameters

First you need to get the six parameters (see Define inputs) into the tool at runtime. Here I made a decision to split them into two categories. The first are the values that I expect to pass in on the command line. (I don’t want to have to specify all six on the command line since half of them won’t change all that much for a given repository.) Since I'm modeling after xcopy, I will only accept the first three (source, target, and filter). What about the others? For these I will resort to a text .cfg file since these will be static for a given repository anyway. Place the configuration file in the same directory as the script and name it cmisxcopy.cfg.

Listing 1. Sample cmisxcopy.cfg file
[cmis_repository]
# service url for the repository that you will be copying to
serviceURL=<your service url here>
# the cmis:objectTypeId of the class that you wish to create
targetClassName=cmis:document


The standard Python library ConfigParser will work nicely for reading this configuration file data as in Listing 2:

Listing 2. Using the ConfigParser to retrieve the configuration file values
import sys
import ConfigParser

# config file related constants
configFileName = 'cmisxcopy.cfg'
cmisConfigSectionName = 'cmis_repository'

# read in the config values
config = ConfigParser.RawConfigParser()
try:
    # read() silently skips a missing file, so it is the get() calls below
    # that raise an error when the file or one of the settings is absent
    config.read(configFileName)
    UrlCmisService = config.get(cmisConfigSectionName, "serviceURL")
    targetClassName = config.get(cmisConfigSectionName, "targetClassName")
    user_id = config.get(cmisConfigSectionName, "user_id")
    password = config.get(cmisConfigSectionName, "password")
    debugMode = config.get(cmisConfigSectionName, "debug")
except ConfigParser.Error:
    print "There was a problem finding the config file:" + configFileName + \
        " or one of the settings in the [" + cmisConfigSectionName + "] section ."
    sys.exit()

An even simpler way to do this is to have an additional config.py file with nothing but these settings defined as constants in it. The main script can then just import config.py and use those variables directly. Next up I will talk about the command line parsing.
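One detail is worth noting before moving on: the configuration parser returns every value as a string, so a debug setting of "false" is still a truthy Python value unless it is converted. The following is a minimal sketch of one way to handle that, not the code from the download. It is written for Python 3, where the module is named configparser, and the load_settings helper and its fallback values are assumptions made for illustration (the fallbacks follow the defaults described in Define inputs).

```python
import configparser  # Python 3 name; Python 2 calls this module ConfigParser

SECTION = "cmis_repository"


def load_settings(cfg_text):
    """Parse cmisxcopy.cfg-style text into a plain dict of typed settings.

    load_settings is a hypothetical helper used only for this sketch.
    """
    config = configparser.RawConfigParser()
    config.read_string(cfg_text)
    return {
        "serviceURL": config.get(SECTION, "serviceURL"),
        # optional: fall back to the base document type when omitted
        "targetClassName": config.get(SECTION, "targetClassName",
                                      fallback="cmis:document"),
        # getboolean understands true/false, yes/no, on/off and 1/0
        "debug": config.getboolean(SECTION, "debug", fallback=False),
    }


sample = """
[cmis_repository]
serviceURL=http://localhost:9080/cmis/service
debug=true
"""
print(load_settings(sample))
```

With a real boolean in hand, the rest of the script can test the debug flag directly instead of comparing strings.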

Listing 3 shows how to use optparse from the Python standard library for command line parsing of the other three parameters. Set the usage string to show a hint when invalid parameters are submitted. Add the -s, -t, and -f parameters for source, target, and filter respectively with the add_option() method. Finally, call parse_args() to parse the values into your options object.

Listing 3. Using optparse to collect command line parameters
from optparse import OptionParser

usage = ("usage: %prog -s sourcePathToCopy -t targetPathOnRepository "
         "-f fileFilter(default=*.*)")
parser = OptionParser(usage=usage)

## get the values for source, target, and filter from the command line
parser.add_option("-s", "--source", action="store", type="string", dest="source",
    help="Top level of local source directory tree to copy")
parser.add_option("-t", "--target", action="store", type="string", dest="target",
    help="path to (existing) target CMIS folder. All children will be created "
         "during copy.")
parser.add_option("-f", "--filter", action="store", type="string", dest="filter",
    default="*.*", help="File filter. e.g. *.jpg or *.* ")

(options, args) = parser.parse_args()
startingSourceFolderForCopy = options.source
targetCmisFolderStartingPath = options.target

Step 2: Initialize your CMIS connection with cmislib

Listing 4 finally starts to do some real CMIS work. First, import the cmislib module. (To download the latest version of this library, see the cmislib link in Resources.) Then initialize the client object with the values you retrieved earlier (in Listings 2 and 3) and get the defaultRepository object as repo. Next, try to retrieve the target folder using the getObjectByPath(path) call. (Note that cmislib makes getting this object as easy as if you were getting a local folder object from the standard Python library.) If for some reason this fails (for example, the specified folder does not exist), the tool prints an appropriate message and terminates. You then do a similar sanity check for the target type definition with the getTypeDefinition() call.

Once you have these two valid cmislib objects in hand, you know that all the info you have pertaining to the target system is good so you can proceed with processing. Note the line where I initialize the folderCacheDict dictionary object. Later on, if I need this folder object again, I can get it from this cache rather than making another round trip to get it. Note the caching is not really required for the particular traversal algorithm you are using, but I leave it in to show how to do this if it is needed as you expand this tool in the future.

Listing 4. Initialize the CmisClient and retrieve the target folder and target class objects
import sys
from cmislib.model import CmisClient

# initialize the client object based on the passed in values
client = CmisClient(UrlCmisService, user_id, password)
repo = client.defaultRepository

# test to see if the target folder is valid
targetCmisLibFolder = None
try:
    targetCmisLibFolder = repo.getObjectByPath(targetCmisFolderStartingPath)
except Exception:
    # terminate if we can't get a folder object
    print "The target folder specified can not be found:" + targetCmisFolderStartingPath
    sys.exit()

# initialize the folder cache with the starting folder
folderCacheDict = {targetCmisFolderStartingPath : targetCmisLibFolder}

# test to see if the target class type is valid
targetTypeDef = None
try:
    targetTypeDef = repo.getTypeDefinition(targetClassName)
except Exception:
    # terminate if we can't get the target class type definition object
    print "The target class type specified can not be found:" + targetClassName
    sys.exit()
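To illustrate how a folder cache like folderCacheDict can save round trips, here is a minimal sketch of a get-or-create helper. It is not taken from the download: the get_or_create_folder function and the FakeRepo and FakeFolder classes are hypothetical, and the fakes exist only so the sketch runs without a server. It assumes the repository object offers getObjectByPath(path) and that folder objects offer createFolder(name), which is how I understand cmislib to behave.

```python
def get_or_create_folder(repo, cache, path):
    """Return the folder for path, creating missing folders on the way down.

    cache maps a path to the folder object already fetched for it.
    """
    path = path.rstrip("/") or "/"
    if path in cache:
        return cache[path]              # cache hit: no round trip
    try:
        folder = repo.getObjectByPath(path)
    except Exception:
        if path == "/":
            raise                       # the root folder must always exist
        # folder is missing: get (or create) the parent, then add the child
        parent_path, _, name = path.rpartition("/")
        parent = get_or_create_folder(repo, cache, parent_path or "/")
        folder = parent.createFolder(name)
    cache[path] = folder
    return folder


class FakeFolder(object):
    """Hypothetical stand-in for a CMIS folder object."""
    def __init__(self, repo, path):
        self.repo, self.path = repo, path

    def createFolder(self, name):
        child_path = self.path.rstrip("/") + "/" + name
        self.repo.folders[child_path] = FakeFolder(self.repo, child_path)
        return self.repo.folders[child_path]


class FakeRepo(object):
    """Hypothetical stand-in for a repository that counts lookups."""
    def __init__(self):
        self.folders = {"/": FakeFolder(self, "/")}
        self.lookups = 0

    def getObjectByPath(self, path):
        self.lookups += 1
        return self.folders[path]       # raises KeyError when missing
```

Calling get_or_create_folder twice with the same path costs repository lookups only the first time; the second call is served from the dictionary.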

Step 3: Implement the basic xcopy logic

In this step, you will traverse the source file system tree. You will look for files that need to be copied and create any child subdirectories necessary in the target so the hierarchies will match after the copy. To traverse the source directory structure, you use the walk() function in the Python os module. It yields a 3-tuple of (dirname, dirs, files) for each directory in the source tree, which you use to feed your processDirectory() function (see the full listing in Download). processDirectory() then creates the target directory (if it is not already there) and hands off to the copyFilesToCmis() function to actually copy the files to the newly created target folder. That function iterates through each of the files it is handed, filtering out the ones not requested and getting the Exif data for those that are of type .jpg. You will dive deeper into copyFilesToCmis() later when I discuss metadata.
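The traversal described above can be sketched as follows. This is an illustration rather than the processDirectory() code from the download: plan_copy is a hypothetical helper that only works out which local file belongs in which target folder, and using fnmatch for the file filter is my assumption about one reasonable way to apply patterns such as *.jpg.

```python
import fnmatch
import os


def plan_copy(source_root, target_root, file_filter="*.*"):
    """Yield (local_file_path, target_folder_path) pairs for an xcopy-style run."""
    base = target_root.rstrip("/")
    for dirname, dirs, files in os.walk(source_root):
        # position of this directory relative to the top of the source tree
        rel = os.path.relpath(dirname, source_root)
        if rel == os.curdir:
            target_folder = base or "/"
        else:
            # CMIS paths always use forward slashes, whatever the local OS uses
            target_folder = base + "/" + rel.replace(os.sep, "/")
        for name in sorted(files):
            # note: with fnmatch, "*.*" only matches names that contain a dot
            if fnmatch.fnmatch(name, file_filter):
                yield os.path.join(dirname, name), target_folder
```

For example, walking a tree that contains a.jpg at the top level and day1/b.jpg below it, with a target of /photos/Hawaii, pairs the first file with /photos/Hawaii and the second with /photos/Hawaii/day1.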

Step 4: Reading Exif data

For every file that you encounter that is of type JPG, you need to extract all of the Exif values so they can be preserved in the target object. There are a lot of ways to read JPG header (Exif) data. I was not looking to write my own reader in this case since there are several libraries already out there. I chose exif-py since it returns the tags in a very common way: as a dictionary of key-value pairs where all of the values are strings. (See Resources since you will need to download exif-py for your script to run.) I figured that if you wanted to substitute something more exotic (or customized) here, it would be very easy, since I would expect most libraries to use a dictionary to represent the properties collection, and, if they didn't, it would be trivial to adapt them to do so. The actual code for this turned out to be embarrassingly simple. See for yourself in Listing 5.

Listing 5. Read the Exif data from a JPG file
import EXIF

def getExifTagsForFile(filename):
    # open in binary mode and make sure the file is closed again
    with open(filename, 'rb') as f:
        tags = EXIF.process_file(f)
    return tags

Step 5: Document creation with Metadata

In this step, you need to create the document in the target repository with a list of properties that you must set for the CMIS repository to know what exactly to create. For example, in CMIS, when you POST a new document to a folder (which logically means create), it is the list of properties on the object that tells CMIS what to instantiate. Do you mean to create an instance of cmis:document or do you want a subclass of document named CmisJpg? It is the properties list where this information is communicated.

Metadata is an area that I expected to get ugly (code wise) in a tool like this. I was pleasantly surprised how I could achieve this type of dynamic mapping with so little code. Hats off to Jeff Potts (cmislib author) for making this so easy.

createCMISDoc(), the outer method you see in Listing 6, gets called for every document that is to be created in the target folder. The propBag that you see being passed in as the last parameter is the list of Exif tags that you got from the getExifTagsForFile method. You make a call to createPropertyBag() to set the cmis:objectTypeId property (this specifies the type of object to create) and to process all of the tags into appropriately typed objects that will match the target repository’s definition for that particular property. In the end, the actual creation of the document in the target folder object is a single line of code: newDoc = folder.createDocument(…).

Listing 6. The createCMISDoc method
def createCMISDoc(folder, targetClass, docLocalPath, docName, propBag):
    """
    Create document in CMIS repository in the folder specified.
    Create the document of type targetClass
    Take stream for this document from docLocalPath
    Set the name of the document to be docName
    Set the properties on the object using the propBag
    """

    def createPropertyBag(sourceProps, targetClassObj):
        """
        Take the exif tags and return a props collection to submit
        on the doc create method
        """
        # set the class object id first.
        propsForCreate = {'cmis:objectTypeId': targetClassObj.id}
        for sourceProp in sourceProps:
            # First see if there is a matching property by display name
            if sourceProp in targetClassObj.propsKeyedByDisplayName:
                # there's a matching property in the repo's class type !
                print "Found matching metadata: " + sourceProp
                # now make the data fit
                addPropertyOfTheCorrectTypeToPropbag(propsForCreate,
                         targetClassObj.propsKeyedByDisplayName[sourceProp],
                         sourceProps[sourceProp])

        return propsForCreate

    props = createPropertyBag(propBag, targetClass)
    f = open(docLocalPath, 'rb')
    newDoc = folder.createDocument(docName, props, contentFile=f)
    f.close()
    print "Cmislib create returned id=" + newDoc.id

Step 6: Dynamic metadata mapping to the target document

Shown in Listing 7, the last method you look at in this article is the one that handles the dynamic metadata mapping from whatever properties you find in the Exif data to whatever you find is defined for the target document class. The method is addPropertyOfTheCorrectTypeToPropbag(); what can I say, I like descriptive function names. This method does the most complex work of the whole script, but as you can see it's pretty simple to follow thanks to cmislib. For example, if valueToAdd (always a string when coming from exif-py) contains the value 56 and the typeObj property type is integer, then you convert it into a proper int object and set the value. If the target repository says that it should be a string, then you leave it alone. If the conversion does not work (say the value contained f2.0), then you just skip this property, but the document creation still works. So however you set up the property definitions in your target CMIS repository, this code will try to make it work.

Listing 7. The addPropertyOfTheCorrectTypeToPropbag method
import datetime

def addPropertyOfTheCorrectTypeToPropbag(targetProps, typeObj, valueToAdd):
    """
    Determine what type 'typeObj' is, then convert 'valueToAdd'
    to that type and set it in targetProps if the property is updateable.
    Currently only supports 3 types:  string, integer and datetime
    """
    cmisUpdateability = typeObj.getUpdatability()
    cmisPropType = typeObj.getPropertyType()
    cmisId = typeObj.id
    if cmisUpdateability == "readwrite":
        # first lets handle string types
        if cmisPropType == 'string':
            # this will be easy
            targetProps[cmisId] = valueToAdd
        elif cmisPropType == 'integer':
            try:
                intValue = int(valueToAdd.values[0])
                targetProps[cmisId] = intValue
            except Exception:
                # skip this property; the document is still created
                print "error converting int property id:" + cmisId
        elif cmisPropType == 'datetime':
            try:
                dateValue = valueToAdd.values
                dtVal = datetime.datetime.strptime(dateValue,
                         "%Y:%m:%d %H:%M:%S")
                targetProps[cmisId] = dtVal
            except Exception:
                # skip this property; the document is still created
                print "error converting datetime property id:" \
                     + cmisId
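To make the conversion rules concrete, the following standalone sketch applies the same three cases to plain strings. It is a simplification rather than the method above: convert_for_cmis is a hypothetical helper, it takes an ordinary string instead of an exif-py tag object, and it returns None for a value that cannot be converted so that a caller could skip the property.

```python
import datetime

# Exif timestamps use colons in the date part, for example 2010:08:14 10:30:00
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def convert_for_cmis(prop_type, raw_value):
    """Convert an Exif string to the Python type for a CMIS property type.

    Returns None when the type is unsupported or the conversion fails.
    """
    try:
        if prop_type == "string":
            return raw_value
        if prop_type == "integer":
            return int(raw_value)
        if prop_type == "datetime":
            return datetime.datetime.strptime(raw_value, EXIF_DATE_FORMAT)
    except ValueError:
        pass  # the value does not fit the target type
    return None


print(convert_for_cmis("integer", "56"))                    # 56
print(convert_for_cmis("integer", "f2.0"))                  # None
print(convert_for_cmis("datetime", "2010:08:14 10:30:00"))  # 2010-08-14 10:30:00
```

Returning None instead of raising keeps a single bad tag from stopping the copy, which mirrors the skip-and-continue behavior described for Listing 7.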

Figure 1 shows the FileNet® Enterprise Manager tool displaying the newly created CmisJpg class with a few sample Exif-named properties that I set up for testing. Remember that, the way this code works, the display name of the property is the key. All that needs to happen for this code to kick in is for a property to exist in the target class whose display name exactly matches the name of the property in the Exif tags.

Figure 1. Screen capture of the FileNet P8 Content Engine admin tool showing CmisJpg properties

A note about type definitions and IDs: Object type definitions in CMIS, and the property definitions they contain, include a unique ID as well as a more user-friendly display name. The cmisxcopy script attempts to map Exif property names to identical names on the CMIS side. The reason I chose to map to the display name and not the ID is that the ID does not have to be the same as the display name. In some repositories it might be; in others it might be something entirely different, like a GUID. This is the sort of thing that testing with multiple repositories shakes out. Any value for the ID is legal CMIS; the point here is that clients should not make assumptions about IDs if they want to safely conform to the specification. Since I need to make sure that I get the right property, I match on the display name, since display names are more likely to match.
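A small sketch shows why matching on display name is independent of whatever IDs a repository assigns. Everything here is illustrative: PropDef is a hypothetical stand-in for a repository property definition, the IDs are made up, and the tag names only follow the general shape of exif-py keys.

```python
class PropDef(object):
    """Hypothetical stand-in for a repository property definition."""
    def __init__(self, prop_id, display_name):
        self.id = prop_id
        self.displayName = display_name


def key_by_display_name(prop_defs):
    """Index property definitions by display name instead of by ID."""
    return dict((p.displayName, p) for p in prop_defs)


# made-up IDs: one readable, one opaque, as different repositories might assign
by_name = key_by_display_name([
    PropDef("ImageMake", "Image Make"),
    PropDef("prop-7f3a", "EXIF DateTimeOriginal"),
])

exif_tags = {
    "Image Make": "Canon",
    "EXIF DateTimeOriginal": "2010:08:14 10:30:00",
    "EXIF FNumber": "2",    # no matching definition, so it is skipped
}

# properties to submit are keyed by ID, but matched by display name
matches = dict((by_name[name].id, value)
               for name, value in exif_tags.items() if name in by_name)
print(matches)
```

Both tags with a matching display name are mapped, even though one of the IDs bears no resemblance to the tag name.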

Running the tool

Now, it's time to set up the tool and try it out.


First, you must download and install the following (see Resources):

  • Install Python 2.6.4 (the latest 2.x version).
  • Download the full source for this article and place it in a directory from where you will execute it (I will refer to this directory as src).
  • Download the cmislib tar.gz version and place its cmislib directory in your src directory. Your src directory should now contain a cmislib subdirectory with four .py files (__init__, exceptions, model, and net). The reason I use the manual installation here (instead of easy_install) is so that you will have a transportable script that will run without any modification of the local Python environment. This just goes back to the concerns about long-lived systems that I discussed before.
  • Download exif-py and place the EXIF.py file in your src directory.

Installing cmislib with easy_install

  1. Download setuptools (see Resources) and install it per the instructions for your platform. It is handy to have around for other libraries too.
  2. Go to your Python Scripts directory and run the command: easy_install cmislib
  3. That's it! easy_install finds cmislib, downloads it, and installs it into your Python environment.

To prepare to run the tool, first edit the cmisxcopy.cfg (see the download link in Resources) in the same src directory where you have your cmisxcopy.py script. For your first run, you will probably want to just copy some files without any custom metadata so you will set the targetClassName parameter to be (the always present) cmis:document. Then set the serviceURL to a valid value, followed by the user_id and the password and you are ready to go.

Listing 8. The cmisxcopy.cfg file
[cmis_repository]
serviceURL=<your service url here>
targetClassName=cmis:document
# DEBUG MODE
debug=false
user_id=<user name to use for authentication>
password=<password here>

Next, make sure you have exif-py and cmislib on your path or in the same directory, as described in the previous section. Specifically, the EXIF.py file should be in the same directory as the rest of your source (.py) files. For cmislib, either the cmislib directory and its .py files should be in the same directory as your source, or cmislib should be installed into your Python environment using the easy_install method (see Installing cmislib with easy_install).

Finally make sure Python is on your path by typing:

python -V

This command should return something like this:

Python 2.6.4

Note: I tested this code with Python version 2.6.4.

User story

Suppose you want to copy your Hawaiian vacation photos to your repository for publishing. Your files are sitting on your local drive at C:\photos\hawaiiVacationTree and you have created a target directory on your CMIS repository that you want to use with the path of /photos/Hawaii. Also, you have some videos in this tree that you don’t want to include so you will include a filter of *.jpg so you just get stills and nothing else. Here is what you type:

python cmisxcopy.py -s C:\photos\hawaiiVacationTree -t /photos/Hawaii -f *.jpg

Later, once you have set up a class in your CMIS repository named Jpgs with property definitions corresponding to the Exif values you need, you can edit your cmisxcopy.cfg and change targetClassName to the cmis:objectTypeId of that class.

Then re-run the same command, or perhaps change the target directory if you don't want to have duplicates in the same folders. The Exif tags whose names match property definitions on the target class (and whose values can be converted to the defined types) are preserved when the images are copied to your CMIS repository.

Figure 2. FireFox connector pointing to target directory on Alfresco’s public server after cmisxcopy

A note about CMIS testing

When you build a CMIS client or tool, remember that it is virtually impossible to build a truly compatible CMIS client if you only test with one repository. If you test with only one repository, you have built a client for just that one repository, not a true CMIS client. Please always test with at least two compliant repositories to make sure that you comply with the OASIS CMIS specification. In keeping with these values, I tested this code with both IBM servers and the Alfresco™ public server. The easy part for me is that Jeff Potts has already done the work of making sure cmislib is compatible with the specification as opposed to just one repository, so after getting the prototype to work with the IBM CMIS server, my first attempt with the Alfresco server just worked.

Suggestions for further development

I leave a couple of rather obvious extensions to this tool as an exercise for the reader. The first and most obvious is to allow the source file structure to also be a CMIS system. This small change will effectively make this tool a universal cross repository migration tool. The second, perhaps less obvious one, is to support the metadata from other file types like PDFs, MP3s, or Microsoft® Office documents. For example, if I have a Word document on my local file system that contains a property named invoiceNumber with a value of US900201292339 and the CMIS target class has a string property named invoiceNumber, then when I CmisXcopy the document to CMIS, it should do the right thing with that extra data.

Happy coding!



Now you've seen how easy it is to script complex operations against ECM repositories in a way that is compatible with all CMIS-compliant repositories. Hopefully, this will inspire you to look into CMIS if you have not done so already or, if you have, to consider using cmislib and Python the next time you have some scriptable work to do on your CMIS ECM system.


Download: Sample configuration and Python script, sourceForArticle.zip, 5KB


