XML Matters: Get the most out of gnosis.xml.objectify

Use utility functions for enhanced object behavior

David Mertz (mertz@gnosis.cx), Protagonist, Gnosis Software, Inc.
To David Mertz, all the world is a stage, and his career is devoted to providing marginal staging instructions. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.

Summary:  The XML binding gnosis.xml.objectify was designed, in many ways, more as a toolkit than as a final tool. But this leaves some (potential) users confused about how to specialize it for some common tasks. In this article, David shows readers how very thin wrappers can customize gnosis.xml.objectify to perform actions such as: Provide XPath access to child objects; automatically reserialize objects to XML; modify the syntax of access to nodes. Some of these techniques involve rather trivial specialization of provided parent classes. Others involve small utility functions.

Date:  23 Nov 2004
Level:  Advanced

Python XML bindings seem to pop up almost every day, not because of anything missing in existing libraries like gnosis.xml.objectify or ElementTree, but simply out of Not Invented Here syndrome. Though perhaps somewhat biased, I still feel that my own gnosis.xml.objectify -- the first of these tools to be developed -- remains the most versatile and Pythonic binding available (and also one of the fastest and most memory-friendly). Unfortunately, the multiplication of just-slightly-different libraries for the same purpose is an affliction Python suffers in several other areas as well.

In part, developers invent their own tools simply because they do not immediately see how to accomplish their goals with the existing ones. In this article, I try to remedy that, at least where gnosis.xml.objectify is concerned.

The gnosis.xml.objectify philosophy

My goal in creating gnosis.xml.objectify was to provide a module that transforms data in XML documents into completely native Python objects. In particular, it is not very Pythonic to access data using getters and setters, or other similar methods. In Java and some other languages you do things this way -- and largely as a result of the Java style, this is how you do things in DOM, even in Python.

For gnosis.xml.objectify, all the data that comes from an XML document -- whether it's from element bodies or from XML attributes -- is simply data in object attributes. If a given object has multiple children with the same name, the attribute points to a list of like-named children. But even if the object has only one child with a given name, that one child is kind enough to act like a list for iteration purposes. When accessing a gnosis.xml.objectify object, the simplest thing that could possibly work almost always does work.

Listing 1 is a very quick primer and example for readers new to the library:


Listing 1. Basic Usage of gnosis.xml.objectify
>>> from gnosis.xml.objectify import make_instance
>>> xml = "<foo><bar>Text</bar><baz a1='bat'/><baz>blip</baz></foo>"
>>> foo = make_instance(xml)
>>> foo
<foo id="48b170">
>>> foo.bar
<bar id="48b300">
>>> foo.baz
[<baz id="48b210">, <baz id="48b030">]
>>> for bar in foo.bar: print bar
...
<bar id="48b300">
>>> foo.baz[0].a1
u'bat'
>>> foo.bar.PCDATA
u'Text'
>>> foo.bar[0].PCDATA
u'Text'
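
The "single child acts like a list" behavior needs nothing more than ordinary Python protocols. The following is a minimal sketch in Python 3, using a hypothetical stand-in class named Node rather than the library's own node class; it only illustrates the idea and makes no claim about how gnosis.xml.objectify implements it:

```python
class Node:
    """Hypothetical stand-in for a parsed node (not the library's own class)."""

    def __init__(self, **attrs):
        # XML attributes and child elements alike become plain instance attributes
        self.__dict__.update(attrs)

    def __len__(self):
        return 1

    def __getitem__(self, index):
        # A lone node behaves like a one-item sequence
        if index == 0:
            return self
        raise IndexError(index)


bar = Node(PCDATA="Text")
foo = Node(bar=bar, baz=[Node(a1="bat"), Node(PCDATA="blip")])

# The same loop works for one child or for several like-named children
for child in foo.bar:
    print(child.PCDATA)
print(len(foo.baz), foo.baz[0].a1)
```

Because iteration falls back on __getitem__, client code never has to ask whether an element occurred once or many times.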


What gnosis.xml.objectify does not do

The node objects in gnosis.xml.objectify trees are, by design, quite dumb. Yes, they print moderately nice looking representations of themselves; and single instances also act list-like when appropriate, but instance-like otherwise. But generally, node objects eschew any special methods or attributes (or at least they do so unless you decide to program your own special behavior into particular node types, specified by their element name). For one thing, any methods I might have added to node objects would potentially conflict with tagnames in the generic XML documents that gnosis.xml.objectify parses. But more importantly, I believe Python is natively a perfectly good language (excellent, in fact), so you can and should use exactly the same generic techniques that you would use to work with any old object on ones that happen to have been generated from XML sources.

However, I have found -- particularly of late -- that the very flexibility of gnosis.xml.objectify gives some users the false impression that they cannot achieve the constrained goals that some more XML-oriented bindings provide as default behaviors. To address this, I have added a subpackage, gnosis.xml.objectify.utils (see Resources), to the Gnosis Utilities package to illustrate several of the most-requested XML-oriented usages. However, these utilities, while genuinely useful as provided, are still intended more as examples of what you can do than as official APIs for gnosis.xml.objectify. The idea here is that gnosis.xml.objectify does not have an API, except the API of Python itself.


Performing XPath searches

One of the perceived strengths of Fredrik Lundh's ElementTree and Uche Ogbuji's Anobind (see Resources) is their use of XPath-like node-search methods. To my mind, XPath syntax is still somewhat overly XML-oriented, but enough users requested this that I decided to add a utility function, gnosis.xml.objectify.utils.XPath(), to Gnosis Utilities. In about 50 lines, I was able to implement a significant superset of the XPath support in either ElementTree or Anobind -- though not the complete XPath specification, which is large.

Specifically, I enabled the following XPath features:

  • Named node search by specifying a tagname
  • Recursive node search using the // delimiter
  • Wildcard searches using the * symbol
  • Text node search using the text() pseudo-function
  • Attribute search using the @ prefix
  • Wildcard attribute search using the @* symbol
  • Node indexing and slicing

Moreover, since this is Python, I allow users to use not only XPath simple numeric indexing but also a general slice notation. Since XPath is one-based in indexing while Python is zero-based, I emphasize the non-Python semantics by indicating slices differently in a pseudo-XPath; for example, /tagname[2..5] indicates the inclusive range from the second to the fifth <tagname> element in the document root.

While I was at it, I wrote the whole thing as a lazy iterator so you don't need to instantiate a large node-list if you don't need one. Of course, if you want an instantiated node-list, just use list(XPath(obj,path)) to get one.
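
The benefit of a lazy iterator is easy to see in isolation. The sketch below (Python 3) uses a hypothetical tree of (name, children) tuples and a bookkeeping list named visited, neither of which comes from Gnosis Utilities; it shows that asking a generator for two results touches only two nodes:

```python
from itertools import islice

visited = []  # Records which nodes the generator has actually touched


def walk(tree):
    """Lazily yield a (name, children) tuple and all of its descendants."""
    visited.append(tree[0])
    yield tree
    for child in tree[1]:
        yield from walk(child)


tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])

# Take only the first two nodes; the rest of the tree is never traversed
first_two = [name for name, _ in islice(walk(tree), 2)]
print(first_two, visited)

# Materialize the whole traversal only when it is really wanted
all_names = [name for name, _ in list(walk(tree))]
```

The same pattern applies to any generator-based search: wrap it in list() when you need everything, or in islice() or next() when you need only the first few hits.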

However, even though I recognize the coolness of predicate-based indexing, my simple function does not bother implementing it. There is nothing conceptually difficult about implementing the remaining bits of full XPath; I just did not find it necessary (or concise) as an illustration. The features that are supported are exercised by the test script test_xpath.py, which I will include in future Gnosis Utilities distributions; it contains the following test XPaths (and outputs correctly on each):


Listing 2. Patterns tested in test_xpath.py
patterns = '''/bar  //bar  //*  /baz/*/bar
              /bar[2]  //bar[2..4]
              //@a1  //bar/@a1  /baz/@*  //@*
              baz//bar/text()  /baz/text()[3]'''

Node walking in four lines

To support the recursive searches, I created a little recursive traversal function that walks all the nodes of a gnosis.xml.objectify object. You can use it by itself if you like. You may find it useful for performing your own non-XPath filtering on a tree. Of course, the following calls should be equivalent: walk_xo(obj) and XPath(obj,"//*") (the first will perform slightly less housekeeping). The function looks like this:


Listing 3. Compact, lazy, recursive node traversal
def walk_xo(o):
    yield o
    for node in children(o):
        for child in walk_xo(node):
            yield child

Simple, huh? Another small support function simply parses out index values if they are given within a (pseudo-)XPath. I will not bother reproducing that here.
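
For readers curious about that index-parsing step, here is one way it could look. This is a guess at the shape of such a helper, not the code shipped in Gnosis Utilities: the name parse_indices and the regular expression are my own assumptions. It converts the one-based, inclusive pseudo-XPath notation into the zero-based, half-open start and stop values that Python slicing expects:

```python
import re
import sys

# Matches "name[N]" or "name[N..M]" at the end of a path fragment
_INDEX = re.compile(r'^(?P<name>.*?)\[(?P<first>\d+)(?:\.\.(?P<last>\d+))?\]$')


def parse_indices(fragment):
    """Return (fragment_without_index, start, stop) for a path fragment.

    Hypothetical helper: indexes in the fragment are one-based and
    inclusive; the returned start/stop are zero-based and half-open.
    """
    m = _INDEX.match(fragment)
    if m is None:
        # No index given: select everything
        return fragment, 0, sys.maxsize
    first = int(m.group('first'))
    last = int(m.group('last') or first)
    return m.group('name'), first - 1, last


print(parse_indices('bar[2]'))       # second <bar> only
print(parse_indices('bar[2..4]'))    # second through fourth <bar>
print(parse_indices('text()[3]'))
```

Returning a large integer rather than None for the open-ended case keeps comparisons such as i >= stop valid.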

An (almost) full XPath wrapper

The trick in making the XPath() function so concise is the fact it has so little need to worry about XML per se (see Listing 4). Most of the work here lies in just making sense of the XPath string itself. Some existing one-line wrapper functions -- like children(), text(), and attributes() -- make the code look a bit nicer, but are themselves extremely simple filters. In other words, you could use something very close to this same function against objects that never derived from XML.


Listing 4. The gnosis.xml.objectify.utils.XPath() function
from itertools import islice

def XPath(o, path):
    "Find node(s) within an _XO_ object"
    path = path.replace('//','/!!') # Placeholder hack for easy splitting
    if path.startswith('/'):        # No need for init / since node==root
        path = path[1:]
    if path.startswith('!!'):       # Recursive path fragment
        path, start, stop = indices(path)
        i = 0
        for node in walk_xo(o):
            if i >= stop: return
            for match in XPath(node, path[2:]):
                if start <= i < stop:
                    yield match
                i += 1
    elif '/' in path[1:]:           # Compound, non-recursive
        head, tail = path.split('/', 1)
        for node in XPath(o, head):
            for match in XPath(node, tail):
                yield match
    else:                           # Atomic path fragment
        path, start, stop = indices(path)
        if path=="*":               # Node wildcard
            for node in islice(children(o), start, stop):
                yield node
        elif path=="text()":        # Node text(s)
            for s in islice(text(o), start, stop):
                yield s
        elif path.startswith('@*'): # All node attributes
            for attr in attributes(o):
                yield attr
        elif path.startswith('@'):  # Specific node attribute
            for attr in attributes(o):
                if attr[0]==path[1:]:
                    yield attr
        elif hasattr(o, path):      # Named node type
            for node in islice(getattr(o, path), start, stop):
                yield node
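
The placeholder hack on the first lines of XPath() is worth seeing on its own. The sketch below (Python 3) applies the same two transformations and then splits the whole path at once; the function name split_path is hypothetical, and XPath() itself splits one fragment at a time rather than all at once:

```python
def split_path(path):
    """Split a pseudo-XPath into fragments, marking recursive ones with '!!'."""
    path = path.replace('//', '/!!')   # '//' becomes a flagged fragment
    if path.startswith('/'):           # A leading '/' just means "from this node"
        path = path[1:]
    return path.split('/')


for example in ('/baz/*/bar', '//bar/@a1', 'baz//bar/text()'):
    print(example, '->', split_path(example))
```

After this step, every fragment is either atomic (a tagname, *, text(), or an @ attribute test) or a recursive fragment recognizable by its !! prefix, which is why the main function needs only three branches.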


Serializing to XML

From time to time, users have been bothered by the fact that gnosis.xml.objectify does not reserialize its objects to XML. In comparison with other Python XML bindings, this is said to be a weakness. I disagree: Those other bindings still force you to think of their Python objects in XML terms, not Python terms. Only blessed objects and attributes are serialized, not everything a Python object might have.

For example, in ElementTree, you can perform steps like:


Listing 5. ElementTree example
>>> import sys
>>> from elementtree import ElementTree
>>> et = ElementTree.parse("xpath.xml")
>>> et.write(sys.stdout)

But if you change the object et (or any child nodes you might generate with methods like .getroot(), .find(), or .findall()), your additions are not generally serializable. For example, this does not change the serialization at all, even though it changes the object:


Listing 6. Modified ElementTree example
>>> et.new = 'flaz'
>>> et.getroot().more = 123
>>> et.write(sys.stdout)  # Serialized output is unchanged
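
The same distinction can be checked with the xml.etree.ElementTree module from the Python 3 standard library, the descendant of the elementtree package used above. This sketch assumes a small inline document instead of xpath.xml, and contrasts an ordinary Python attribute with a "blessed" XML attribute added through the element's .set() method:

```python
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<foo><bar>Text</bar></foo>"))
root = tree.getroot()
before = ET.tostring(root, encoding="unicode")

# An ordinary Python attribute changes the object but not the serialization
tree.new = "flaz"
unchanged = ET.tostring(root, encoding="unicode")

# An XML attribute added through the API is serialized
root.set("more", "123")
changed = ET.tostring(root, encoding="unicode")

print(before == unchanged)
print(changed)
```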

Similarly, with Anobind and its .unbind() method, you can add special XML-oriented nodes using API methods like .append(), .insert(), or .remove(). But then, gnosis.xml.objectify can also add blessed attributes using its gnosis.xml.objectify.addChild() utility function (and using gnosis.xml.objectify.createPyObj() to make a special _XO_ object to add).

If you just want generic serialization of gnosis.xml.objectify objects, perhaps with a few values changed from the original XML, you can write a utility function to do this in 12 lines:


Listing 7. Generic XML serialization
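from sys import stdout
from types import StringTypes
from xml.sax.saxutils import escape, quoteattr

def write_xml(o, out=stdout):
    "Serialize an _XO_ object back into XML"
    out.write("<%s" % tagname(o))
    for name, value in attributes(o):
        out.write(' %s=%s' % (name, quoteattr(value)))  # Quoted and escaped
    out.write('>')
    for node in content(o):
        if type(node) in StringTypes:
            out.write(escape(node))     # Escape &, <, and > in text
        else:
            write_xml(node, out=out)
    out.write("</%s>" % tagname(o))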

But to my mind, the real power of working with objects in Python comes in non-generic serialization and transformation. Rather than just dumping every attribute back into XML, you might want to filter and massage nodes before writing them. Of course, just what you manipulate depends on your application requirements.
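
As a rough illustration of filtering during serialization, the sketch below (Python 3) works on a hypothetical stand-in Node class with tag, attrs, and content attributes rather than on real gnosis.xml.objectify objects, and the keep parameter is my own invention. It uses escape() and quoteattr() from the standard library so that the output stays well-formed:

```python
from xml.sax.saxutils import escape, quoteattr


class Node:
    """Hypothetical stand-in for a parsed node: a tag, attributes, mixed content."""

    def __init__(self, tag, attrs=None, content=()):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.content = list(content)   # Strings and child Nodes, in order


def to_xml(node, keep=lambda n: True):
    """Serialize node, skipping any child for which keep(child) is false."""
    parts = ["<%s" % node.tag]
    for name, value in node.attrs.items():
        parts.append(" %s=%s" % (name, quoteattr(value)))
    parts.append(">")
    for item in node.content:
        if isinstance(item, str):
            parts.append(escape(item))
        elif keep(item):
            parts.append(to_xml(item, keep))
    parts.append("</%s>" % node.tag)
    return "".join(parts)


doc = Node("foo", content=[
    Node("bar", content=["a < b"]),
    Node("baz", {"a1": "bat"}),
    Node("baz", content=["blip"]),
])

print(to_xml(doc))
print(to_xml(doc, keep=lambda n: n.tag != "baz"))
```

The filter here is a single predicate, but the same hook could just as easily rename tags or rewrite text before it is written.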


Custom container objects

An approach to XML binding taken by Dave Kuhlman's generateDS (see Resources), as well as some other less mature bindings, is to require custom Python classes for each XML element type in the documents that you process. In Kuhlman's case, these custom classes are generated from corresponding W3C XML Schemas (but only allow a subset of the full WXS specification). In contrast, gnosis.xml.objectify -- along with ElementTree, Anobind, and some others -- will bind any old XML document without any special programming.

However, gnosis.xml.objectify, like Anobind but unlike ElementTree, lets you create custom node classes if you want to use them. In fact, you can substitute the base class for every node object, giving your whole application custom behaviors.

I think beginning users of gnosis.xml.objectify have been intimidated by the idea of specializing classes per-tagname. Here are a few examples that show just how non-threatening it really is.

Redefining the _XO_ base class

Whenever you customize a base class, you need to inject the new class back into the gnosis.xml.objectify namespace. This step involves some magic, but is not difficult to do. I might give the step a friendlier name in a wrapper function in the future, but the style emphasizes that you are changing the module itself. For example, tagnames are mangled in Gnosis Utilities 1.1.1, but attribute names are not; this makes it more difficult than necessary to access attributes whose names contain characters disallowed in Python variables. One fix for this is to also allow dictionary-like access to these attributes:


Listing 8. Adding dictionary-like attribute access
>>> import gnosis.xml.objectify
>>> from gnosis.xml.objectify import make_instance
>>> class newXO(gnosis.xml.objectify._XO_):
...     def __getitem__(self, key):
...         return getattr(self,key)
...
>>> gnosis.xml.objectify._XO_ = newXO
>>> o = make_instance('<o><my-doc my-name="david">Stuff</my-doc></o>')
>>> print o.my__doc['my-name']
david
>>> getattr(o.my__doc,'my-name')  # Works without custom base
u'david'
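
The technique relies only on the fact that Python objects can hold attributes whose names are not valid identifiers. A self-contained sketch in Python 3, using a hypothetical DictAccess class in place of a real node class:

```python
class DictAccess:
    """Hypothetical stand-in for a node class with dictionary-like access."""

    def __getitem__(self, key):
        return getattr(self, key)


node = DictAccess()
# "my-name" cannot be written as node.my-name, but setattr() accepts it
setattr(node, "my-name", "david")

print(node["my-name"])
print(getattr(node, "my-name"))
```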

Redefining per-tagname node classes

Redefining base classes is probably of greatest utility for specific per-tagname classes that you know certain things about. For example, if a certain element is always a leaf node in a particular document type (and has no XML attributes), you might want to refer to its PCDATA just by the node name itself. Of course, if the input XML is not structured in the way you assume, accessing children is more difficult in this case. One way to program this behavior is:


Listing 9. An AutoPCDATA custom node class
>>> from gnosis.xml.objectify import make_instance
>>> xml = '''<group>
...            <var><description>foo</description></var>
...            <var><description>bar</description></var>
...          </group>'''
>>> group = make_instance(xml)
>>> print group[0].var[0].description
<description id="23cf2c">
>>> print group[0].var[0].description.PCDATA
foo
>>> import gnosis.xml.objectify
>>> class AutoPCDATA(gnosis.xml.objectify._XO_):
...     def __repr__(self):
...         return self.PCDATA
...
>>> gnosis.xml.objectify._XO_description = AutoPCDATA
>>> group = make_instance(xml)
>>> print group[0].var[0].description
foo

Even more clever, in AutoPCDATA you can check objects for what attributes other than .PCDATA they have, and return different values for the different cases.
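
One possible shape for such a check is sketched below in Python 3. The class is a free-standing stand-in (it does not inherit from the library's base class), and the fallback representation it prints is an arbitrary choice of mine:

```python
class AutoPCDATA:
    """Hypothetical stand-in: shows PCDATA alone only for plain leaf nodes."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def __repr__(self):
        # Anything besides PCDATA means the node is not a simple leaf
        extras = sorted(name for name in vars(self) if name != "PCDATA")
        if extras:
            return "<description with %s>" % ", ".join(extras)
        return getattr(self, "PCDATA", "")


print(AutoPCDATA(PCDATA="foo"))
print(AutoPCDATA(PCDATA="foo", lang="en"))
```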

Another application-specific approach to custom classes performs calculated access. One of the several Python bindings called XMLObject gives an example of data about a family with multiple members:


Listing 10. Family tree as XML
<Family>
  <Member Name="Abe" DOB="3/31/42" />
  <Member Name="Betty" DOB="2/4/49" />
  <Member Name="Edith" Father="Abe" Mother="Betty" DOB="8/30/80" />
  <Member Name="Janet" Father="Frank" Mother="Edith" DOB="1/17/03" />
</Family>

It might be handy to access family members solely by name, without bothering with the whole XML hierarchy. One obvious approach is with a custom Family class:


Listing 11. Dictionary-like access into a child attribute
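import gnosis.xml.objectify
from gnosis.xml.objectify import make_instance

class Family(gnosis.xml.objectify._XO_):
    def __getitem__(self, key):
        # Return the first <Member> whose Name attribute matches key
        for member in self.Member:
            if member.Name == key:
                return member
gnosis.xml.objectify._XO_Family = Family
family = make_instance('family.xml')    # Avoid shadowing the Family class
print family['Janet'].DOB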

However, if names are not unique you may want to expand upon this particular approach.
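
One way to expand it is to return every match rather than the first. The sketch below (Python 3) uses hypothetical stand-in Member and Family classes built by hand instead of parsed from family.xml, and the second "Abe" entry is made-up data added purely to show a duplicate name:

```python
class Member:
    """Hypothetical stand-in for a parsed <Member> node."""

    def __init__(self, Name, DOB):
        self.Name = Name
        self.DOB = DOB


class Family:
    """Hypothetical stand-in for a parsed <Family> node."""

    def __init__(self, members):
        self.Member = list(members)

    def __getitem__(self, key):
        # Collect all members with this name instead of stopping at the first
        matches = [m for m in self.Member if m.Name == key]
        if not matches:
            raise KeyError(key)
        return matches


family = Family([
    Member("Abe", "3/31/42"),
    Member("Betty", "2/4/49"),
    Member("Abe", "1/1/70"),     # Made-up duplicate for illustration
])

print([m.DOB for m in family["Abe"]])
```

Raising KeyError for an unknown name keeps the behavior close to that of a real dictionary.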


Wrapping up

The general techniques for wrapping gnosis.xml.objectify shown in this article are meant mostly as examples for more specific customizations by users. You can achieve great flexibility and power by keeping APIs highly open and minimally specified, leaving customization at the application level rather than the library level.


Resources
