
Pinned topic Directions for gnosis.xml.objectify

‏2003-06-24T18:57:08Z |
I recently had a long conversation in email with a gnosis.xml.objectify user. I have been considering upgrading the API to allow direct serialization back to XML, but I haven't decided whether to do so or, if so, how to do it cleanly. I very much welcome feedback from readers here:
|One of the really missing features which would make objectify a lot more
|useful for me would be a new method (perhaps overloading __str__()?) to
|dump the XML representation of the object tree. It'd be nice to read in
|an XML document, convert it to python objects, do some manipulations on
|it (reorder, deletions, perhaps even additions and modifications) and
|then print the object back out as an XML doc with equivalent structure.

The "equivalent structure" is the part that is far less straightforward
than you think. Since gnosis.xml.objectify avoids any heavy API that
forces node operations to be valid, a user could create an object with a
quite different structure. In fact, they could modify the object into
something that is awfully hard to turn into any kind of XML:

root = XML_Objectify('something.xml').make_instance()
foos = root.foo # the <foo> nodes
root.child = SomethingVeryUnNodelike() # placeholder for any arbitrary object
foos[3] = re.compile('sre object')

What should happen here? Of course, gnosis.xml.pickle goes from the
other end, and turns -any- Python object into -some- XML (of the special
xml_pickle dialect), though not necessarily something structurally close
to what you started with.

I've actually been thinking about this just in the last few days. I
wrote to my colleague Uche Ogbuji as below (in part). I also welcome
your suggestions about how to go with this.

(3) I recently wrote an XM installment that discusses ElementTree. I
reference your article there; and mostly wind up comparing it
with gnosis.xml.objectify. In the end--perhaps out of author's
vanity--I wind up liking my own thing better. But I -do- think I went
into it ready to claim that ElementTree was a better choice for
whatever. There's a limited amount of that still there but...

(4) One obvious deficit of gnosis.xml.objectify compared to ElementTree
(or generateDS, or DOM) is that it doesn't come with its own
marshalling. You can build it yourself given an unmarshalled object,
but it's not there automatically.

One reason it's not there (other than lack of user demand and/or
patches) is that as-is, the unmarshalling is lossy as to document
structure. But I realized once I looked at ElementTree that there's no
reason I couldn't store the structural information.... I had thought
before that it would take a broad refactoring of objectify. But now I
think it would be (relatively) easy: just create a new private
attribute on nodes, say '.__seq' that contains the order of all the
children. E.g. (in the future)

>>> xml = '''<root>
... <foo>this</foo>
... <bar>that</bar>
... <foo>other</foo>
... </root>'''
>>> open('tmp.xml','w').write(xml)
>>> from gnosis.xml.objectify import XML_Objectify # 1.0.7+
>>> root = XML_Objectify('tmp.xml').make_instance()
>>> root.foo
[<gnosis.xml.objectify._objectify._XO_foo instance at 0x2a4fcc>,
<gnosis.xml.objectify._objectify._XO_foo instance at 0x2a500c>]
>>> root.bar
<gnosis.xml.objectify._objectify._XO_bar instance at 0x2a614c>
>>> root.__seq # hypothetical new attribute
[<gnosis.xml.objectify._objectify._XO_foo instance at 0x2a4fcc>,
<gnosis.xml.objectify._objectify._XO_bar instance at 0x2a614c>,
<gnosis.xml.objectify._objectify._XO_foo instance at 0x2a500c>]

The new private attribute would not use much extra memory, since it
would not need to contain whole new child nodes, just references to the
existing ones. Then, of course, I'd want either a node method .toxml(),
or simply a utility function toxml() (either way, I'd use .__seq for
reconstruction purposes).
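
To make the idea concrete, here is a minimal sketch of the ordering side-list. It does not use gnosis.xml.objectify at all: Node, add(), and the attribute name _seq are assumptions made up for illustration, and children are always kept in lists to keep the example short.

```python
from xml.sax.saxutils import escape


class Node:
    """Hypothetical stand-in for an objectify-style node (not the gnosis class)."""

    def __init__(self, tag, text=""):
        self.tag = tag
        self.PCDATA = text
        self._seq = []  # references to children in document order (assumed name)

    def add(self, child):
        # Group same-named children under one attribute, and separately
        # record the document order. Only references are stored, not copies.
        self.__dict__.setdefault(child.tag, []).append(child)
        self._seq.append(child)
        return child


def toxml(node):
    """Serialize using the recorded order rather than the attribute grouping."""
    inner = escape(node.PCDATA) + "".join(toxml(c) for c in node._seq)
    return "<%s>%s</%s>" % (node.tag, inner, node.tag)


root = Node("root")
root.add(Node("foo", "this"))
root.add(Node("bar", "that"))
root.add(Node("foo", "other"))
print(toxml(root))
```

One Python detail worth keeping in mind: a literal self.__seq written inside a class body is name-mangled to _ClassName__seq, which is why the sketch uses a single leading underscore.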

Thinking about it, I'd also have to handle mixed content. My first
feeling about that is to intersperse a bunch of slice objects with the
child elements within node.__seq. The slice would index into the
node.PCDATA attribute.
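
A rough sketch of how slices interspersed with child nodes might drive reconstruction of mixed content. Again, Node and _seq are hypothetical names and nothing here is the gnosis API; Python slices are half-open, so the offsets below are chosen accordingly.

```python
from xml.sax.saxutils import escape


class Node:
    """Hypothetical node holding flat PCDATA plus an ordering list."""

    def __init__(self, tag, pcdata="", seq=None):
        self.tag = tag
        self.PCDATA = pcdata
        # Default: a single slice covering all of the text.
        self._seq = seq if seq is not None else [slice(0, len(pcdata))]


def toxml(node):
    parts = []
    for item in node._seq:
        if isinstance(item, slice):
            parts.append(escape(node.PCDATA[item]))  # a run of text
        else:
            parts.append(toxml(item))                # a child element
    return "<%s>%s</%s>" % (node.tag, "".join(parts), node.tag)


b = Node("b", "boldface")
# PCDATA is the text with the child element removed (note the two spaces).
root = Node("root", "Some  word", seq=[slice(0, 5), b, slice(5, 10)])
print(toxml(root))
```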

But doing this is not particularly robust. It would not be hard to
remove deleted children from .__seq, for example. But maintaining
consistency across every addition of new attributes, rearranging
elements, looking for various types of objects added to the Python
object, and so on, seems to require a much heavier API bolted on to the
simplicity of gnosis.xml.objectify. I'm not sure if an automatic
serialization capability is worth the potential hassle for users.
Maybe just a method like .getChildren() could be added that at
least lets users see the original subelement/content ordering, but still
makes them build their own marshal if they want it.

Anyway, I'm not sure what I'll do with this. But feel free to let me
know what you think is most useful when you start looking at Gnosis
Utilities. Certainly I'm up for remedying perceived lacunae.

|Well, my thought was that when an XML_Objectify instance is converted,
|it'd just recursively call the toxml() method on all child nodes.

This is a pretty good idea. It's easy enough to filter out any children
that don't know how to serialize themselves, and that has a certain
straightforwardness. I can add a .toxml() method to the '_XO_' base
class easily enough... as long as users inherit from that, they're fine
in creating their own node types (even ones not in the original XML).
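
A sketch of that recursive approach, under the assumption that every well-behaved node inherits a toxml() method from a common base. XO here is a made-up stand-in for the '_XO_' base class, and the flat children list is a simplification.

```python
import re
from xml.sax.saxutils import escape


class XO:
    """Hypothetical base class standing in for the '_XO_' base."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self.PCDATA = text
        self.children = list(children)

    def toxml(self):
        parts = [escape(self.PCDATA)]
        for child in self.children:
            serialize = getattr(child, "toxml", None)
            if callable(serialize):  # skip anything that is not node-like
                parts.append(serialize())
        return "<%s>%s</%s>" % (self.tag, "".join(parts), self.tag)


class Custom(XO):
    """A user-defined node type that was never in the source XML."""


root = XO("root", children=[
    XO("foo", "this"),
    re.compile("sre object"),        # cannot serialize itself, so it is dropped
    Custom("extra", "added later"),
])
print(root.toxml())
```

Silently dropping un-nodelike children is only one possible policy; raising an error instead would be just as easy and arguably safer.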

There are a couple of wrinkles that come to mind, though. Assuming that
I use the .__seq attribute that I described, how would I keep the
sequence in sync with the regular children? I.e.:

root = XML_Objectify('foodoc.xml').make_instance()
print root.__seq #--> <foo instance>, <bar instance>, <foo instance>
# What happens to root.__seq in the next lines? And how?
newfoo = _XO_foo(some, args)

The other main wrinkle is around mixed content. My idea was to store
slices into node.PCDATA to express ordering of mixed content:

root = XML_Objectify('mixed.xhtml').make_instance()
print root.__seq #--> [slice(0,5), <b instance>, slice(6,10),
# <i instance>, slice(11,15)]

But of course, if a user wants to change the PCDATA (which seems like a
very reasonable thing to want), the slices don't match up:

# What about the slice after the boldface stuff?
print root.PCDATA[6:10] #--> "Hello"
root.PCDATA[0:5] = "New value here"
print root.PCDATA[6:10] #--> "lue h"
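
The drift can be reproduced with nothing but a plain string and a list of slices. This is a self-contained illustration with made-up text; the "<b>" and "<i>" strings merely stand in for child nodes.

```python
pcdata = "Hello brave world"
# Hypothetical ordering list: text slices interleaved with child placeholders.
seq = [slice(0, 5), "<b>", slice(6, 11), "<i>", slice(12, 17)]


def text_runs(pcdata, seq):
    """Return the text run each slice currently points at."""
    return [pcdata[item] for item in seq if isinstance(item, slice)]


before = text_runs(pcdata, seq)

# Strings are immutable, so "assigning to a slice" really means building
# a new string. The first run grows from 5 to 14 characters.
pcdata = "New value here" + pcdata[5:]
after = text_runs(pcdata, seq)

print(before)  # ['Hello', 'brave', 'world']
print(after)   # ['New v', 'lue h', 're br']
```

Keeping things consistent would mean shifting every later slice by the change in length (9 here) on each edit, which is exactly the kind of bookkeeping a heavier API would have to take on.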

Now I could make all the text nodes into custom objects, full of
methods, and broken up into little segments. But if I do that, I'm
practically back to having DOM. I like the ease of:

>>> xml = '<root>Some <b>boldface</b> word</root>'
>>> open('mixed.xml','w').write(xml)
>>> from gnosis.xml.objectify import XML_Objectify
>>> root = XML_Objectify('mixed.xml').make_instance()
>>> print root.PCDATA
Some word
>>> print root.b.PCDATA
boldface
>>> print root._XML
Some <b>boldface</b> word

Did you know about this?

I'm not saying these issues are insurmountable. But if I grow more API,
I don't want it to be in the wrong direction; it should be thought out.

|The key element though is that I need to preserve the entire structure,
|including ordering, of all elements under a point in order to not break
|XSLT transformations.

That's kind of the point of making users do their own serialization (the
pyobj_printer() function provides a kind of template). Your custom
output is perfectly free to impose the order of children that it thinks
it needs.
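
As an illustration of a hand-rolled serializer that imposes its own child order, here is a small sketch. It is not the pyobj_printer() code; the serialize() function and the order mapping are invented for this example, and SimpleNamespace objects stand in for objectified nodes.

```python
from types import SimpleNamespace
from xml.sax.saxutils import escape


def serialize(node, tag, order):
    """Emit node as XML; `order` maps a tag to the child tag names,
    in the sequence the caller wants them written."""
    parts = [escape(getattr(node, "PCDATA", "") or "")]
    for name in order.get(tag, ()):
        kids = getattr(node, name, [])
        if not isinstance(kids, list):  # a lone child is not wrapped in a list
            kids = [kids]
        parts.extend(serialize(kid, name, order) for kid in kids)
    return "<%s>%s</%s>" % (tag, "".join(parts), tag)


root = SimpleNamespace(
    PCDATA="",
    foo=[SimpleNamespace(PCDATA="this"), SimpleNamespace(PCDATA="other")],
    bar=SimpleNamespace(PCDATA="that"),
)
print(serialize(root, "root", {"root": ["bar", "foo"]}))
```

Note what this cannot do: it can put all the bar children before all the foo children, but it cannot recover an interleaved foo/bar/foo order, because that information was never recorded.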
Updated on 2005-09-25T19:47:43Z by SystemAdmin
  • SystemAdmin

    Re: Directions for gnosis.xml.objectify

    This capability seems like it would have quite a bit of general usefulness. This post is from 2 years ago - any updates?