
author Service Oriented Architecture and Business-Level Tooling

Simon works in the CTO organization of Rational Software and is responsible for the business-level tooling strategy. Simon has undertaken a number of standards-related activities for both Rational Software and now IBM in the area of XML (W3C Schema working group), Web Services (RosettaNet architecture team) and Modeling (OMG UML and OCL teams). Simon has also written articles on the subjects of business modeling, software modeling and SOA and is interested to see where and when these threads will combine.



Wednesday April 09, 2008

RDF - the good, the bad or the ugly?

I've mentioned here before that one great feature of the JRS server is its indexing ability. In some ways it's akin to what Lucene does for text search: there is a set of format-specific components that extract properties from resources, and a store into which these are put and made available for query. The difference between JRS and Lucene[*] is that we are trying to extract structured properties that can be made available via a more traditional query language - think XQuery and you'll have anticipated a future post. Some of these components have a fixed set of things to extract; for example, we have an EXIF indexer that pulls a pre-defined set of properties from image files. Others are configurable, so our XML indexer has a declarative specification that tells the server which parts of your resources it should index. So what has this got to do with RDF?

Having indexed properties of resources we need not only to make them available for query, but we also want a query to be able to return some of these properties as well. We also support the ability to ask for all the properties that have been indexed for a resource by appending "?properties" to the URL of a resource. We wanted a format that would be able to encode these indexed properties in a regular way, and if possible pick up a standard one. We chose RDF as a standard, but also because our internal indexing components and storage have been inspired by RDF in a number of ways already. So, when you do a GET on {resource-uri}?properties you get a nice RDF document back, something like this:

<rdf:Description 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://dublincore.org/documents/dcmi-terms/"
  xmlns:ns="http://example.com/xmlns/music#"
  rdf:about="/jazz/resources/musicdb/albums/album-1">
  <dc:contributor rdf:resource="/jazz/users/zoe"/>
  <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">
      2008-03-02T00:00:00
  </dc:modified>
  <dc:format>application/xml</dc:format>
  <rdf:type rdf:resource="http://example.com/xmlns/music#album"/>
  <ns:name>A Matter of Life and Death</ns:name>
  <ns:releasedYear rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">
    2006
  </ns:releasedYear>
  <ns:artist rdf:resource="/jazz/resources/musicdb/artists/artist-1"/>
  <ns:genre>Rock</ns:genre>
  <ns:genre>Heavy Metal</ns:genre>
</rdf:Description>

Where the properties in italics are system properties - JRS maintained properties for all resource types, and the rest have been extracted from an XML resource using the declarative indexer, so you can imagine that we extracted only a subset of element values from the source resource.

The point of the post is that RDF is a great format for this: the body of work and literature on RDF allows us to think about what these property documents mean, especially in the presence of many linked resources. So for us the use of RDF in JRS has been really good for the server team, and we had thought that the client teams would think so too. Well, the first thing that came up was the <rdf:RDF> tag that we had originally put around all the description documents. This seemed like unnecessary overhead (the ugly), and so, having re-read the RDF spec and found that it was optional anyway, we took it out. The next reaction was not one we had expected: one development manager here asked whether this meant all his developers now have to become RDF experts. We had not anticipated that; we had used RDF first and foremost as an XML dialect, and as such we manipulated it, internally and in our samples, only with XQuery and XPath. So no, we expect most developers will not need to know RDF; they just see a particular XML form with some particular idioms (we could even use a prefix other than "rdf", and I wonder how many people would even recognize that it is RDF?).
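To make the "it's just XML" point concrete, here is a minimal sketch in Python that reads a property document with nothing but the standard library's ElementTree. The document is a trimmed copy of the sample above; the function name read_properties is made up for this illustration and is not part of JRS.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSIC = "{http://example.com/xmlns/music#}"

# Trimmed copy of the sample property document shown earlier in this post.
SAMPLE = """<rdf:Description
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ns="http://example.com/xmlns/music#"
  rdf:about="/jazz/resources/musicdb/albums/album-1">
  <ns:name>A Matter of Life and Death</ns:name>
  <ns:releasedYear rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">
    2006
  </ns:releasedYear>
  <ns:artist rdf:resource="/jazz/resources/musicdb/artists/artist-1"/>
  <ns:genre>Rock</ns:genre>
  <ns:genre>Heavy Metal</ns:genre>
</rdf:Description>"""


def read_properties(document):
    """Return (subject, {property: [values]}) for one rdf:Description.

    Hypothetical helper: it treats the document as ordinary XML and
    knows nothing about RDF beyond two attribute names.
    """
    root = ET.fromstring(document)
    subject = root.get("{%s}about" % RDF)
    properties = {}
    for child in root:
        # A property either links to another resource or carries literal text.
        value = child.get("{%s}resource" % RDF)
        if value is None:
            value = (child.text or "").strip()
        properties.setdefault(child.tag, []).append(value)
    return subject, properties


subject, props = read_properties(SAMPLE)
print(subject)
print(props[MUSIC + "genre"])
```

Repeated properties such as ns:genre simply become a list with more than one value, which is about as much "RDF thinking" as a client needs for this kind of use.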

The interesting opportunity is when you DO treat this body of index data as RDF. For example, XQuery is great, but like SQL it really falls down when trying to navigate a graph of resources unless the query has fixed the relationships ahead of time (think nested queries). SPARQL, as an RDF query language, really excels at this kind of query but falls down in other ways. We expect that many of the first JRS applications will use only XQuery and XPath, but over time we expect that some application requirements will be much better expressed in SPARQL, which we hope to provide in JRS at some future point.

I find it interesting that there seems to be a bad impression of RDF in the general developer community. Personally, before working on JRS I had had no practical experience of RDF, but it does fit the need we had. I wonder if there is an assumption that to consume RDF you must also consume RDFS, OWL, think in ontologies and so forth? Perhaps the hype around the semantic web has put some people off? Whatever the reason, the comment we heard on RDF in our meeting seems to be common, and we've learned to smile sweetly and spend time explaining to many people why this is good for them (there are some people we just tell to suck it up - but that's a different matter).

Good? Yes, for us, in this case it is. Bad? No, but it does take some selling. Ugly? Not really, actually when carefully used someone commented that "you really couldn't have made it any simpler or clearer if you tried".

* - JRS does use Lucene as well, so we support both structured query as well as full-text search.



Categories : [   RDF  |  XML  |  jazz  ]

Apr 09 2008, 09:23:57 PM EDT Permalink



Wednesday February 20, 2008

New entry on the Jazz blog

James Branigan, one of my fellow JRS developers, has a nice article on the Jazz blog about the journey taken by the Jazz team, which brings us up to date with JRS: "A brief history of the Jazz Team Server interface: Our journey from a J2EE server towards a RESTful server".

Oh, and for those in the know, the Python server mentioned in James's posting is the project I have mentioned here a few times. It used Python and Django to build a fairly complete RESTful content repository, and in fact the documentation from that project became the initial specs for JRS. We never intended for that server to see the light of day outside the lab; it became affectionately (I like to think) known as the "Rinky-Dink" server.



Categories : [   Jazz  ]

Feb 20 2008, 08:29:11 AM EST Permalink


Wednesday February 20, 2008

JRS and JCR (JSR-170) again

David Nuescheler Said (here):

Hi Simon,

Thanks a lot for this post. I am very interested in the future development of JRS and as the Spec-Lead of JCR (aka. JSR-170 & JSR-283) I would like to ask for some more detailed clarifications around some of the above statements.

(1) Flat, Hierarchical & non-contiguous: JCR in my mind is not limited to exposing hierarchical-only structures. As a matter of fact I think of flat as a very simple hierarchy to begin with. In a content repository it is certainly acceptable that there is no accessible (readable) direct parent node of an item (be that for access control reasons). In my mind the URL space itself is hierarchical through its path segments as defined in http://gbiv.com/protocols/uri/rfc/rfc3986.html#path

(2) Personally, I believe that the tie into "Java" is more of a tie into the JCP as a standardizing body and not so much about the tie into the "Java language". Maybe the discussions around APP & JCR http://dev.day.com/microsling/content/blogs/main/jcr-loves-atom.html and a couple of comments around some of the non-Java languages that use JCR may be interesting: http://dev.day.com/microsling/content/blogs/main/fudbusting2.html

(3) I really think that JCR allows both for typed and untyped nodes (we call them structured vs. unstructured). As a matter of fact I am a big proponent of a "Data First" architecture and find this one of the most fascinating features in JCR.

Anyhow, thanks a lot for your excellent post and I would be happy to engage in a more detailed discussion, feel free to contact me at any point.

David, I would be interested in the discussion, I think the points you make are valid and interesting. In terms of hierarchy it's important to JRS to allow clients to be able to PUT a resource to /a/b/c/mydoc.txt without having to have created a node at /a, /a/b, /a/b/c first which is how I meant to use the term non-contiguous in this context. Of course we also allow /a to be an Atom feed, so you can POST a child feed to /a/b and then another to /a/b/c and finally POST mydoc.txt, providing a completely hierarchical space - or you can mix both styles.

I understand the point about JCP; however, we wanted to ensure that we didn't "pollute" our design with any assumptions and so we worked only from the HTTP and APP specs and developed test clients in Java, Python, BASH scripts, etc. (see "J" is for Jazz, not Java). As you note in the fudbusting link, the JCR API still needs to be implemented in different languages, and there are difficulties with type conversions and so forth. Your third integration approach - that of accessing the content repository through a RESTful interface - is exactly the approach taken by JRS and allows maximum freedom in client choice.

I must be showing my ignorance of JCR (my apologies); again, I mean that we do not have the notion of resource type in the JRS repository apart from one specific case - the Atom feed. On a brand new server I can PUT MS Word documents, code, audio, video, any resource that exists on my machine in fact, into the server without having to define any types ahead of time. All resources are stored simply as they come in, with their content-type and other associated meta-data, without intervention or assumption by JRS. As for Atom feeds, the way you create a collection in JRS is simply to PUT (or POST to an existing collection) a valid Atom feed document with the content-type "application/atom+xml"; the server assumes that you want to create a collection at that point and therefore creates additional structures server-side to allow clients to POST to the feed. There is the ability for clients to teach the server about interesting things in XML resources for our indexer, but I'll hold that thought for a posting of its own.
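As a rough illustration of the two styles described above, the following Python sketch builds (but does not send) the two kinds of PUT request using only the standard library. The host name, the project name "demo" and the helper put_request are assumptions made for this example; only the /jazz/resources/{project}/ layout and the Atom content type come from the posts here, and the feed body is a bare-minimum document that a real server may or may not accept.

```python
import urllib.request

BASE = "http://localhost:8080"  # hypothetical host, not taken from the post


def put_request(path, body, content_type):
    """Build (but do not send) a PUT request for the resource at `path`."""
    return urllib.request.Request(
        BASE + path,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# A simple resource placed deep in the URL space; nothing is created at
# /a, /a/b or /a/b/c beforehand (the non-contiguous style).
doc = put_request("/jazz/resources/demo/a/b/c/mydoc.txt", b"hello", "text/plain")

# A minimal Atom feed document; sending it with the Atom content type is
# what asks the server for a collection rather than a simple resource.
FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>b</title>
  <id>urn:example:feed-b</id>
  <updated>2008-02-20T00:00:00Z</updated>
</feed>"""
feed = put_request("/jazz/resources/demo/a/b", FEED, "application/atom+xml")

for request in (doc, feed):
    # urllib stores header names capitalized as "Content-type".
    print(request.get_method(), request.full_url, request.get_header("Content-type"))
```

Passing either object to urllib.request.urlopen would actually issue the request; that step is left out because it needs a running server and credentials.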

I hope that makes things a little clearer, and I hope I didn't completely dis the JCR spec :-)



Categories : [   Java  |  jazz  |  rest  ]

Feb 20 2008, 08:23:19 AM EST Permalink



Monday February 04, 2008

JRS server on Mac OS

As I mentioned in a previous post I am now doing all of my JRS development on a MacBook Pro, using the Rational Team Concert beta 2 client. I hadn't given any thought to the stand-alone server until the work item [43138] appeared. I always run the server from within Team Concert as I am working, not from the command line. Anyway, a little shell script hacking later and at least the JRS server.startup and server.shutdown scripts now work on Mac OS for all the other Mac developers out there.

Categories : [   jazz  |  mac  ]

Feb 04 2008, 01:21:27 PM EST Permalink



Saturday January 19, 2008

Jazz is for everyone

Looks like Jazz is now for everyone (well, not that progressive modern stuff, but who doesn't like Louis Armstrong?). Seriously, the Jazz web site is now accepting registration from anyone, lifting the restriction that you be an IBM Rational partner or customer.

Everyone is now welcome to join Jazz.net. A special thank-you to all our Rational customers and partners, the university researchers and students, and everyone else who was part of the Jazz.net early pilot program.

Looking for JRS? Currently our builds aren't in the build list, but you can see our work items on the site.



Categories : [   jazz  ]

Jan 19 2008, 10:03:43 PM EST Permalink



Thursday January 17, 2008

"J" is for Jazz, not Java

In a previous post on JCR I mentioned that JRS had consciously avoided the development of a client-side Java API. In fact there is no requirement for application clients to be developed in Java at all. One of the concerns we saw for previous Rational products was the complexity of the API and its proprietary nature, which made interoperability, integration and extension an expensive and complex proposition.

To ensure we remained honest through development we wrote the majority of our test suite to only use the JDK to open connections to the server and parse XML. We didn't allow the test suite to use any of the server code, any client library we developed ourselves or in fact any stack other than the JDK plus some helper classes. The first round of test cases was actually re-written from a Python test suite developed for an early experimental version of the server. Python has continued to play a part: we are delivering a set of samples and so far are using bash scripts with cURL, Python and JavaScript.

For Java client applications we don't really expect teams to work with the bare JDK, and so we have tested with both Apache HttpClient and Abdera (for feed/entry creation and parsing). These seem to be the libraries the application teams want to, and probably should, use.

So at least for us in JRS, if not for the rest of IBM, "J" stands for the Jazz Project and not Java.



Categories : [   jazz  |  python  |  rest  ]

Jan 17 2008, 11:38:16 AM EST Permalink



Sunday January 13, 2008

JRS and JCR (JSR-170)

In an email response to Jazz REST Services, Bill De hOra asked about the relationship between JRS and JSR-170 (Java Content Repository, JCR). He noted the following language in the IBM description of JCR:
"Every node has one and only one primary node type. A primary node type defines the characteristics of the node, such as the properties and child nodes that the node is allowed to have. In addition to the primary node type, a node may also have one or more mixin types. A mixin type acts a lot like a decorator, providing extra characteristics to a node. A JCR implementation, in particular, can provide three predefined mixin types..."
http://www.ibm.com/developerworks/java/library/j-jcr/

Well, it seems to me there are a few specific areas of difference between JRS and JSR-170.

Firstly the strictly hierarchical model for JSR-170 is interesting but we decided not to be so restrictive, and simply to allow the URL-space to be open for users to choose naming schemes, either hierarchical, flat or non-contiguous. We did have some pressure from initial consumers to make everything completely Atom based so that you had to build a hierarchy of folders. When this led to having to create intermediate nodes we went back to a non-contiguous scheme where the client application chooses the storage scheme most appropriate to them.

Secondly a driving principle for the work was to ensure the repository itself was as open as possible, to not have any client language or platform assumptions and so have no Java client-side API - everything is documented in terms of the HTTP/APP operations. This again is a departure for us, not only do we tend to assume we're building Java clients and Java APIs but we advantage Java to the point of making it impossible in some cases to use anything else.

The last major difference with the JSR is the fact that the nodes are "typed", which we decided to avoid in terms of having the server know about the resources (except for the distinction between "simple" resource and feed, analogous to the JSR "unstructured" and "folder"). We also decided that properties should not be attached to a resource as in WebDAV; instead we extract properties from resources, which is where the indexers come in. An indexer is a client-written description of how to extract properties from an XML resource (using XPath) so that specific property forms can be indexed in an efficient way.
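To give a feel for the indexer idea, here is a small Python sketch of rule-driven extraction. It is not the JRS rule format (which this post does not show): the source document, the RULES dictionary and the function index_resource are all invented for illustration, and the paths use only the XPath subset supported by the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical source resource; the element names are invented.
ALBUM = """<album>
  <name>A Matter of Life and Death</name>
  <released>2006</released>
  <genres><genre>Rock</genre><genre>Heavy Metal</genre></genres>
  <notes>Not indexed, because no rule selects it.</notes>
</album>"""

# Hypothetical rule set: property name -> path selecting the values to index.
RULES = {
    "http://example.com/xmlns/music#name": "./name",
    "http://example.com/xmlns/music#releasedYear": "./released",
    "http://example.com/xmlns/music#genre": "./genres/genre",
}


def index_resource(uri, xml_text, rules):
    """Return (subject, predicate, object) triples selected by the rules."""
    root = ET.fromstring(xml_text)
    triples = []
    # Sort the rules so the output order is stable from run to run.
    for predicate, path in sorted(rules.items()):
        for element in root.findall(path):
            triples.append((uri, predicate, (element.text or "").strip()))
    return triples


triples = index_resource("/jazz/resources/musicdb/albums/album-1", ALBUM, RULES)
for triple in triples:
    print(triple)
```

Only the elements named by a rule end up as properties; everything else in the resource is stored but not indexed, which is the subset behaviour described above.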



Categories : [   jazz  ]

Jan 13 2008, 08:03:57 AM EST Permalink



Friday January 11, 2008

Develop with Jazz, for Jazz, and on a Mac

Well, the last few months have been very busy and really fun - I am writing code for real! I have been seconded to work on the new Jazz REST Services (JRS) project**. JRS is a technology incubator project within the Jazz Project and provides a RESTful, resource-neutral store which I'll talk about in subsequent posts.

This post then is about using Jazz, rather than developing for it, which has been a really positive experience. I've used a whole bunch of source control and configuration management systems over the years: RCS, PVCS, PCMS, CVS, SVN, ClearCase and ClearCase/ClearQuest UCM. They seem to fall into one of two broad categories, file based or work-item based; that is, they either deal in checking in/out files and folders or they track work against work items and you commit the item to check in all the associated change sets. PCMS (way back when) was work-item based, UCM is, and now Jazz is as well; however, the level of integration and ease of use in Jazz is really a huge leap forward from any of those.

The workflow - creating a defect/task, making changes and associating them with the item - is as easy as you think it should be, and the collaboration features to share changes in-flight with team members, request validation of work and so on have been simple enough that even a small team like ours has used them daily. If anyone has seen any of the demos of Jazz so far you'll have seen Eclipse and Java, lots of Java :-) Well, I can say that this is pretty much the out-of-the-box configuration; however, it works just as well with PyDev and our Python test client projects.

So, to the last part of the title, yep all my Jazz dev is done on my nice shiny new MacBook Pro. The Jazz client is always provided in a Mac OS X package and has worked perfectly all the way through the project. And, of course, the screen envy from my ThinkPad using colleagues is always nice.

** the link will, at least for now require sign on but that should be removed in the next week or so.



Categories : [   jazz  ]

Jan 11 2008, 02:11:59 PM EST Permalink


Friday January 11, 2008

Jazz REST Services

So, I promised a post on the Jazz REST Services work, and here it is. So first of all what is JRS?

JRS implements a RESTful repository following the architecture and style of the web. The repository is resource neutral: you don't have to pre-define resource types and you certainly don't have to have the repository understand them ahead of use. This is a departure from the way we tend to build tools today, where both client and server know the set and types of "things" during development and this set is not user extensible. This leads to all sorts of difficult issues in tool integration, both for us and for customers and partners. We also have problems in extending these tool "models" or resource types, as we tend to develop the models with a closed-world assumption, thinking that we can analyze the problem entirely and produce one single model for a given tool domain. Well, we can all imagine how well that works out, and some of us have to live with the consequences.

So our proposition is that rather than producing large, monolithic models and closed tools we develop with a much more fine-grained approach and move from a file-system approach to a repository approach. It is also important that the repository should need to know as little as possible about the resource types, this also means that the usual approach of moving resource-specific operations to the server should be discouraged as well. The upshot is a server that we're already building some interesting sample projects and product prototypes upon, with a very cool set of features:

  • We leverage the Jazz notion of a project/repository to allow a server to provide more than one store, each with its own set of users and security.
    • We expose the list of projects as well as users, roles and so on using Atom Publishing, so for example adding a new user to the server is a POST to the /jazz/users collection.
    • When running in secure mode the server responds on HTTP/S and supports Basic and Digest authentication.
    • Access to resources uses a role-based authentication mechanism.
  • Resources can be PUT anywhere in the URL space /jazz/resources/{project}/ and they behave in a completely RESTful way.
    • We support GET, HEAD, PUT, DELETE for all simple resources and POST for collections.
    • We implement conditional operations with both ETag and Last-Modified provided wherever possible.
  • All resources are versioned, every PUT creates a new revision.
    • When you update a resource the response includes a Content-Location which provides the version-specific URL.
    • You can retrieve the list of versions for a resource by doing a GET with the query parameter ?revisions.
  • If you PUT a resource and its content type is application/atom+xml;type=feed and the resource contains a valid feed document then a collection will be created instead of a "simple" resource.
    • So now you can POST to the collection, do a GET to retrieve collection contents, all as expected.
    • Collections allow posting entries and media resources.
  • To support query we provide a set of indexers that either extract text from resources to insert into a Lucene back-end or extract sets of triples (subject, predicate, object) that represent index properties of the resource:
    • Plain text, HTML text and XML text indexers for Lucene,
    • A JSON indexer to allow queries over JSON resources,
    • An image indexer that extracts EXIF tags,
    • An XML indexer that uses custom rules to extract values from XML resources.
  • In terms of query we differentiate three cases:
    • Search - full-text search using the Lucene back-end.
    • Query - a structured query supporting multiple property queries on both system and custom index properties.
    • Properties - a simple API to ask for the indexed properties stored on a particular resource - use the "?properties" query parameter.
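
From a client's point of view several of the features in the list above come down to plain URLs and standard HTTP headers. The Python sketch below builds (without sending) a conditional GET and the ?revisions and ?properties URLs. The host, the project name "demo", the ETag value and both helper names are assumptions for illustration; If-None-Match is the standard HTTP header for ETag-based conditional requests, not something specific to JRS.

```python
import urllib.request

BASE = "http://localhost:8080"  # hypothetical host, not taken from the post


def view_url(resource_path, view=None):
    """Return the resource URL, optionally with a query such as 'revisions'."""
    url = BASE + resource_path
    return url + "?" + view if view else url


def conditional_get(resource_path, etag):
    """Build a GET that lets the server answer 304 if the ETag still matches."""
    return urllib.request.Request(
        view_url(resource_path),
        headers={"If-None-Match": etag},
    )


path = "/jazz/resources/demo/notes/todo.txt"  # hypothetical resource
request = conditional_get(path, '"abc123"')   # hypothetical ETag value

print(request.get_method(), request.full_url)
print(view_url(path, "revisions"))
print(view_url(path, "properties"))
```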

Wondering about the duck? You'll see this fellow a lot on our pages, he's an Indian Running Duck (image courtesy of the Indian Running Duck Association) and the mascot for JRS. Why a duck? Well the three of us in Raleigh who formed the core of the development team got to know each other pretty well locked up in a series of secret hideouts around the IBM campus. Two of us have flocks of runners, and chickens, but you simply can't use a chicken as a mascot can you!



Categories : [   jazz  |  rest  ]

Jan 11 2008, 01:59:58 PM EST Permalink



Monday August 20, 2007

64-core ThinkPad anyone?

Via /. I read this great article on ars technica titled MIT startup raises multicore bar with new 64-core CPU. More interesting is this quote from the article:

"Tell me if this sounds familiar: a grid of processor "tiles" arranged in a mesh network, where each tile houses a general purpose processor, cache, and a non-blocking router that the tile uses to communicate with the other tiles on the chip."

Makes that Intel Core Duo in my ThinkPad seem pretty tame now, doesn't it? But seriously, the question has been raised on Slashdot already - how do we program this, and efficiently? The company is Tilera, a small player, but maybe the first of many?




Aug 20 2007, 09:33:18 PM EDT Permalink



Friday August 17, 2007

Erlang on IBM blogs

I was checking up on Anant Jhingran's blog and noted that he mentioned Erlang, so, going back up to the top, I realized that he was discussing a post from Sam Ruby. Sam is discussing a set of "long bets", although some seem pretty short bets to me. It also seems as if Sam's posting caused enough of a stir to prompt an apologia posting :-) I agree with Sam's comments on Erlang, and while the world probably doesn't *need* another programming language I do think we can learn from the set of languages out there, and in the area of concurrency I think there are lessons we need to learn.

I also try to separate out language, VM and library as Sam does, so I do like the .NET CLR as a VM, mainly because a lot of thought about the kinds of languages planned to run on it was done before implementation; this means adding Python (IronPython), Ruby (IronRuby) and others has been interesting to watch. As for languages, I think many of my preferences are known, but as a collector of programming languages I am probably not an objective source.

One comment on Sam's apology mentioned Stackless Python, something I was planning to write a posting on myself some time. The difference I see is that stackless is an enabling feature of the VM, and that we will still need the language-level primitives for message send/receive that I see in Erlang, and then library support for managing local and remote distribution.



Categories : [   Erlang  ]

Aug 17 2007, 08:35:40 AM EDT Permalink



Monday August 06, 2007

Beautiful Code, Safe Code

I have a copy of the new book Beautiful Code: Leading Programmers Explain How They Think (you can also check out the O'Reilly Beautiful Code Home). My concern is that beauty, depending on how you define it in this context, does not seem to me to be the way to measure or judge code. Now, some people seem to define beauty in terms of the readability of code, and that is important for those that follow in your footsteps. Some define it in terms of the simplicity and compactness of an algorithm and implementation, and again that seems valuable in that a smaller implementation tends to be more understandable (fills fewer pages in the brain). But those of us who become enamoured with the elegance, symmetry or "beauty" of code should remember the words of Donald Norman "".

My personal preference is to see well-laid-out code, readability, simplicity and openness as great tools in the service of safe code. I would be much happier to judge the value of my code on what the test team think of it rather than on the adulation of other programmers (even though that is nice). Code that doesn't come back to haunt you, that's beautiful. So what else can we include in the list of tools for developing safe code? Well, Bryan Cantrill discusses the book here but more interestingly here, where he argues that programming language choice plays a part in beautiful (and by my extension, safe) code. This is an area which tends to bring about some heated, even passionate, discussion, but I believe that language choice really does make a difference both in the ease with which concise and clear code can be written and in the ability to develop safe code.

To this end, one area where I think many programmers struggle is the development of parallel code; and with the widespread availability of multi-core machines (it's hard to buy a PC these days which isn't a Duo) it's a skill more of us will need when our jobs include the performance of applications. This is certainly part of the discussion in a new book on the language Erlang - a language which includes simple, compact and elegant parallel primitives. I spent some time working in Ada, which has a good set of parallel abstractions, and while Ada has many problems it is interesting that few of the popular languages today provide much in the way of parallel primitives beyond Thread classes and synchronized keywords. I'm not sure that Erlang is going to be any more successful than Ada outside of its current niche, but it is now a fully open-sourced project and does seem to be generating quite a bit of buzz. The nice thing about Erlang is that it combines a good functional language, single assignment and a high-level set of parallel primitives in an elegant (dare I say beautiful?) manner. Whether Erlang does take off or not, I do think that we'll have to work out a way to keep our code beautiful when it is split into numerous components running in parallel across different cores, processors, blades or machines.

Just for kicks, here's a nice piece from Jonathan Edwards in his post Beautiful Code.

Another lesson I have learned is to distrust beauty. It seems that infatuation with a design inevitably leads to heartbreak, as overlooked ugly realities intrude. Love is blind, but computers aren’t. A long term relationship – maintaining a system for years – teaches one to appreciate more domestic virtues, such as straightforwardness and conventionality. Beauty is an idealistic fantasy: what really matters is the quality of the never ending conversation between programmer and code, as each learns from and adapts to the other. Beauty is not a sufficient basis for a happy marriage.


Categories : [   Erlang  ]

Aug 06 2007, 08:09:02 PM EDT Permalink



Wednesday August 01, 2007

Cool Django

Django is cool - and to be really clear if you think I mean Django Reinhardt then yes I agree he is very cool, or perhaps you think I mean Pearl Django and yes they are pretty darn cool too, but if you thought instantly of the Python "The Web framework for perfectionists with deadlines" then we're on the same page (though that means we probably both need a life).

As part of the team here we tend to develop prototypes to prove out certain technical risks and right now my favorite platform for these throw-away projects has become Django, although for some more control over low-level details Twisted is great, but a bit more work. For web applications Django has so much in the box it's very easy and remarkably quick to get going - however what we were trying to do was a little different and so one of the things we had to do was add a few pieces to the Django framework itself, which turned out to also be a lot less work than we thought. Specifically we had need of two new capabilities not included in the current Django (0.96):

  • A Database field to store UUID/GUID values and also support the 'auto' property so that such a field can be used as an auto-generated primary key value.
  • A Database field to store regular expressions and while these are just strings we would like to have a form validator that ensures that the text you enter is a valid regular expression.

The first was easy: we simply subclassed the standard Django CharField model field class, fixed its length at 36 characters and used the uuid module to generate a value if the 'auto' property is set. Note that the uuid module is included in Python 2.5 but not in 2.4 or earlier, so on those versions you'll need to download it from Ka-Ping Yee. We also ensured that if 'auto' is set then any such property is not editable in the Django admin UI - this logic was taken from the current implementation of auto properties in Django itself. The code below shows the content of a module used in a number of places in the project; specifically, the class UuidField is used by our model classes.

The second was also relatively easy, though it took a little longer to find some code to crib from but the result is also shown below in the isValidRegularExpression function. The approach is pretty simple (simplistic?) and involves passing the field value through the regular expression compile function and if that throws an exception assume that the value is not a legal expression. This seems to work pretty well, certainly well enough for our purposes anyway.

import uuid

from django.db.models.fields import CharField

class UuidField(CharField):
    """ A field which stores a UUID value, this may also have the Boolean
        attribute 'auto' which will set the value on initial save to a new
        UUID value (calculated using the UUID1 method). Note that while all
        UUIDs are expected to be unique we enforce this with a DB constraint.
    """
    def __init__(self, verbose_name=None, name=None, auto=False, **kwargs):
        self.auto = auto
        # Set this as a fixed value, we store UUIDs in text.
        kwargs['maxlength'] = 36
        if auto:
            # Do not let the user edit UUIDs if they are auto-assigned.
            kwargs['editable'] = False
            kwargs['blank'] = True
        CharField.__init__(self, verbose_name, name, **kwargs)

    def get_internal_type(self):
        """ see CharField.get_internal_type
            Need to override this, or the type mapping for table creation fails.
        """
        return CharField.__name__

    def pre_save(self, model_instance, add):
        """ see CharField.pre_save
            This is used to ensure that we auto-set values if required.
        """
        value = super(UuidField, self).pre_save(model_instance, add)
        if (not value) and self.auto:
            # Assign a new value for this attribute if required.
            value = str(uuid.uuid1())
            setattr(model_instance, self.attname, value)
        return value

import re
from django.core import validators
        
def isValidRegularExpression(field_data, all_data):
    """ A standard validator function that ensures that the user enters a
        valid regular expression in a form field.
    """
    try:
        re.compile(field_data)
    except (re.error, TypeError):
        # re.error: the pattern is malformed; TypeError: the value is not a string.
        raise validators.ValidationError('Error compiling regular expression %s' % field_data)

isValidRegularExpression.always_test = True
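
The two ideas the module relies on can be tried in plain Python, without Django. The following is only a minimal sketch: the helper names new_uuid_text and is_valid_regex are made up for illustration and are not part of the module above. It shows why the field length is fixed at 36 characters and how the compile-and-catch check behaves.

```python
import re
import uuid


def new_uuid_text():
    """Return a freshly generated UUID1 in its canonical text form."""
    return str(uuid.uuid1())


def is_valid_regex(pattern):
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


value = new_uuid_text()
print(len(value))                        # 32 hex digits plus 4 hyphens
print(is_valid_regex(r"^\d{3}-\d{4}$"))  # a well-formed pattern
print(is_valid_regex("(unclosed"))       # missing closing parenthesis
```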

There are a few more Django tweaks as well as some tips/tricks we found that hopefully I can post over the next week or so.



Categories : [   Django  |  Python  ]

Aug 01 2007, 10:56:51 AM EDT Permalink



Monday July 02, 2007

Python and XPath (as an) API

Python has a decent set of libraries for XML processing (SAX, DOM and ElementTree), but unfortunately some of the more advanced processing tends to be supported only in add-on packages such as PyXML or 4suite. One particular thing I needed in the last few weeks was a reasonably complete XPath implementation; I was already using the standard library DOM for manipulating documents, so I'd rather not change to either PyXML or 4suite. Then I got to thinking that there is another issue lurking here, the same issue that 3GL programmers have faced with SQL -- the mixing of two languages in an application. Specifically, my logic is written in Python with the syntax and semantics of the language front-and-center; however, when I want to query my XML resources I have to use an alternative language, and I cannot code in that language directly. Rather, I have to construct an expression in a string and submit it to a single "evaluate" method.

The question really was, can I use the data/programming model of XPath directly from within my Python code? It seemed simple enough, so I started to sketch out what this would look like, taking some simple but reasonable example queries. The result was a very simple xpath library with two classes, XPathNode and XPathSequence, which both contain methods corresponding to the axis navigation and standard functions defined by XPath. As an example, the mapping from XPath axes to the methods on XPathNode/XPathSequence is shown in the following table. This allows the programmer not only to "think" in XPath but also to avoid context-switching between languages, and to make use of editors with syntax highlighting, command completion and so forth for their XPath code.

XPath Axis                  Abbreviation   Python
/ancestor::*                               ancestor('*')
/ancestor-or-self::*                       ancestor_or_self('*')
/attribute::*               @*             attribute('*')
/child::*                   /*             child('*')
/descendant::*                             descendent('*')
/descendant-or-self::*      //*            descendent_or_self('*')
/following::*                              following('*')
/namespace::*                              namespace('*')
/parent::*                  /..            parent('*')
/preceding::*                              preceding_sibling('*')
/self::*                    /.             self('*')

In the same way the functions root, element, node, comment and text as well as the features name, local-name and namespace-uri are all present as methods on XPathNode.

This results in a relatively consistent API that can be used instead of constructing a string that represents an XPath expression in your Python code, evaluating it and parsing the results. For example, consider the following code.

from xpath import minidom
from xpath.predicates import *

# XPath was "/Stakeholder//tag/parent::*"
_doc = minidom.doc("file.xml")
sequence = _doc. \
               child("Stakeholder"). \
               descendent_or_self("tag"). \
               parent("*")
for node in sequence:
    print node
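
To make the chaining style concrete, here is a minimal sketch of how such axis methods could be layered on the standard library's xml.dom.minidom. It is an illustration only, not the library described in this post: the class NodeSequence and the helper _walk are hypothetical names, only three axes are covered, and the result drops duplicates without restoring document order.

```python
from xml.dom import minidom


def _walk(node):
    """Yield `node` and then all of its descendants, depth first."""
    yield node
    for child in node.childNodes:
        for found in _walk(child):
            yield found


class NodeSequence(object):
    """Hypothetical stand-in for XPathSequence: an ordered list of DOM nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __iter__(self):
        return iter(self.nodes)

    def _step(self, axis, name):
        """Apply `axis` to each node, keep elements matching `name`, drop duplicates."""
        result = []
        for node in self.nodes:
            for candidate in axis(node):
                if candidate.nodeType != candidate.ELEMENT_NODE:
                    continue
                if name not in ('*', candidate.tagName):
                    continue
                if not any(candidate is seen for seen in result):
                    result.append(candidate)
        return NodeSequence(result)

    def child(self, name='*'):
        return self._step(lambda n: n.childNodes, name)

    def descendent_or_self(self, name='*'):
        return self._step(_walk, name)

    def parent(self, name='*'):
        return self._step(lambda n: [n.parentNode] if n.parentNode else [], name)


doc = minidom.parseString(
    "<Stakeholder><group><tag/><tag/></group><tag/></Stakeholder>")
sequence = NodeSequence([doc]).child("Stakeholder").descendent_or_self("tag").parent("*")
for node in sequence:
    print(node.tagName)
```

Each method returns a new sequence, which is what lets the calls be chained in the same left-to-right order as the steps of the XPath expression.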

This seems pretty simple, and while the API is not complete yet and has some issues (asking for an attribute or element named "*:something" or "this|that" doesn't work yet), I have also started on a simple parser that converts an XPath string into a set of commands that can be executed, in effect compiling an XPath string so that it can be used over and over. One key to making this work efficiently is the use of Python partial functions (functools.partial) - these are used extensively inside the code as well as in the API for dealing with predicates. The following example illustrates this.

from functools import partial

from xpath import minidom, predicates

# XPath was '//organization:Org[@manager="me"]'
_doc = minidom.doc("file.xml")
# Bind the filter now; evaluate() supplies the context node later.
attribute = partial(predicates.attribute,
                    filter=predicates.Filter(match_name='manager',
                                             feature=_doc.node_type.value,
                                             test_value='me'))
matches = _doc.descendent_or_self('organization:Org').evaluate(attribute)

The evaluate() method on XPathNode and XPathSequence takes a partial function (a Python callable that has had some of its parameters already bound), which is then invoked within the evaluate() method with the context node as an additional parameter.
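
The binding pattern itself can be shown with nothing but functools.partial and the standard minidom. This is a sketch under stated assumptions: attribute_equals and evaluate below are hypothetical functions written for this illustration, not the predicates.attribute or evaluate() of the library described here.

```python
from functools import partial
from xml.dom import minidom


def attribute_equals(context_node, match_name, test_value):
    """Predicate: True when the context element has match_name == test_value."""
    return (context_node.nodeType == context_node.ELEMENT_NODE
            and context_node.getAttribute(match_name) == test_value)


def evaluate(nodes, predicate):
    """Call the partially bound predicate once per context node; keep the matches."""
    return [node for node in nodes if predicate(node)]


doc = minidom.parseString(
    '<orgs><Org manager="me" name="a"/><Org manager="you" name="b"/></orgs>')

# Bind everything except the context node, which is only known at evaluation time.
managed_by_me = partial(attribute_equals, match_name='manager', test_value='me')

matches = evaluate(doc.getElementsByTagName('Org'), managed_by_me)
print([node.getAttribute('name') for node in matches])
```

Because the predicate is an ordinary callable, the same bound object can be reused across many documents or sequences without rebuilding it.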

Hopefully I will get a chance over the next few weeks to clean up and post the code - at least get it to pass a decent set of unit tests.



Categories : [   Python  |  XML  ]

Jul 02 2007, 03:03:19 PM EDT Permalink



Tuesday May 15, 2007

Eric the sheep

Here's a little fun aside for you, a problem I sat down to solve with three adults, a 10-year-old and a 7-year-old. The problem concerns a happy chap by the name of Eric the Sheep, which I will summarize below (do visit the site, though; there are some really interesting problems for kids).

Eric the sheep is lining up to be shorn before the hot summer ahead. There are fifty [50] sheep in front of him. Eric can't be bothered waiting in the queue properly, so he decides to sneak towards the front. Every time Eric passes two sheep, one sheep from the front of the line is taken in to be shorn. How many sheep will be shorn before Eric?

Well once we were close to the answer, and being a geek at heart, out came the ThinkPad and Python for a quick solution check ... so enjoy this.
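
For readers who want to run a check of their own, here is a minimal simulation sketch. It is not the code referred to above; the function name is made up, and it assumes a particular reading of the puzzle: in each round one sheep is taken from the front first, and Eric then sneaks past up to two of the sheep still ahead of him.

```python
def sheep_shorn_before_eric(sheep_ahead, passed_per_round=2):
    """Simulate the queue and count the sheep shorn before Eric's turn.

    Assumption: each round one sheep is taken from the front first, and
    Eric then sneaks past up to `passed_per_round` of those still ahead.
    """
    shorn = 0
    while sheep_ahead > 0:
        sheep_ahead -= 1          # front sheep goes in to be shorn
        shorn += 1
        sheep_ahead -= min(passed_per_round, sheep_ahead)  # Eric sneaks forward
    return shorn


print(sheep_shorn_before_eric(50))  # 17 under the ordering assumed above
```

The order of events matters: if Eric moves before the first sheep is taken, the count for 50 sheep comes out one lower.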



Categories : [   Education  |  Python  ]

May 15 2007, 10:49:42 AM EDT Permalink
