When most programmers think about programming, they imagine imperative styles and techniques for writing applications. The most popular general purpose programming languages -- including Python and other object-oriented languages -- are predominantly imperative in style. On the other hand, there are also many programming languages that are declarative in style, including both functional and logic languages, and also including both general purpose and specialized ones.
Let me list a few languages that fall in various categories. Many readers have used many of these tools, without necessarily thinking about the categorical differences among them. Python, C, C++, Java, Perl, Ruby, Smalltalk, Fortran, Basic, xBase are all straightforwardly imperative programming languages. Some of these are object oriented, but that is simply a matter of the organization of code and data, not of the fundamental programming style. In these languages, you command the program to carry out a sequence of instructions: put some data in a variable; fetch the data back out of the variable; loop through a block of instructions until some condition is satisfied; do something if something else is true. One nice thing about all these languages is that it is easy to think about them within familiar temporal metaphors. Ordinary life consists of doing one thing, making a choice, then doing another thing, maybe using some tools along the way. It is easy to imagine the computer that runs a program as a cook, or a bricklayer, or an automobile driver.
Languages like Prolog, Mercury, SQL, XSLT, EBNF grammars, and indeed configuration files of various formats, all declare that something is the case, or that certain constraints apply. The functional languages (such as Haskell, ML, Dylan, Ocaml, Scheme) are similar, but with more of an emphasis on stating internal (functional) relationships between programming objects (recursion, lists, etc.). Our ordinary life, at least in its narrative quality, provides no direct analog for the programming constructs of these languages. For those problems you can naturally describe in these languages, however, declarative descriptions are far more concise, and far less error-prone than are imperative solutions. For example, consider a set of linear equations:
10x + 5y - 7z + 1 = 0 17x + 5y - 10z + 3 = 0 5x - 4y + 3z - 6 = 0
This is a rather elegant shorthand that names several relationships among objects (x, y, and z). You might come across these facts in various ways in real life, but actually "solving for x" with pencil-and-paper is a matter of messy details, prone to error. Writing the steps in Python is probably even worse from a debugging perspective.
Prolog is a language that comes close to logic or mathematics. In it, you simply write statements you know to be true, then ask the application to derive consequences for you. Statements are composed in no particular order (as the linear equations have no order), and you the programmer/user have no real idea what steps are taken to derive results. For example:
/* Adapted from sample at: <http://www.engin.umd.umich.edu/CIS/course.des/cis479/prolog/> This app can answer questions about sisterhood & love, e.g.: # Is alice a sister of harry? ?-sisterof( alice, harry ) # Which of alice' sisters love wine? ?-sisterof( X, alice ), love( X, wine) */ sisterof( X, Y ) :- parents( X, M, F ), female( X ), parents( Y, M, F ). parents( edward, victoria, albert ). parents( harry, victoria, albert ). parents( alice, victoria, albert ). female( alice ). loves( harry, wine ). loves( alice, wine ).
Not quite identical, but similar in spirit is an EBNF (Extended Backus-Naur Form) grammar declaration. You might write some declarations like:
word := alphanums, (wordpunct, alphanums)*, contraction? alphanums := [a-zA-Z0-9]+ wordpunct := [-_] contraction := "'", ("clock"/"d"/"ll"/"m"/"re"/"s"/"t"/"ve")
This is a compact way of stating what a word would look like if you were to encounter one, without actually giving sequential instructions on how to recognize one. A regular expression is similar (and in fact suffices for this particular grammar production).
For yet another declarative example, consider a document type declaration that describes a dialect of valid XML documents:
<!ELEMENT dissertation (chapter+)> <!ELEMENT chapter (title, paragraph+)> <!ELEMENT title (#PCDATA)> <!ELEMENT paragraph (#PCDATA | figure)+> <!ELEMENT figure EMPTY>
As with the other examples, the DTD language does not contain any instructions about what to do to recognize or create a valid XML document. It merely describes what one would be like if it were to exist. There is a subjunctive mood to declarative languages.
Python as interpreter versus Python as environment
Python libraries can utilize declarative languages in one of two fairly distinct ways. Perhaps the more common technique is to parse and process non-Python declarative languages as data. An application or a library can read in an external source (or a string defined internally, but just as a "blob"), then figure out a set of imperative steps to carry out that conform in some way with those external declarations. In essence, these types of libraries are "data-driven" systems; there is a conceptual and category gap between the declarative language and what a Python application does to carry out or utilize its declarations. In fact, quite commonly, libraries to process those identical declarations are also implemented for other programming languages.
All the examples given above fall under this first technique. The
PyLog is a Python implementation
of a Prolog system. It reads a Prolog data file like the sample, then
creates Python objects to model the Prolog declarations. The EBNF
sample uses the particular variant of
SimpleParse, which is a Python library that
transforms these declarations into state tables that can be used by
mx.TextTools is itself an extension library
for Python that uses an underlying C engine to run code stored in Python
data structures, but having little to do with Python per se.
Python is great glue for these tasks, but the languages glued
together are very different from Python. Most Prolog implementations,
furthermore, are written in languages other than Python, as are most EBNF
A DTD is similar to the other examples. If you use a validating
xmlproc, you can utilize a
DTD to verify the dialect of an XML document. But the language of a DTD
is un-Pythonic, and
xmlproc just uses it
as data that needs to be parsed. Moreover, XML validating parsers have
been written in many programming languages. An XSLT transformation is
similar again, it is not Python-specific, and a module like
ft.4xslt just uses Python as glue.
While there is nothing wrong with the above approaches and the tools mentioned above (I use them all the time), it might be more elegant -- and in some ways more expressive -- if Python itself could be the declarative language. If nothing else, libraries that facilitated this would not require programmers to think about two (or more) languages when writing one application. At times it is natural and powerful to lean on Python introspective capabilities to implement "native" declarations.
The magic of introspection
PLY let users declare Python values in
Python, then use some magic to let the Python runtime environment act
as the configuration of parsing. For example, let's look at the
PLY equivalent of the prior
Spark is similar to the example:
tokens = ('ALPHANUMS','WORDPUNCT','CONTRACTION','WHITSPACE') t_ALPHANUMS = r"[a-zA-Z0-0]+" t_WORDPUNCT = r"[-_]" t_CONTRACTION = r"'(clock|d|ll|m|re|s|t|ve)" def t_WHITESPACE(t): r"\s+" t.value = " " return t import lex lex.lex() lex.input(sometext) while 1: t = lex.token() if not t: break
I have written about
PLY in my
forthcoming book Text Processing in Python, and
have written about
Spark in this column
(see Resources for links).
Without going into details of the libraries, what you should notice here
is that it is the Python bindings themselves that configure the parsing
(actually lexing/tokening in this example). The
PLY module just happens to know enough about
the Python environment it is running in to act on these pattern
PLY knows what it
does involves some pretty fancy Python programming. At a first level, an
intermediate programmer will realize that one can probe the contents of
locals() dictionaries. That would be fine if the
declaration style were slightly different. For example, imagine the code
were more like:
import basic_lex as _ _.tokens = ('ALPHANUMS','WORDPUNCT','CONTRACTION') _.ALPHANUMS = r"[a-zA-Z0-0]+" _.WORDPUNCT = r"[-_]" _.CONTRACTION = r"'(clock|d|ll|m|re|s|t|ve)" _.lex()
This style would not be any less declarative, and the
basic_lex module could hypothetically contain
something simple like:
def lex(): for t in tokens: print t, '=', globals()[t]
This would produce:
% python basic_app.py ALPHANUMS = [a-zA-Z0-0]+ WORDPUNCT = [-_] CONTRACTION = '(clock|d|ll|m|re|s|t|ve)
PLY manages to poke into the
namespace of the importing module using stack frame information. For
import sys try: raise RuntimeError except RuntimeError: e,b,t = sys.exc_info() caller_dict = t.tb_frame.f_back.f_globals def lex(): for t in caller_dict['tokens']: print t, '=', caller_dict['t_'+t]
This produces the same output given in the
basic_app.py sample, but with declarations using the
There is more magic than this in the actual
PLY module. We saw that the tokens named with
t_TOKEN can actually be either
strings containing regular expressions, or functions that contain both
regular expression docstrings along with action code. Some type checking
allows polymorphic behavior:
# ...determine caller_dict using RuntimeError... from types import * def lex(): for t in caller_dict['tokens']: t_obj = caller_dict['t_'+t] if type(t_obj) is FunctionType: print t, '=', t_obj.__doc__ else: print t, '=', t_obj
Obviously, the actual
does something more interesting with these declared patterns than the toy
examples, but these demonstrate some techniques involved.
The magic of inheritance
Letting a support library poke around in and manipulate an application's namespace can enable an elegant declarative style. But often, using inheritance structures together with introspection allows an even greater flexibility.
gnosis.xml.validity is a
framework for creating classes that map directly to DTD productions. Any
gnosis.xml.validity class can only
be instantiated with arguments obeying XML dialect validity constraints.
Actually, that is not quite true; the module will also infer the proper
types from simpler arguments when there is only one unambiguous way of
"lifting" the arguments to the correct types.
Since I wrote the
gnosis.xml.validity module, I am biased toward
thinking its purpose is itself interesting. But for this article, I just
want to look at the declarative style in which validity classes are
created. A set of rules/classes matching the prior DTD sample consists
from gnosis.xml.validity import * class figure(EMPTY): pass class _mixedpara(Or): _disjoins = (PCDATA, figure) class paragraph(Some): _type = _mixedpara class title(PCDATA): pass class _paras(Some): _type = paragraph class chapter(Seq): _order = (title, _paras) class dissertation(Some): _type = chapter
You might create instances out of these declarations using:
ch1 = LiftSeq(chapter, ("1st Title","Validity is important")) ch2 = LiftSeq(chapter, ("2nd Title","Declaration is fun")) diss = dissertation([ch1, ch2]) print diss
Notice how closely the classes match the prior DTD. The mapping is basically one-to-one; except it is necessary to use intermediaries for quantification and alternation of nested tags (intermediary names are marked by a leading underscore).
Notice also that these classes, while created using standard Python
syntax, are unusual (and more concise) in having no methods or instance
data. Classes are defined solely to inherit from some framework, where
that framework is narrowed by a single class attribute. For example, a
<chapter> is a sequence of other tags,
<title> followed by one or more
<paragraph> tags. But all we need to do
to assure the constraint is obeyed in the instances is declare the
chapter class in this straightforward manner.
The main "trick" involved in programming parent classes like
gnosis.xml.validity.Seq is to look at the
.__class__ attribute of an instance during
initialization. The class
chapter does not
have its own initialization, so its parent's
__init__() method is called. But the
self passed to the parent
__init__() is an instance of
chapter, and it knows it. To illustrate, this is
part of the implementation of
class Seq(tuple): def __init__(self, inittup): if not hasattr(self.__class__, '_order'): raise NotImplementedError, \ "Child of Abstract Class Seq must specify order" if not isinstance(self._order, tuple): raise ValidityError, "Seq must have tuple as order" self.validate() self._tag = self.__class__.__name__
Once an application programmer tries to create a
chapter instance, the instantiation code checks that
chapter was declared with the required
._order class attribute, and that this attribute is
the needed tuple object. The method
performs some further checks to make sure that the objects the instance
was initialized with belong to the corresponding classes specified in
When to declare
A declarative programming style is almost always a more direct way of stating constraints than is an imperative or procedural one. Of course, not all programming problems are about constraints -- or at least that is not always a natural formulation. But problems of rule-based systems, such as grammars and inference systems, are much easier to manage if they can be described declaratively. Imperative verification of grammaticality quickly turns into spaghetti code, and is difficult to debug. Statements of patterns and rules can remain much simpler.
Of course, at least in Python, the verification or enforcement of
declared rules will always boil down to procedural checks. But the right
place for such procedural checks is well-tested library code. Individual
applications should rely on the simpler declarative interfaces provided by
gnosis.xml.validity. Other libraries like
ft.4xslt also enable declarative styles,
although not declarations in Python (which is appropriate for their
domains, of course).
- Read the previous installments of Charming Python.
- Download the Python implementation of Prolog
- Download the
SimpleParsemodule from SourceForge.
- Read about
Sparkin David's developerWorks articles, "Parsing with the SimpleParse module" and "Parsing with the Spark module".
- Learn more about
gnosis.xml.validityin David's developerWorks article "XML Matters: Enforcing validity with the gnosis.xml.validity library".
- David also discusses both
PLYin a draft of his forthcoming book Text Processing in Python.
- Get a modern look at Extended Backus-Naur Form from O'Reilly's xml.com and a more classical look at the same from The Free On-Line Dictionary of Computing (or Foldoc).
- Forgot all about Prolog? Rediscover its joys and wonders with GNU Prolog (currently at 1.2.16). Never knew Prolog? Read the Foldoc definition first, then head over to download.
- Find all of the languages mentioned in this article and more on the Programming Languages page from Hans-Wolfgang Loidl at the University of Munich.
- Meet xmlproc: a free validating xml parser written in Python. You'll find more xml parsers -- both the validating and the non-validating kind -- at the Web Developer's virtual library.
- Also by David on developerWorks, read:
- Find more resources for Python developers in the developerWorks Linux zone.