Skip to main content

Thoughts from OSCON 2009: Open government, concurrency

Bringing open data and standards to governing while putting many cores and nodes to good use

David Mertz, Ph.D. (mertz@gnosis.cx), Developer, Gnosis Software, Inc.
David Mertz
David Mertz has been writing the developerWorks columns Charming Python and XML Matters since 2000. Check out his book Text Processing in Python. For more on David, see his personal Web page.

Summary:  The O'Reilly Open Source Convention (OSCON) is nothing if not a place to collect one's deeper thoughts around software development. The 2009 convention offered more than its share of inspiration, but David Mertz whittled the schedule down and focused on two very different but important areas: open government and concurrency.

Date:  12 Aug 2009
Level:  Introductory
Activity:  4826 views
Comments:  

An important OSCON focus this year was open government, including the cooperation possible between open source developers and a more transparent government. I also attended sessions on improving the reliability of computer-based voting.

Tim O'Reilly's keynote

Keynote sessions at OSCON often do a good job setting the tone of the conference sessions. Tim O'Reilly, who certainly must have followed the convention planning, made some comments that spoke to the range of concerns of the next days and advanced his own current areas of interest. Like many in the open source movement, O'Reilly has been concentrating much of his recent efforts working with government initiatives and with open source developers hoping to get more out of government information resources.

Before launching into his work with what he called "Gov 2.0," O'Reilly touched on some other interesting issues of "federation" of data services and social networks. Foreshadowing a widely expressed sentiment, O'Reilly discussed the move to smaller devices—mainly smart phones—that both employ lots of sensors (touch screen, camera, microphone, motion, GPS) and utilize cloud data services for functions such as speech recognition, search, location, face recognition, and integration into social networks. The thing about the cloud is that the more cooperation that exists between data services, the more valuable each one is; the downside is that it creates a tendency towards monopoly.

Back to government, though. Now, in the current operation of government, information of all sorts is published much more openly and there has been a big move to an "Open Government Platform" in which data services provide SDKs and open APIs for citizens (and open source developers) to utilize. O'Reilly spoke particularly highly of the new Federal Chief Technology Officer, Aneesh Chopra, in this regard.

According to O'Reilly, the entity Gov 2.0 has several features. It emphasizes transparency of social networks and allows citizen contribution and collaboration. It treats government as a platform and a means of collective action, but it's about Benjamin Franklin's concept of citizen action, not what Donald Kent called "vending machine government." Open source developers bring an attitude to this process of "give us tools, and we can do it ourselves," which precisely characterizes an engaged citizenry.

There are, however, still obstacles in the move to Gov 2.0, the main problem being the byzantine procurement process that remains. O'Reilly offered the example of Brewster Kahle and archive.org offering to mirror the historical state of White House Web sites for minimal cost, but the existing procurement process makes it far more difficult to spend tens of thousands for an agile project than to spend millions (or billions) on inflexible and overdesigned (and closed source) ones. So now White House workers take fragile and time-consuming manual snapshots of sites, which is both inefficient and far more expensive than the right approach.


Open source for elections

I listened to two talks that spoke directly to using open source for public elections. Long-time readers have seen my prior articles discussing open source for elections and my work with the Open Voting Consortium (OVC). In fact, I've presented technical results of OVC's demo/prototype systems at previous OSCONs.

One of the talks this year was by Neal McBurnett, who developed a system using Django for Election Verification for the city of Boulder, Colorado. Aside from the technical issues that he discussed, the wonderful and remarkable thing is that he managed to move the county to use these systems in 2008, thereby improving the reliability and transparency of those elections.

McBurnett's talk did not delve too deep into his specific code, but more interestingly looked at design principles for election auditing and more generally for elections systems. One of the most important principles he mentioned was Rivest and Wack's notion of "software independence" for elections. In short, this is the idea that the dependence on software is non-essential and importantly, each component is replaceable and operates on well-defined interfaces. Paper ballots and open formats are the main elements of this software independence.

Most of McBurnett's talk concerned the best principles for statistical verification of ballots and tallies. There are surprisingly subtle details of sampling and a difference between fixed-percentage and risk-limiting audits. Those details are better addressed in his slides, linked to in the Resources section.

The second talk, closely related to this, was by James Tillman and Richard Benham, who presented a system they had developed in Florida to conduct elections correctly. At this point, Tillman and Benham, who have worked with prominent elections activist Ion Sancho, have a proof-of-concept, a good system that has not yet been used in public elections (nor has the source code yet been released, though that was promised as coming soon).

As a number of independent projects have, Tillman and Benham came to the same conclusions as the OVC did in 2003. The best voting system is one that uses computer devices, probably touch screen interfaces, to produce voter-verifiable paper ballots. The advantages of using computers at all are:

  • Enabling accessibility to voters with disabilities
  • Providing multi-lingual capabilities
  • Reducing ballot spoilage through overvoting or undervoting

In the end, the computer should just be a fancy printer. As with OVC's design, Tillman and Benham decided to use a 2D barcode as part of their printed ballots. OVC's code stack uses Python, PyGame, and Postscript to produce this result; Tillman and Benham instead use Java™ code with most of the work being done directly with XML technologies such as EML, XSLT, and XSL-FO. In both cases, an essential element of the design is the use of open source (and therefore voter-inspectable, in principle) tool stacks.

Although it did not focus centrally on elections, a session called "Hacking the Open Government" was very informative as well. This session featured Adina Levin, Silona Bonewald, Kevin Marks, Ilan Rabinovitch, and California Secretary of State Debra Bowen. Secretary Bowen's star appeal drew me to this one; it is heartening to see a prominent elected official who "gets" openness, and I was happy to have the opportunity to speak with Secretary Bowen briefly in the question period after the session.

The second aspect of the convention I paid special attention to was concurrency. As Moore's law has generally ended, remaining improvements in processor performance focus on more cores rather than greater clock speeds, so distributing work among cores and chips (and nodes and networks) becomes increasingly central to good theoretical application design. Plus, good distribution deserves good tools to implement and maintain it.


REvolution Computing's ParallelR

At the invitation of the company's press representative, I had a very nice talk with REvolution Computing's CEO, Richard Schultz. The company, following the model of many open source companies, has created an enhanced enterprise version that adds various features to the basic REvolution R product, with a pricing structure that allows the enterprise version to make sense for commercial application. I'm not specifically interested in REvolution Computing's marketing, though, but rather in the general product they have created.

In both versions, REvolution Computing has boosted the R programming language to greatly enhance performance, and the company has added domain-specific libraries for both computation and graphing. The performance improvements are of two sorts. On the one hand, they have simply improved algorithms and optimized the speed of linear computations. According to Schultz, they have managed 2-5x improvements as a rule. That is impressive, but of even more interest to me is how they have approached distributing work over heterogeneous resources, from cores to node clusters.

ParallelR is based on an approach to concurrency developed for the Linda Coordination Language, based on allocation of segments of work from a centrally managed tuple-space. R itself is well suited to such parallelization, being largely functionally based. ParallelR adds packages foreach and iterator, which provide high-level constructs to parallelized computation, the allocation being handled transparently from a programmer's point of view. For the large-scale computation often performed using R, this high-level concurrency is a big win.

A caveat that is almost automatic in discussion of heterogeneous concurrency is that it works best for problems that are "embarrassingly parallel." Fortunately, elementwise operations on large data sets are precisely this kind of problem and largely the type of problem R users usually face.


A survey of constructs

Ted Leung is always an enjoyable speaker—he's someone who has done a lot of work for the dynamic programming language communities, especially around Python. His talk was a detailed overview of multiple approaches to concurrency; some of it will be familiar to most readers, but some is relatively esoteric for mainstream computer science courses and practical programmers. Leung enumerated a half-dozen or so models or paradigms for concurrency.

The model that is probably most familiar to most programmers is threading. The problems in threading are almost equally familiar: lock contention and deadlocks. More generally, parallelization with locks always requires a lot of explicit planning by developers. Other approaches have their own drawbacks but mostly allow higher level abstraction away from explicitly locking resources.

One approach to concurrency is software transactional memory, which has been popularized (somewhat) by implementation in Haskell. In essence, this style treats memory similarly to transactional database systems in which operations maintain atomicity, with rollback and retry in cases of conflicting requests. Transactional memory can lead to duplication of computation, but in the frequent case where conflicting actions are uncommon, the overhead can still be less than that of locking.

Another model is that of Actors, as especially well known in Erlang (though Leung was careful to observe that the heritage of the Actor model and of Erlang are independent). The Actor model is basically a "shared nothing" architecture in which Actors always pass self-contained messages to request work from other Actors. I wrote about a presentation on Actors during last year's OSCON.

Leung discussed, though briefly as time was running out, the tuple-space model underlying Linda and also ParallelR. In this model, tasks are uncoupled in space and time, and Readers are decoupled from Writers.

Leung also mentioned in passing Bill Ackerman's notion of "dataflow," which allows declarative concurrency. The problem with this one is that while it inherently partitions problems, it cannot deal with non-determinism (which is to say, it is meant for pure functional languages); once you need to deal with real-world indeterminacy (say, input and output), bolted-on equivalents of Actors become needed.

A last model Leung discussed is persistent data structures. This is an incomplete model; it does not deal with all the issues in concurrency. It does a good measure, though. This model has long roots but has been popularized in the last year or two by its use in the language Clojure. The basic idea here is that all data structures are preserved (that is, immutable), but copies representing various modifications are made. Once you have immutability, all the issues of locking go away. Not every data structure is a persistent version, but the large majority of widely used ones have variants that preserve big-O efficiency of their mutable counterparts (see Resources for a link to more information on big-O notation).


Clojure provides functional concurrency for the JVM

Howard Lewis Ship introduced the language Clojure in a general way in his presentation. I have been hearing quite a lot of buzz around this new language for the JVM. Of course, if you have heard anything about it, you know that Clojure is really a very old language, known as Lisp (albeit with some improvements to built-in datatypes and minor syntax additions).

I have been excited to find an opportunity to do some concrete development in Clojure, but haven't quite found the chance yet. There are a number of nice features of Clojure, which Ship outlined. Being built atop the JVM, Clojure is an equal citizen with other JVM languages and in particular has access to all the standard libraries that come with the Java language (or for that matter, any classes you might happen to have already).

From a concurrency perspective, Clojure does a number of things right or at least in a plausibly right way. It introduces an extremely broad set of persistent data structures to avoid contention issues. Even in the narrow areas where conflicts can arise among operations, Clojure uses software transactional memory to manage concurrency. Moreover, an Agent framework is also built on top of Clojure, if you want to go that route. My understanding is that the inherent capability only allows Agents among threads on the same JVM, but I am certain someone has built or will build larger scale multinode agent systems (heck, just use one of the several Agent systems that exist already for Java code).


Resources

Learn

Get products and technologies

  • With IBM trial software, available for download directly from developerWorks, build your next development project on Linux.

Discuss

  • Get involved in the My developerWorks community; with your personal profile and custom home page, you can tailor developerWorks to your interests and interact with other developerWorks users.

About the author

David Mertz

David Mertz has been writing the developerWorks columns Charming Python and XML Matters since 2000. Check out his book Text Processing in Python. For more on David, see his personal Web page.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux, Open source
ArticleID=420447
ArticleTitle=Thoughts from OSCON 2009: Open government, concurrency
publish-date=08122009
author1-email=mertz@gnosis.cx
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Rate a product. Write a review.

Special offers