For the last 11 months, I been involved in an organization called the Open Voting Consortium (OVC) and an associated Free Software project called EVM2003. The OVC's aim is to replace the closed-source electronic voting machines from proprietary vendors, specifically those that fail to provide a voter-verifiable paper ballot. The initial focus is elections in the U.S., but OVC systems hope to prove useful to other nations in the future. As developerWorks readers well know, software is inevitably prone to errors and to malicious tampering; it is hard to find "just trust us" to be a satisfying answer to the risk of votes being directly misrecorded onto electronic-only storage devices.
The EVM2003 project, hosted on SourceForge, is a worldwide community effort to build a reference implementation of elections software that meets a set of specifications emerging from the OVC; that organization hopes to act as a kind of standards body for elections systems. EVM2003 is programmed in Python -- with various supporting libraries -- and uses XML for almost all of its data storage/configuration requirements.
The OVC's specifications are quite detailed, and many of those details are still evolving. Issues like the security analysis and full threat model are outside the scope of this article, but are being considered carefully by many of the world’s leaders in computer security and elections technology (with yours truly contributing to this aspect). But it is worth quickly outlining the envisioned system before delving into the XML formats and source code that might support it. The code and formats I present in this installment are preliminary attempts at both; while the details are likely to change, I hope that much of the design will carry forward to voting systems that are eventually used by U.S. voters, or by those in other nations.
Currently, the OVC has publicly demonstrated its reference design; we are in the process of working with funding agencies, such as the National Science Foundation, and state universities, as well as regulatory bodies and state or federal legislatures.
An OVC-compliant polling place would contain several types of computer stations. Voting stations would come with both a GUI/touchscreen interface and an interface that allows reading-impaired voters (such as blind people) to vote using headphones and key presses. Either type of station would print out an official ballot after a voter completes his or her selections. A sighted and literate voter would read the face of this ballot to make sure that it records her intended votes; a reading-impaired voter might, as an alternative, take the ballot to an independent, non-networked ballot vocalization station that would read back recorded votes over headphones. Once verified, the paper ballots would be placed in an ordinary locked ballot box. At the end of the day, poll workers would reconcile the official paper ballots with the anonymous electronic records from the EVMs -- accounting for spoiled ballots, test ballots, and so on -- to provide extra checks against tampering. At higher levels -- such as county or state -- precincts would be aggregated (called "canvassing" in elections nomenclature), and various other integrity checks would be performed.
Here's how the EVM2003 system works. EVM2003 uses or produces three closely related types of XML files:
- ballot file
- electronic ballot image (EBI) file
- reconstructed ballot image (REBI) file
At the initial stage, an election at a given place/precinct and date -- and
perhaps for a specific party ballot -- is data driven by a ballot file.
In production, this file would be uniquely
named to indicate place, date, party (if applicable), and natural
language (where multilingual ballots are provided). For example, a
production ballot XML data file might have a name like
election-20041102-US-MA-Franklin-2390-Dem-EN.xml or
election-20041102-US-MA-Hampshire-3451-Rep-ES.xml.
This exact naming convention is merely my suggestion, but it indicates the type of
information that I would expect to find (all of the identifying information
is also contained in the content of the data files, so the XML might
be stored in, for example, an RDBMS rather than on a file system).
After a voter has voted, an electronic ballot image (EBI) is stored as an XML file on the voting station; the collection of stored EBIs is written to removable media, such as a CD-R, in randomized order to preserve voter anonymity. The information contained in an EBI should exactly match the information represented on an official printed ballot, though a ballot would contain various formatting such as font choices, placement, rules, and boxes to make verification visually easy for voters.
During the reconciliation process, poll workers scan the paper ballots to produce reconstructed ballot images (REBIs). These REBIs might not be byte-wise identical to their corresponding EBIs, even if no error occurs. For example, in the demo system (and perhaps for production), we decided to represent votes as barcodes to make scanning simpler; however, our barcode only records the fact of a write-in vote, not the write-in name itself. Jurisdictions differ greatly as to when and whether write-in names will be counted, so this is a fine way to establish boundary conditions for requiring visual examination of write-in names on ballots.
EBIs and REBIs have a special, and rather elegant, relation to ballot
files. An EBI or REBI is almost a proper subset of a ballot file --
that is, you create an EBI simply by removing non-selected choices
from a ballot file and renaming the root element from <ballot> to
<cast_ballot>.
One effect of this relation is that a <cast_ballot> XML file
is a simpler version of a ballot XML file.
The subset relation I state is violated in a couple minor ways within the
current CVS snapshot of EVM2003, but it could easily be enforced with
minimal modification. For example, in ballot-election.xml,
<contest> elements have an
allow_writein element that is
superfluous to, and not included in, EBIs or REBIs. And symmetrically the
root element <cast_ballot> contains an attribute
source to
distinguish EBIs from REBIs (for example, voting_machine versus
ballot_scan).
If the proper subset relation is enforced in production systems, it provides the somewhat nice property that ballot files and EBIs conform to an identical DTD/schema, providing a basic consistency guarantee. Even if we decide that precise subsetting is unnecessary, the close similarity of EBIs and ballot XML files makes debugging and development easier on developers (and on our frail memories).
Some example XML files will help you get a sense of the formats I have described. The official DTDs or schemas (ideally using RELAX NG in production) are still sufficiently subject to change that I do not want to fixate on those. To save space, I'll simplify the demonstration election that the OVC has used, but keep at least one example of each contest type. First, the ballot file:
Listing 1. ballot-election.xml
<ballot election_date="2008-11-04" country="US" state="CA"
county="Santa Clara County" precinct="2216">
<contest ordered="No" coupled="Yes"
allow_writein="Yes" name="Presidency">
<selection party="Reform"
name="President">Martin Luther King</selection>
<selection party="Reform"
name="Vice President">John Anderson</selection>
<selection party="Workers"
name="President">Helen Keller</selection>
<selection party="Workers"
name="Vice President">Amelia Earhart</selection>
<selection party="Socialist"
name="President">V. I. Lenin</selection>
<selection party="Socialist"
name="Vice President">Karl Marx</selection>
</contest>
<contest ordered="No" coupled="No"
allow_writein="Yes" name="Senator">
<selection party="Green">Frances E. Willard</selection>
<selection party="Libertarian">Lucy Stone</selection>
<selection party="Democrat">Karl Menninger</selection>
<selection party="Labor">William Lloyd Garrison</selection>
</contest>
<contest ordered="No" coupled="No" allow_writein="No"
name="Transportation Initiative">
<summary>
<scope>Constitutional Amendment Article X, Section 19</scope>
<type>initiative</type>
<title>California Transportation Initiative For Statewide Solar
Powered Magnetic Levitation System</title>
</summary>
<description>
To reduce traffic and increase travel alternatives, this amendment
provides for development of a high-speed solar powered magnetic
levitation system to run along Interstate 5. Construction to begin
November 1, 2004.
</description>
<selection>Yes</selection>
<selection>No</selection>
</contest>
<contest ordered="Yes" coupled="No" allow_writein="Yes"
name="County Commissioner">
<selection>William Hewlett</selection>
<selection>Steve Wozniak</selection>
<selection>William Shockely</selection>
<selection>Gordon Moore</selection>
<selection>Philip Kahn</selection>
</contest>
</ballot>
|
Some of the attributes of <contest> merit further explanation.
Contests may be ordered to allow ranked preference votes.
Ranked preference votes allow a voter to assign a preference (first, second, third,
and so on) among a number of candidates -- you might use many different
methods to score those ranks (all outside the scope of this
article), but an Instant Runoff is probably the least rare in the U.S.
In any case, since a few jurisdictions use ranked preference voting,
this ballot XML format needs to accommodate it. Missing from the current
ballot DTD is an attribute to indicate the maximum number of ranks
that are assignable -- I will show you how to add this shortly.
Contests can either allow write-in votes or not. In the example, it
makes little sense to write in a vote on a Yes/No initiative; other
types of contests may or may not allow write-ins, depending on
jurisdictional rules. The name attribute is straightforward -- it
simply describes what a contest is in a short phrase. In some
contests, child elements may be needed to present the details of a
contest to voters (such as initiative descriptions). Again, the exact
content is jurisdiction dependent.
The coupled attribute is probably a hack, but it's
not clear whether there's a better approach. In U.S. presidential elections, we have a
relatively unusual system in which votes for a President and Vice President must
be paired together -- no a la carte menu selection of "one of each" is allowed, even
though both candidate names still appear. For now, this design lists
each candidate within a <selection> element, but if
coupled is set
to Yes, candidates are presented two-at-a-time rather than
individually.
As I mentioned, a cast ballot (EBI) roughly subsets a raw ballot, but
a few details are changed too. EBIs follow a naming convention
that indicates both the place and date of the election, and also a
distinct ballot ID number (which is not reused on a single machine):
Listing 2. v-20081104-US-CA-Santa_Clara_County-2216-1274.xml
<cast_ballot election_date="2008-11-04" country="US" state="CA"
county="Santa Clara County" precinct="2216"
number="1274" serial="213" source="voting_machine">
<contest ordered="No" coupled="Yes" name="Presidency">
<selection writein="No" name="President">V. I. Lenin</selection>
<selection writein="No" name="Vice President">Karl Marx</selection>
</contest>
<contest ordered="No" coupled="No" name="Senator">
<selection writein="No">William Lloyd Garrison</selection>
</contest>
<contest ordered="No" coupled="No" name="Transportation Initiative">
<selection writein="No">Yes</selection>
</contest>
<contest ordered="Yes" coupled="No" name="County Commissioner">
<selection writein="Yes">David Packard</selection>
<selection writein="No">Gordon Moore</selection>
<selection writein="No">William Hewlett</selection>
</contest>
</cast_ballot>
|
Notice that the attribute writein now appears within
<selection> tags.
In a sense, you could deduce whether a vote is a write-in by
looking at a corresponding ballot-election.xml file that contains
<selection> PCDATA content,
but using the attribute adds some useful
redundancy. If a non-write-in candidate fails to match the raw
ballot, that is a sure sign that you have a data integrity problem. But
some jurisdictions only count specific write-in names if certain
thresholds are met (contest margin-of-victory, total write-ins, and others);
you may never need to look at the content of selections indicated as
writein="Yes".
I have mentioned that the programming for EVM2003 was done in Python;
in addition, the XML access is performed using my Gnosis Utilities
library, specifically gnosis.xml.objectify. Using this library makes
operations on ballot or EBI files particularly painless. For example,
information on contests and candidates is loaded into some Python data
structures with the following code:
Listing 3. ballot-election.xml to Python conversion
from gnosis.xml.objectify import make_instance
ballot = make_instance(xml_data)
contnames, cont = [], {}
for contest in ballot.contest:
name = contest.name
contnames.append(name)
if contest.coupled=="No":
cont[name] = [select.PCDATA for select in contest.selection]
if contest.allow_writein=="Yes":
cont[name].append("")
else:
cont[name] = []
for n in range(0, len(contest.selection), 2):
cont[name].append([s.PCDATA for s in contest.selection[n:n+2]])
if contest.allow_writein=="Yes":
cont[name].append(["",""])
|
The function make_instance() generally reduces thought of the
XML-ness of data formats to a single line; after that, it's just Python.
A special issue comes up in comparing EBIs with each other, or with REBIs (or rather, several related issues). As I mentioned, REBIs are not generally byte-wise identical to their corresponding EBIs because write-in names are not recorded in full on barcodes. But more generally, the OVC intends to set standards for data formats, not simply produce them with specific code. Third-party code should be able to produce and process EBIs -- for example, to confirm that tabulation has been performed accurately.
The document equality question applies to many classes of XML documents: When are two documents identical according to application requirements? Conforming to the same DTD or schema is a minimum necessary condition, and XML canonicalization can remove many trivial syntactic variants. But as a rule, meaningful identity cannot be expressed by schemas alone. For example, deciding when the order of child elements is meaningful and when it is incidental is strictly an application-level issue.
The Gnosis Utilities library provides (in my opinion) a rather elegant way
to customize the meaning of equality. You may define a custom
class with equality and inequality tests to hold all XML documents
with the root element <cast_ballot>. The module
evm2003.utils.equiv injects an application-specific equality test
into EBI Python objects, and may also be used as a command-line tool
to compare EBIs/REBIs. Here it is, including the detailed docstring:
Listing 4. evm2003.utils.equiv.py module
"""Compare ballot XML files for equivalence
.
This file may be imported as a module or used as a command-line ballot
comparison tool. If imported, e.g.:
.
>>> import evm2003.utils.equiv
>>> from gnosis.xml.objectify import make_instance
>>> a = make_instance('scanned.xml')
>>> b = make_instance('stored.xml')
>>> a == b
1
.
At the command-line:
.
% python equiv.py scanned.xml stored.xml
.
(lack of any output means success, in that ultra-terse UNIX-philosophy
way).
.
We implement custom .__eq__() and .__ne__() methods specific to cast
ballots. Injecting such methods is the recommended technique for enhancing
gnosis.xml.objectify objects.
.
The files scanned.xml and stored.xml documents were used to test this.
They differ in several non-significant respects:
(1) the top-level attributes occur in a different order;
(2) non-ordered multi-select contests have selections in a different
order;
(3) Write-in votes have different PCDATA content (for example, nothing for
scanned.xml).
"""
import gnosis.xml.objectify
import sys
class cast_ballot(gnosis.xml.objectify._XO_):
def __eq__(self, other):
metadata = '''election_date country state county
number precinct serial'''.split()
for attr in metadata:
if getattr(self, attr) != getattr(other, attr):
return 0
by_name = lambda a, b: cmp(a.name, b.name)
self.contest.sort(by_name)
other.contest.sort(by_name)
for my, your in zip(self.contest, other.contest):
if my.name != your.name or \
my.ordered != your.ordered or \
my.coupled != your.coupled:
return 0
if my.ordered == "No":
# Compare non-writeins (but don't know if same num writeins)
my_select = dict([(x.PCDATA,None) for x in my.selection
if x.writein=="No"])
your_select = dict([(x.PCDATA,None) for x in your.selection
if x.writein=="No"])
if my_select != your_select:
return 0
continue
for my_select, your_select in zip(my.selection, your.selection):
if (my_select.writein, your_select.writein) == ("Yes","Yes"):
pass
elif my_select.PCDATA != your_select.PCDATA:
return 0
return 1
def __ne__(self, other):
return not self == other
#-- Namespace injection
gnosis.xml.objectify._XO_cast_ballot = cast_ballot
#-- Command-line operation
if __name__=='__main__':
a, b = map(gnosis.xml.objectify.make_instance, sys.argv[1:3])
if a != b:
print sys.argv[1], "and", sys.argv[2], "are NOT equivalent ballots!"
|
I see no need to explain the principles of EBI equivalence in more detail than the docstring gives. The sample code suffices as an illustration of similar considerations that arise in many XML processing applications.
Quite aside from the political import of EVM2003, I feel a certain satisfaction in working with a Free Software project where XML is so clearly just the right data storage format to use. In many contexts, XML is something that you force on yourself because it seems like the way to go -- but in a few cases, the fit is absolutely perfect. In projects that intersect with standards, I think XML has a particularly strong case in its favor since so many interoperable parsers and binding libraries are available (many of which I have written about in this column). And in projects like EVM2003 where the self-documentation of data formats is important (and while data volume is moderate), XML fits like a glove.
- Participate in the discussion forum.
- Read about the EVM2003 project, which
hopes to develop a Free Software infrastructure for elections, and is hosted
by SourceForge.
- Find out more about the Open Voting Consortium at their Web site.
- Download the Gnosis
Utilities as an archive, or browse the package,
including its documentation.
- Learn more about David's Gnosis Utilities in previous
installments of this developerWorks column. The module
gnosis.xml.objectifywas introduced in "On the 'Pythonic' treatment of XML documents as objects(II)" (August 2000). An update of the discussion can be found in "Revisiting xml_pickle and xml_objectify" (June 2001). More recently, the performance and updated usage ofgnosis.xml.objectifywere discussed in "Process XML in Python with ElementTree " (June 2003). - David has coauthored two academic papers on the security and privacy
issues that underlay voting systems, and the Open Voting Consortium
reference design specifically:
- "Privacy Issues in an Electronic Voting Machine" with Arthur M. Keller, Joseph Lorenzo Hall, and Arnold Urkin
- "A PC-Based Open-Source Voting Machine with an Accessible Voter-Verifiable Paper Ballot" with Arthur M. Keller, Alan Dechert, and Karl Auerbach
- Find hundreds more XML resources on the
developerWorks XML zone.
- IBM trial software for product evaluation: Build your next project with trial software available for download directly from developerWorks, including application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Find all previous installments of David's XML Matters column on the column summary page.
- Browse for books on these and other technical topics.
- Learn how you can become an IBM Certified Developer in XML and related technologies.

David Mertz feels that procedural democracy requires that the technical instruments of governance be open for public inspection, every bit as much as it requires the legal acts of government remain so open. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.
Comments (Undergoing maintenance)





