Readers of my previous XML Matters installment on the use of XML in an open source voting machine will recognize my motivation for investigating the OASIS standard for the Election Markup Language (EML). My direct interest has been further piqued by my recent membership in the still-fledgling IEEE Project 1622 (Voting Systems Electronic Data Interchange -- see Resources). Actually, OASIS's EML covers quite a bit more ground than the Open Voting Consortium's narrow demo system, or even than is anticipated for P-1622.
Specifically, EML is intended to:
- Be rich enough to accommodate governmental elections across many jurisdiction levels, as well as elections within many different kinds of organizations (community or corporate, for example)
- Allow voting over many channels, both traditional voting booths (perhaps electronic) and remote systems like Web pages, telephone voting, kiosks, and so on
- Enable many tabulation and voting rules, such as ranked preference and cumulative voting
- Handle security, encryption, and authentication requirements
- Record and convey information about voter registration, organization membership, and other voter metadata
EML has seen significant real-world use in European governments and in some non-governmental organizations worldwide.
EML, in my opinion, suffers somewhat (but not outrageously) from an over-engineering common among XML technologies (think SOAP, W3C XML Schemas, or even XSLT). Committees tend to produce standards with too many details, too many corner cases handled centrally, and too many levels of indirection. Of course, having joined another standards committee myself, I suppose I too will soon be guilty of participating in feature creep. Still, our tentative plan in IEEE P-1622 is to start with a simpler data model provided by a commercial election system vendor (but released on non-proprietary terms), rather than adopt EML wholesale as the basis for standardizing elections data in the United States. Our target in P-1622 is only to accommodate the needs of governmental elections, rather than every possible voting scenario; moreover, the fifty-some US states and territories have somewhat less procedural variation than do the 45 member nations of the Council of Europe, for example. Even so, the fact that we already have several other contributed data models to reconcile into the final design makes for a nascent featuritis.
To give you a sense of the scope of EML version 3.0, here's a quote from the Executive Summary to the standard:
The primary deliverable of the committee is the Election Markup Language (EML). This is a set of data and message definitions described as XML schemas. At present EML includes specifications for:
* Candidate Nomination, Response to Nomination and Approved Candidate Lists
* Voter Registration information, including eligible voter lists
* Various communications between voters and election officials, such [as] polling information, election notices, etc.
* Logical Ballot information (races, contests, candidates, etc.)
* Voter Authentication
* Vote Casting and Vote Confirmation
* Election counts and results
* Audit information pertinent to some of the other defined data and interfaces
Many distinct data requirements are addressed by the various aspects of EML. The schemas associated with the logical aspects of an election process are given numeric prefixes to indicate their general category: the 100 series covers the overall election specification; the 200 series, candidates; the 300 series, voters (eligibility and so forth); the 400 series, voting as such; and the 500 series, tabulation (also known as canvassing in American terminology). Within each series, one or more W3C XML Schemas describe documents that meet those requirements.
Some of the included schemas are:
- 110-electionevent.xsd
- 230-candidatelist.xsd
- 310-voterregistration.xsd
- 340-pollinginformation.xsd
- 410-ballots.xsd
- 420-authentication.xsd
- 440-castvote.xsd
- 510-count.xsd
Information on the naming scheme, along with the schemas themselves, can be found at the OASIS site (see Resources).
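Because the prefix encodes the category, schema files can be classified mechanically. The following Python sketch assumes every numbered schema follows the three-digit `NNN-name.xsd` pattern seen in the list above, and that the series labels are exactly the categories described earlier; neither assumption is taken from the EML specification itself, and `schema_series` is a hypothetical helper name.

```python
import re

# Series labels transcribed from the description of EML's numbering scheme.
SERIES = {
    1: "election specification",
    2: "candidates",
    3: "voters",
    4: "voting",
    5: "tabulation (canvassing)",
}

# Assumed naming pattern: three digits, a hyphen, a lowercase name, ".xsd".
NUMBERED = re.compile(r"(\d)\d\d-[a-z]+\.xsd")


def schema_series(filename):
    """Return (series, label) for a numbered schema, or None for support files."""
    match = NUMBERED.fullmatch(filename)
    if match is None:
        return None  # e.g. emlcore.xsd, emlexternals.xsd
    hundreds = int(match.group(1))
    return hundreds * 100, SERIES.get(hundreds, "unknown")


for name in ["110-electionevent.xsd", "440-castvote.xsd", "emlcore.xsd"]:
    print(name, schema_series(name))
```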
In addition to the numbered schema families, EML contains a collection of supporting schemas that mainly deal with common datatypes. For example, most or all of the numbered schemas include emlcore.xsd (in some cases indirectly, through some other include). Such a schema will have a line like this:
<xsd:include schemaLocation="emlcore.xsd"/>
The EML core, in turn, includes emlexternals.xsd and imports emltimestamp.xsd and the W3C's xmldsig-core-schema.xsd. I have not listed everything that's incorporated, but this illustrates the style. The lines for including or importing the mentioned schemas are:
Listing 1. External resources used by emlcore.xsd
<xsd:include schemaLocation="emlexternals.xsd"/>
<xsd:import namespace="urn:oasis:names:tc:evs:schema:eml:ts"
schemaLocation="emltimestamp.xsd"/>
<xsd:import namespace="http://www.w3.org/2000/09/xmldsig#"
schemaLocation="xmldsig-core-schema.xsd"/>
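The distinction between the two directives matters when you trace these dependencies. In W3C XML Schema, `xsd:include` brings in definitions for the including schema's own target namespace, while `xsd:import` brings in components from a different namespace, which is why only the imports carry a `namespace` attribute. A minimal Python sketch using the standard library's `xml.etree.ElementTree` can list both kinds of directive; the `<xsd:schema>` wrapper around the Listing 1 lines is an assumption added only so that the fragment parses, and `schema_directives` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Listing 1, wrapped in an <xsd:schema> element so that it parses on its own.
EMLCORE_FRAGMENT = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:include schemaLocation="emlexternals.xsd"/>
  <xsd:import namespace="urn:oasis:names:tc:evs:schema:eml:ts"
              schemaLocation="emltimestamp.xsd"/>
  <xsd:import namespace="http://www.w3.org/2000/09/xmldsig#"
              schemaLocation="xmldsig-core-schema.xsd"/>
</xsd:schema>"""


def schema_directives(schema_text):
    """Return (kind, schemaLocation, namespace) for each include or import.

    Both directives appear only as children of the root xsd:schema element,
    so only the top level is scanned.
    """
    root = ET.fromstring(schema_text)
    directives = []
    for child in root:
        if child.tag == XSD + "include":
            # Same target namespace as the including schema: no namespace attribute.
            directives.append(("include", child.get("schemaLocation"), None))
        elif child.tag == XSD + "import":
            directives.append(("import", child.get("schemaLocation"),
                               child.get("namespace")))
    return directives


for kind, location, namespace in schema_directives(EMLCORE_FRAGMENT):
    print(kind, location, namespace or "")
```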
So far, so good. Now for a closer look. The schema emlexternals.xsd defines only formats for addresses and personal details about voting-eligible citizens, but my feeling is that the inclusions are currently structured with an eye toward expanding the element and type definitions within emlexternals.xsd when or if the need arises. In the main, emlexternals.xsd does its work with yet more imports:
Listing 2. Citizen information datatypes imported to emlexternals.xsd
<xsd:import
namespace="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
schemaLocation="AddressTypes-v1.xsd"/>
<xsd:import
namespace="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
schemaLocation="PersonalDetailsTypes-v1.xsd"/>
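Because each schema may include or import others, the full set of files needed to validate a single EML document is the transitive closure of these directives. The sketch below computes it with an iterative depth-first search over a hand-built dependency map taken from Listings 1 and 2. The entry for 440-castvote.xsd (a direct include of emlcore.xsd) is a hypothetical simplification, since some schemas reach the core only indirectly, and the map omits anything not shown in those listings.

```python
# Hypothetical dependency map assembled from Listings 1 and 2; the entry for
# 440-castvote.xsd assumes a direct include of emlcore.xsd.
DEPENDENCIES = {
    "440-castvote.xsd": ["emlcore.xsd"],
    "emlcore.xsd": ["emlexternals.xsd", "emltimestamp.xsd",
                    "xmldsig-core-schema.xsd"],
    "emlexternals.xsd": ["AddressTypes-v1.xsd", "PersonalDetailsTypes-v1.xsd"],
}


def all_dependencies(start, graph):
    """Return every schema reachable from start, in discovery order.

    Iterative depth-first search; the visited set keeps cycles and shared
    dependencies from being processed twice.
    """
    visited = {start}
    found = []
    stack = [start]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in visited:
                visited.add(dep)
                found.append(dep)
                stack.append(dep)
    return found


print(all_dependencies("440-castvote.xsd", DEPENDENCIES))
```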
Of course, once you follow the path further into AddressTypes-v1.xsd, you find still more external definitions -- not as includes or imports, but through namespace declarations such as those for the Dublin Core Metadata Initiative.
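Namespace declarations of this sort do not show up as `xsd:include` or `xsd:import` directives, so a tool that follows only those directives will miss them. One way to spot them, sketched here in Python, is to read the `start-ns` events that `xml.etree.ElementTree.XMLPullParser` reports and discard the namespaces a schema already accounts for (its own target namespace, the XML Schema namespace, and anything it imports). The schema header below is hypothetical, not the actual contents of AddressTypes-v1.xsd; the `apd` prefix is invented, while the Dublin Core element-set URI is the real one.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema header: not the real AddressTypes-v1.xsd. The "apd"
# prefix is invented; the Dublin Core element-set URI is the real one.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:apd="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    targetNamespace="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"/>"""


def declared_namespaces(schema_text):
    """Collect prefix -> URI pairs from every namespace declaration."""
    parser = ET.XMLPullParser(events=("start-ns",))
    parser.feed(schema_text)
    parser.close()  # queued events remain readable after close()
    return {prefix: uri for _event, (prefix, uri) in parser.read_events()}


def unaccounted_namespaces(schema_text):
    """Namespaces declared but neither imported nor the schema's own."""
    root = ET.fromstring(schema_text)
    known = {XSD_NS, root.get("targetNamespace")}
    known |= {imp.get("namespace") for imp in root.iter(f"{{{XSD_NS}}}import")}
    return {prefix: uri
            for prefix, uri in declared_namespaces(schema_text).items()
            if uri not in known}


print(unaccounted_namespaces(SAMPLE))
```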
The schema 410-ballots.xsd specifies the format for an un-cast ballot. This format is relatively unremarkable, but it is worth noticing that it includes a number of features that accommodate ballots in general, not merely governmental elections. For example, I am not familiar with any governmental elections that provide a "Reason" for Election/Contest qualification. Even so, a reason (such as "Initiative met signature threshold") may be worth conveying to election officials, even if it is never displayed to voters.
The schema 440-castvote.xsd specifies an actual vote made in response to a ballot. In the Open Voting Consortium (OVC) design that I presented in an earlier installment, I called these root elements <ballot> and <cast_ballot> to emphasize their connection. In contrast to the (preliminary) OVC design, EML does not create any particular relationship between <Ballots> and <CastVote>. Recall that the OVC design generates a <cast_ballot>, approximately, by simply removing from a <ballot> the selections the voter did not choose. For example, if a <ballot> contains several selections for a <contest name="Mayor">, a <cast_ballot> is just the same XML fragment with all but one selection (candidate) removed.
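The OVC approach of deriving a cast ballot by removal can be sketched in a few lines of Python with `xml.etree.ElementTree`. The blank Mayor ballot, its candidate names, the `choices` argument, and the `cast_from_ballot` helper are all hypothetical illustrations. The sketch also ignores write-ins and ranked (`ordered="Yes"`) contests, where the voter's order would have to be recorded as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical blank OVC-style ballot; attributes follow Listing 3's layout.
BALLOT = """<ballot>
  <contest ordered="No" coupled="No" name="Mayor">
    <selection writein="No">Alice Adams</selection>
    <selection writein="No">Bob Brown</selection>
    <selection writein="No">Carol Chen</selection>
  </contest>
</ballot>"""


def cast_from_ballot(ballot_text, choices):
    """Derive a <cast_ballot> by deleting every selection the voter did not pick.

    choices maps a contest name to the set of selection texts to keep
    (a hypothetical input format).
    """
    root = ET.fromstring(ballot_text)
    root.tag = "cast_ballot"
    for contest in root.findall("contest"):
        keep = choices.get(contest.get("name"), set())
        # findall() returns a new list, so removing while iterating is safe.
        for selection in contest.findall("selection"):
            if selection.text not in keep:
                contest.remove(selection)
    return root


cast = cast_from_ballot(BALLOT, {"Mayor": {"Bob Brown"}})
print(ET.tostring(cast, encoding="unicode"))
```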
I believe the independent design of schemas within EML leads to certain pitfalls -- albeit minor ones. For example, in 410-ballots.xsd <Options> may contain either a list of <Candidate> elements or a list of <Option> elements. Fair enough -- this is helpful in distinguishing political offices from referenda. But over in 440-castvote.xsd, every vote is listed as an <Option> and never as a <Candidate>. I see no good reason to distinguish the semantic models of cast and un-cast ballots in this way (if you want the information in one XML instance, you want it in the other; if it is superfluous, it is superfluous in both places).
To give you a feel for EML, I decided to prepare a <CastVote> that matches the <cast_ballot> presented in my earlier installment. I have condensed the sample document by leaving out optional security tokens and <AuditInformation>. On the latter, I have some initial doubts about including the auditing record within the cast vote itself, since that has the potential to compromise anonymity; but I have not looked at this matter closely enough to evaluate whether a genuine security issue exists. In any case, within IEEE P-1622 -- and within OVC -- I will probably push to keep audit records as separate documents (which might be a Federal Election Commission requirement; I'm not giving legal advice here). Recall that the OVC-format cast ballot looked like this:
Listing 3. v-20081104-US-CA-Santa_Clara_County-2216-1274.xml
<cast_ballot election_date="2008-11-04" country="US" state="CA"
county="Santa Clara County" precinct="2216"
number="1274" serial="213" source="voting_machine">
<contest ordered="No" coupled="Yes" name="Presidency">
<selection writein="No" name="President">V. I. Lenin</selection>
<selection writein="No" name="Vice President">Karl Marx</selection>
</contest>
<contest ordered="No" coupled="No" name="Senator">
<selection writein="No">William Lloyd Garrison</selection>
</contest>
<contest ordered="No" coupled="No" name="Transportation Initiative">
<selection writein="No">Yes</selection>
</contest>
<contest ordered="Yes" coupled="No" name="County Commissioner">
<selection writein="Yes">David Packard</selection>
<selection writein="No">Gordon Moore</selection>
<selection writein="No">William Hewlett</selection>
</contest>
</cast_ballot>
This vote contains the rather unusual case of the US President and Vice President, where a single vote goes to two different candidates running for two different offices. Parliamentary party-slate votes are somewhat similar, conceptually, but in those cases you vote for a single party, not multiple candidates. Other than that, I find this XML minimal and self-explanatory. EML's version tends to nest data more deeply, and does not seem to contemplate the Presidency case directly. As near as I can tell, you might represent this vote as:
Listing 4. EML-20081104-US-CA-Santa_Clara_County-2216-1274.xml
<?xml version="1.0" encoding="UTF-8"?>
<CastVote xmlns="440-castvote.xsd">
<ElectionEvent>
<Event>
<EventName Id="n1274s213">
Santa Clara County, CA, USA (2008-11-04)
</EventName>
<EventQualifier>Precinct 2216</EventQualifier>
</Event>
<Election>
<ElectionName>Presidency</ElectionName>
<Contest>
<ContestName>President</ContestName>
<Selection>
<Option>
<OptionName>V. I. Lenin</OptionName>
</Option>
</Selection>
</Contest>
</Election>
<Election>
<ElectionName>Presidency</ElectionName>
<Contest>
<ContestName>Vice-President</ContestName>
<Selection>
<Option>
<OptionName>Karl Marx</OptionName>
</Option>
</Selection>
</Contest>
</Election>
<Election>
<ElectionName>Senate</ElectionName>
<Contest>
<ContestName>Senator</ContestName>
<Selection>
<Option>
<OptionName>William Lloyd Garrison</OptionName>
</Option>
</Selection>
</Contest>
</Election>
<Election>
<ElectionName>Local Initiative</ElectionName>
<Contest>
<ContestName>Transportation Initiative</ContestName>
<Selection>
<Option>
<OptionName>Yes</OptionName>
</Option>
</Selection>
</Contest>
</Election>
<Election>
<ElectionName>Local Office</ElectionName>
<Contest>
<ContestName>County Commissioner</ContestName>
<Selection>
<Option>
<WriteinOptionName>David Packard</WriteinOptionName>
<Value>1</Value>
</Option>
<Option>
<OptionName>Gordon Moore</OptionName>
<Value>2</Value>
</Option>
<Option>
<OptionName>William Hewlett</OptionName>
<Value>3</Value>
</Option>
</Selection>
</Contest>
</Election>
</ElectionEvent>
</CastVote>
I am not entirely certain I have the semantics of <Election>, <Contest>, <Selection>, and <Option> right, but given the cardinalities of the elements, this seems to be the required arrangement. Exactly how <ElectionName> and <ContestName> relate is also not wholly clear to me.
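Since both formats carry the same votes, the rearrangement behind Listing 4 can be automated. The Python sketch below encodes the mapping used in that listing, with the same uncertainty about <Election> versus <Contest>: each coupled OVC contest becomes one <Election> per office, ranked contests get a <Value> per option, and write-ins use <WriteinOptionName>. The `election_names` parameter is a hypothetical hook for supplying names such as "Senate" that the OVC format does not carry; the `sub` helper is just a convenience, and the output omits the namespace declaration and XML declaration shown in Listing 4.

```python
import xml.etree.ElementTree as ET

# The OVC cast ballot from Listing 3.
LISTING_3 = """<cast_ballot election_date="2008-11-04" country="US" state="CA"
    county="Santa Clara County" precinct="2216"
    number="1274" serial="213" source="voting_machine">
  <contest ordered="No" coupled="Yes" name="Presidency">
    <selection writein="No" name="President">V. I. Lenin</selection>
    <selection writein="No" name="Vice President">Karl Marx</selection>
  </contest>
  <contest ordered="No" coupled="No" name="Senator">
    <selection writein="No">William Lloyd Garrison</selection>
  </contest>
  <contest ordered="No" coupled="No" name="Transportation Initiative">
    <selection writein="No">Yes</selection>
  </contest>
  <contest ordered="Yes" coupled="No" name="County Commissioner">
    <selection writein="Yes">David Packard</selection>
    <selection writein="No">Gordon Moore</selection>
    <selection writein="No">William Hewlett</selection>
  </contest>
</cast_ballot>"""


def sub(parent, tag, text=None):
    """Convenience wrapper: append a child element, optionally with text."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = text
    return child


def ovc_to_eml(cast_text, election_names=None):
    """Rearrange an OVC <cast_ballot> into the CastVote layout of Listing 4."""
    election_names = election_names or {}
    ovc = ET.fromstring(cast_text)
    root = ET.Element("CastVote")  # namespace declaration omitted
    event_group = sub(root, "ElectionEvent")
    event = sub(event_group, "Event")
    label = (f"{ovc.get('county')}, {ovc.get('state')}, "
             f"{ovc.get('country')} ({ovc.get('election_date')})")
    sub(event, "EventName", label).set(
        "Id", f"n{ovc.get('number')}s{ovc.get('serial')}")
    sub(event, "EventQualifier", f"Precinct {ovc.get('precinct')}")

    for contest in ovc.findall("contest"):
        name = contest.get("name")
        selections = contest.findall("selection")
        if contest.get("coupled") == "Yes":
            # Coupled offices (President/Vice President): one Election each,
            # with the office taken from the selection's name attribute.
            groups = [(sel.get("name"), [sel]) for sel in selections]
        else:
            groups = [(name, selections)]
        for contest_name, chosen in groups:
            election = sub(event_group, "Election")
            sub(election, "ElectionName", election_names.get(name, name))
            eml_contest = sub(election, "Contest")
            sub(eml_contest, "ContestName", contest_name)
            selection = sub(eml_contest, "Selection")
            for rank, sel in enumerate(chosen, start=1):
                option = sub(selection, "Option")
                tag = ("WriteinOptionName" if sel.get("writein") == "Yes"
                       else "OptionName")
                sub(option, tag, sel.text)
                if contest.get("ordered") == "Yes":
                    # Ranked contest: record the preference order.
                    sub(option, "Value", str(rank))
    return root


if __name__ == "__main__":
    eml = ovc_to_eml(LISTING_3, {"Senator": "Senate",
                                 "Transportation Initiative": "Local Initiative",
                                 "County Commissioner": "Local Office"})
    print(ET.tostring(eml, encoding="unicode"))
```

Run against Listing 3 with those election names, the result differs from Listing 4 mainly in small textual details, such as "US" rather than "USA" in the event name and "Vice President" taken from the OVC `name` attribute.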
I have looked at just a few details of EML version 3 in this installment, but it should be enough to give you a feel for what the system of schemas aims for. In particular, this installment has looked only at the subset of EML that's concerned with ballots and votes (matching the coverage of my prior related installment), not at the other portions that deal with voter registration, candidate nomination, or vote canvassing.
In Europe, EML is a standard in relatively wide (and growing) usage, and programmers who develop elections systems -- or even systems that touch on them peripherally -- need to become familiar with EML. Moreover, as an OASIS standard, EML is certainly a specification that organizations should consider in conducting private elections. Bringing a common data format to a large swath of elections usage will allow for interoperability among tools, including tools dedicated to audit and security analysis of elections.
- Participate in the discussion forum.
- Read David's previous XML Matters discussion on "Practical XML data design and manipulation for voting systems" (developerWorks, June 2004).
- As of this writing, EML Version 3 has been published as a standard (since February 24, 2003). In consultation with the Council of Europe, OASIS is considering an EML version 4. The author anticipates that when or if version 4 is drafted and adopted, it will maintain general backward compatibility with version 3 -- obviously, details are subject to change with any new specification.
- Download the EML schemas from David's Gnosis site.
- Keep an eye on the progress of the Open Voting Consortium.
- Check out the (slightly nascent) demo, known as EVM2003, on SourceForge.net.
- See the homepage for IEEE P-1622, "Voting Systems Electronic Data Interchange". Some of the detailed documentation there has protected access; readers with an interest in following the evolving specification should contact David or the committee chairs to obtain copies of any protected documents (working drafts can readily be forwarded to anyone who wishes to assist the committee or its members).
- Visit the homepage for Boynings Consulting, a British company that offers a seminar on EML and has advised the UK government on its widespread use of EML.
- Visit the e-voting subpage on the Council of Europe's site on Democratic Institutions.
- Find hundreds more XML resources on the developerWorks XML technology zone.
- Find all previous installments of David's XML Matters column on the column summary page.
- Browse for books on these and other technical topics.
- Learn how you can become an IBM Certified Developer in XML and related technologies.

To David Mertz, all the world is a stage, and his career is devoted to providing marginal staging instructions. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.