Before you start
This tutorial is for developers who want to collect and publish documents based on a standardized XML format. We use the Text Encoding Initiative's TEI P5, a format widely used by academics, archivists, and librarians worldwide for archival and research purposes. Hands-on Drupal experience is helpful but not essential: we introduce fundamental Drupal concepts and walk you through the basic steps of installation. When you complete the tutorial, you will know how to install Drupal and how to configure the Content Construction Kit (CCK) and XML Content modules to enable content types that can be entered as XML, validated against your custom schema, and published according to the specifications defined in your stylesheets.
The sample site covered in this tutorial demonstrates how to publish documents that strictly adhere to custom XML standards using the Drupal content management system.
Drupal is not the only platform, or even the only free and open source one, on which you can build a system for publishing TEI documents. It is, however, one of the most widely used, running hundreds of thousands of sites worldwide, which makes it both mature (well tested) and well supported by its community.
We chose TEI P5 XML for this tutorial because it is one of the most widely used published standards for academic, archival, and research purposes. Other XML standards with available schemas, such as DocBook or DITA XML, can be substituted wherever we use TEI, provided that you make the necessary changes.
Many who choose TEI XML for archival and research purposes (including the authors) do so for two reasons: the range of data types supported by the TEI's Guidelines for Electronic Text Encoding and Interchange (that is, TEI's markup standard), and the active, ongoing development of the standard by the TEI community. We therefore consider TEI markup to be one of the best choices for describing, displaying, and retaining documents. It offers powerful and flexible display capabilities when combined with any of the available free and open source XML tools.
Drupal CMS: Drupal is freely available and can be downloaded from http://drupal.org/download. This tutorial uses Drupal version 6.
To install Drupal and make your site available to the public across the web, you need a web server or web host with PHP installed and access to a database. We used Apache and MySQL. Although it is beyond the scope of this tutorial to take you through the selection of a web hosting provider or the installation of a local web server and database, you will find that many inexpensive web hosts support the installation of Drupal and provide access to databases such as MySQL or PostgreSQL.
In addition to Drupal itself, you also need to download a few Drupal modules to enable the publishing features described in the rest of the tutorial:
- The XML Content module, which enables uploading of XML content and provides enforcement of, and guidance on, the site publisher's chosen XML features.
- The Content Construction Kit (CCK) module for Drupal, which enables custom types of Drupal content. In this case, it adds an XML content type defined by the site publisher.
- You might also wish to choose a Drupal theme that enables you to change the appearance of your site.
TEI Roma: TEI Roma is a web-based tool for generating custom XML schemas. The publication module described in the tutorial uses these schemas to enforce the standards chosen by the site publisher.
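To give a rough sense of what enforcing an XML standard involves at its simplest, the following sketch checks that a document is well-formed, that its root element is a TEI element in the TEI P5 namespace, and that it contains a teiHeader. It is written in Python purely for illustration and is not part of Drupal or of the modules used in this tutorial. The function name precheck_tei and the sample document are hypothetical. Full validation against a Roma-generated schema needs a schema-aware validator, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample document, used only for this illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Hello, TEI.</p></body></text>
</TEI>"""


def precheck_tei(xml_text):
    """Return (ok, message) for a basic structural pre-check.

    This is NOT schema validation: it only tests well-formedness,
    the root element, and the presence of a teiHeader child.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, f"not well-formed: {err}"

    allowed_roots = (f"{{{TEI_NS}}}TEI", f"{{{TEI_NS}}}teiCorpus")
    if root.tag not in allowed_roots:
        return False, f"unexpected root element: {root.tag}"

    if root.find(f"{{{TEI_NS}}}teiHeader") is None:
        return False, "missing teiHeader"

    return True, "ok"


print(precheck_tei(SAMPLE))
```

A pre-check like this only catches gross structural problems. The custom schema is what decides which elements and attributes are actually permitted.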
See Resources for links to all the tool downloads.

