Developing Drupal publications to support standards-based XML

Customize your Drupal installation to support the publication of TEI (or other) XML documents

Garrick Bodine (garrick.bodine@gmail.com), Information Technology Manager, Penn State University
Garrick Bodine is an Information Technology Manager in the Office of Undergraduate Admissions at Penn State University.
Stephanie Schlitz (sschlitz@gmail.com), Linguistics Professor, Bloomsburg University of Pennsylvania
Stephanie Schlitz is a Linguistics Professor at Bloomsburg University of Pennsylvania. She collaborates on several ongoing XML-based projects.

Summary:  Academic and corporate clients seeking digital journals or other types of web publications regularly require platforms that support standards-based XML. This tutorial explains how to customize a Drupal implementation to develop publications that enable editors, authors, and users to submit and edit content in standards-based XML, where the standard can be enforced using server-side validation settings. For illustrative purposes, the discussion references TEI XML, the markup standard in widespread use in academia.

Date:  08 Feb 2011
Level:  Intermediate

Before you start

Frequently used acronyms

  • CMS: Content management system
  • CSS: Cascading Style Sheets
  • FTP: File Transfer Protocol
  • HTML: HyperText Markup Language
  • SQL: Structured Query Language
  • URL: Uniform Resource Locator
  • XML: Extensible Markup Language
  • XSL: Extensible Stylesheet Language
  • XSLT: Extensible Stylesheet Language Transformations

This tutorial is for developers interested in collecting and publishing documents based on a standardized XML format. In this case, we use the Text Encoding Initiative's TEI P5, a format widely used by academics, archivists, and librarians worldwide for archival and research purposes. Some hands-on Drupal experience is helpful but not essential: we introduce fundamental Drupal concepts and walk you through the basic steps of installation. By the end of the tutorial, you will know how to install Drupal and how to configure the Content Construction Kit (CCK) and XML Content modules so that content types can be input in XML, validated against your custom schema, and published according to the specifications defined in your stylesheets.

About this tutorial

The sample site covered in this tutorial demonstrates how to publish documents that strictly adhere to custom XML standards using the Drupal content management system.

Drupal is not the only option (not even the only free and open source option) for implementing a system that publishes TEI documents. It is, however, one of the most widely used platforms, running hundreds of thousands of sites worldwide, which makes it both mature (well tested) and well supported by its community.

We chose TEI P5 XML for this tutorial because it is one of the most widely used published standards for academic, archival, and research purposes. Other XML standards with available schemas, such as DocBook or DITA XML, can be substituted wherever we use TEI, provided that you make the corresponding changes.

Among the driving factors for many who choose TEI XML (including the authors) for archival and research purposes are the range of data types supported by the TEI's Guidelines for Electronic Text Encoding and Interchange (that is, TEI's markup standard) and the active, ongoing development of the standard by the TEI community. We therefore consider TEI markup to be one of the best choices for describing, displaying, and retaining documents, offering powerful and flexible display capabilities when it is leveraged together with any number of the available free and open source XML tools.


Prerequisites

Drupal CMS—Drupal is freely available and can be downloaded from http://drupal.org/download. This tutorial uses Drupal version 6.

To install Drupal and make your site available on the public web, you need a web server or web host with PHP installed and access to a database. We used Apache and MySQL. Selecting a web hosting provider or installing a local web server and database is beyond the scope of this tutorial, but you will find that many inexpensive web hosts support the installation of Drupal and provide access to databases such as MySQL or PostgreSQL.

In addition to Drupal itself, you also need to download a few Drupal modules to enable the publishing features described in the rest of the tutorial:

  • The XML Content module, which handles uploading XML content and enforcing the XML standard chosen by the site publisher.
  • The Content Construction Kit (CCK) module, which enables custom Drupal content types, in this case an XML content type defined by the site publisher.
  • Optionally, a Drupal theme to change the appearance of your site.

TEI Roma—TEI Roma is a web-based tool for generating custom XML schemas that the publication module described in the tutorial uses to enforce the standards chosen by the site publisher.
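
To give a rough sense of what "enforcing a standard" means for a submitted document, the following sketch checks a minimal TEI P5 document for well-formedness and for the basic TEI skeleton (a TEI root element in the TEI namespace with teiHeader and text children). It is only an illustration under stated assumptions: the names check_tei_skeleton and SAMPLE_TEI are hypothetical, the sample document is ours, and the check is a simplified stand-in written with Python's standard library. It is not how the XML Content module works, and it does not replace validation against a Roma-generated schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: a very small TEI P5 document.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample document</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Hello, TEI.</p></body>
  </text>
</TEI>"""


def check_tei_skeleton(xml_text):
    """Return a list of problems; an empty list means the basic checks passed.

    This is a simplified stand-in for real schema validation: it only
    tests well-formedness and the presence of the top-level TEI elements.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    # ElementTree spells namespaced tags as "{namespace}localname".
    if root.tag != f"{{{TEI_NS}}}TEI":
        problems.append("root element is not TEI in the TEI namespace")
    for child in ("teiHeader", "text"):
        if root.find(f"{{{TEI_NS}}}{child}") is None:
            problems.append(f"missing <{child}> child of the root")
    return problems


if __name__ == "__main__":
    print(check_tei_skeleton(SAMPLE_TEI))
```

A full schema expresses far more than this (allowed elements, attributes, ordering, and content models), which is why the tutorial relies on a schema generated with TEI Roma rather than hand-written checks.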

See Resources for links to all the tool downloads.
