Developing Drupal publications to support standards-based XML

Customize your Drupal installation to support the publication of TEI (or other) XML documents

Academic and corporate clients seeking digital journals or other types of web publications regularly require platforms that support standards-based XML. This tutorial explains how to customize a Drupal implementation to develop publications that enable editors, authors, and users to submit and edit content in standards-based XML, where the standard can be enforced using server-side validation settings. For illustrative purposes, the discussion references TEI XML, the markup standard in widespread use in academia.

Garrick Bodine (garrick.bodine@gmail.com), Information Technology Manager, Penn State University

Garrick Bodine is an Information Technology Manager in the Office of Undergraduate Admissions at Penn State University.



Stephanie Schlitz (sschlitz@gmail.com), Linguistics Professor, Bloomsburg University of Pennsylvania

Stephanie Schlitz is a Linguistics Professor at Bloomsburg University of Pennsylvania. She collaborates on several ongoing XML-based projects.



08 February 2011

Before you start

Frequently used acronyms

  • CMS: Content management system
  • CSS: Cascading Style Sheets
  • FTP: File Transfer Protocol
  • HTML: HyperText Markup Language
  • SQL: Structured Query Language
  • URL: Uniform Resource Locator
  • XML: Extensible Markup Language
  • XSL: Extensible Stylesheet Language
  • XSLT: Extensible Stylesheet Language Transformations

This tutorial is for developers interested in collecting and publishing documents based on a standardized XML format. In this case, we use the Text Encoding Initiative's TEI P5, a format widely used by academics, archivists, and librarians worldwide for archival and research purposes. While some hands-on Drupal experience is recommended, we introduce fundamental Drupal concepts and walk you through the basic steps of installation. Drupal experience, therefore, is not essential. After you complete the tutorial, you will have learned how to install Drupal and how to configure the Content Construction Kit (CCK) and XML Content modules to enable various content types that can be input in XML, validated against your custom schema, and published according to the specifications defined in your stylesheets.

About this tutorial

The sample site covered in this tutorial demonstrates how to publish documents that strictly adhere to custom XML standards using the Drupal content management system.

Although Drupal is not the only option (not even the only free and open source option) to implement a system that enables publication of TEI documents, it is one of the most widely used platforms, running hundreds of thousands of sites worldwide, making it both mature (well tested) and well supported by the community.

Because TEI P5 XML is one of the most widely used published standards for academic, archival, and research purposes, it is the format we chose for this tutorial. Other XML standards with available schemas, such as DocBook or DITA XML, can be used where we implement TEI, assuming that you make the necessary changes.

Among the driving factors for many who choose TEI XML (including the authors) for archival and research purposes are the range of data types supported by the TEI's Guidelines for Electronic Text Encoding and Interchange (that is, TEI's markup standard) and the active, ongoing development of the standard by the TEI community. We therefore consider TEI markup to be one of the best choices for describing, displaying, and retaining documents, offering powerful and flexible display capabilities when it is leveraged together with any number of the available free and open source XML tools.

Prerequisites

Drupal CMS—Drupal is freely available and can be downloaded from http://drupal.org/download. This tutorial uses Drupal version 6.

You need a web server or web host with PHP installed and access to a database in order to install Drupal and make your site available to the public across the web. We used Apache and MySQL. Although it is beyond the scope of this tutorial to take you through the selection of a web hosting provider or the installation of a local web server and database, many inexpensive web hosts support the installation of Drupal and provide access to databases such as MySQL or PostgreSQL.

In addition to Drupal itself, you also need to download a few Drupal modules to enable the publishing features described in the rest of the tutorial:

  • The XML Content module to enable uploading, enforcement, and guidance with regard to the site publisher's chosen XML features.
  • The Content Construction Kit (CCK) module for Drupal to enable custom types of Drupal content, in this case the addition of an XML content type defined by the site publisher.
  • You might also wish to choose a Drupal theme that enables you to change the appearance of your site.

TEI Roma—TEI Roma is a web-based tool for generating custom XML schemas that the publication module described in the tutorial uses to enforce the standards chosen by the site publisher.

See Resources for links to all the tool downloads.


Getting started with Drupal

The Drupal CMS is one of the most popular and well-supported content management systems in the world and is constantly being improved and enhanced by its large collection of community-contributed modules. Its security and stability have been proven over years on the Internet, which makes it a strong choice for those who collect and publish archival content on the web. In this section you begin setting up the Drupal environment.

Installing Drupal and its modules

After you download the Drupal package from Drupal's site, you need to uncompress the archive and upload it to your web server. Although methods for performing this step vary, we used the free FTP client program FileZilla to FTP the Drupal files to our web host, as illustrated in Figures 1 and 2.

Figure 1. Upload Drupal to your web server
Screen capture of local and remote site directories during FTP of Drupal to local web server

The filename of the Drupal installation usually includes the version number (for example, in this case, it's 6.19). For the purposes of this tutorial, we are going to install Drupal to the root of our website, digarch.org. For that reason, we upload only the contents of the downloaded Drupal folder to the web root directory of our website.

If you wish to install the publication site outside of your root website directory, you need to upload the entire drupal-6.19 folder to the chosen location on your site. Then, simply rename the folder containing the Drupal installation to reflect the address you want to appear on the web, changing drupal-6.19 to journal, for instance, if you want the publication to appear as yoursite.com/journal. From this point on in the tutorial, the directory that contains all the Drupal files is referred to as [DRUPAL_HOME].

Read the INSTALL.txt file that is included in the root of the Drupal installation to ensure that you meet all the requirements for your specific web hosting environment. We briefly cover the basic installation and configuration here.

Creating the settings for your site

After you upload all the Drupal files to the chosen location on your website, you need to create the required settings file for your site. The easiest way to do this is to locate the default.settings.php file (which should be located in the [DRUPAL_HOME]/sites/default directory; see Figure 2) and rename it to settings.php. You also need to grant Drupal access to your database. Consult your database's documentation or your web host's instructions to complete this part, as the process varies considerably from installation to installation.

Figure 2. Rename the default.settings.php file to settings.php
Screen capture: Rename the default.settings.php file to settings.php

With these steps complete, Drupal's online installer can make the rest easy for you; just browse to your site and follow the onscreen instructions.

Installing the CCK and XML content modules

If you have prior experience with Drupal, you will be happy to see that installing the modules recommended in this tutorial is achieved in the same manner as most other Drupal modules:

  1. Download the necessary files from the modules' project pages (see Resources).
    • Content Construction Kit (CCK)
    • XML Content
  2. Move the entire directory containing the module's files (usually the directory name is the same as the module name) to your Drupal site in the [DRUPAL_HOME]/sites/all/modules directory, creating this directory if necessary.

After you upload the modules, browse to the Modules section of Drupal's administration panel (Administer > Site Building > Modules). To enable each module, check the box next to each component of the modules that you installed, as in Figure 3. (Selected modules include Content, Content Copy, Content Permissions, Fieldgroup, Node Reference, Number, Option Widgets, Text, and User Reference under CCK, and XML Content under Content Filters.)

Figure 3. Enable Drupal modules
Screen capture of enable Drupal modules for CCK and Content filters

Don't forget to click the Save configuration button at the bottom of that page; otherwise, the modules are not enabled.

After you save your configuration changes, you receive a confirmation dialog noting that they were saved, and in this case an additional message (see Figure 4) asking you to configure some field permissions. These permissions pertain to the CCK module; it needs to know what kinds of users on your site have permission to edit the existing and future content on your site.

Figure 4. Dialog window prompting field permissions configuration and confirming configuration save
Screen capture: Dialog window prompting field permissions configuration and confirming configuration save

The permissions configurations in Drupal are quite powerful and are, for the most part, beyond the scope of this tutorial. As you are still logged in as the administrator for your site, no further permissions modifications are necessary for you at this time. Suffice it to say that if you wanted to, you could allow authenticated members of your site (or even anonymous users) to submit, edit, and otherwise maintain content on the site, down to the individual fields of a Drupal node. This latter part will probably make more sense when you create an example of a new content type as you move along in this tutorial.

Using the CCK module to create a custom content type

Because you've installed and enabled the CCK module in Drupal, you're ready to create new, custom content types that differ from the built-in content types for Drupal nodes such as page and story. This process of creating a new type allows you to treat the content, permissions, and forms for creating your custom content site differently from the other useful Drupal built-in types that you continue to use for the other, general-use sections of your site that (for good reason) are not necessarily composed in standards-based XML.

Here, we go over the creation of a custom content type that is based on TEI XML, and we call the new content type the TEI XML Document type because that's the basic and probably the most common implementation of TEI XML. We should add that digital publications often have several custom content types that allow for different forms of content, including audio interviews, video presentations, and other multimedia forms, with accompanying textual transcripts. It's possible as well, using precisely the same steps that we describe later for the TEI XML Document type, to create these content types and to enable XML input for their textual components.

The first step in creating a new content type is to navigate to the content management menu of the Administer section and choose Content Types (Administer > Content management > Content types). On that page, you see a link to add a new content type. When you click Add content type, you see something that resembles Figure 5.

Figure 5. Create a new content type
Screen capture of the 'Create content type' window

To identify the new content type, enter a Name, Type, and Description. We filled in a human-readable version of the new content type's name, TEI XML Document—a name that's useful for us and others who might be contributing to our site—and a general description of the content type. The machine-readable version of the content type name, Document, is, as it says on the page, used for constructing URLs, site links, and so on, that do not allow the spaces and punctuation that we humans are so fond of.

Click the Save content type button to create your new content type in Drupal. You will see the human-readable name you assigned to the new type in the confirmation dialog as in Figure 6.

Figure 6. Dialog window confirming creation of TEI XML document content type
Screen capture: Dialog window confirming creation of TEI XML document content type

You are almost ready to add site content, but first, you have to finish configuring the other module that you installed and enabled previously (the XML Content module). Then you can configure your TEI XML Document type to use and enforce standards-based XML as its underlying content.

Creating a new input format to use the XML Content module

With the XML Content module enabled, it's time to configure Drupal to use it. You have to add a new input format to let Drupal know that, in addition to its default input formats (usually Filtered HTML and Full HTML for most content node types), you want to include a TEI XML input type that uses your newly installed XML Content module to filter and administer it.

Navigate to the Add Input Filter screen in the administration panel (Administer > Site Configuration > Input Formats) and click Add input format (see Figure 7). In this case, name the input format TEI XML, as it is your intention to limit created or contributed content of your newly created TEI XML Document type to content that is both well-formed XML and valid according to the TEI standards-based XML schema that you create later in the tutorial.

Figure 7. Add XML as an input format
Screen capture of 'Add input format' window with TEI XML specified in the Name field

You've configured Drupal with the necessary modules, created the new TEI XML content type, and added XML as an input format. Next, create a schema to validate your XML documents and configure the schema within the XML Content module.


Creating an XML schema

Now that you have set up Drupal and the necessary TEI XML modules, it's time to support XML content by developing schemas that validate your XML.

Why schemas matter

The general definitions of whichever version of XML you are using describe well-formedness, that is, the actual syntax and layout that might be used to create a viable, machine-readable XML document. XML itself is meant to be nearly infinitely flexible. You can make up syntactically valid, well-formed XML using elements and attributes that no one has seen before. While this ability is indeed essential in making XML universally useful, very often specific applications of XML require that only certain elements, attributes, and values are used. Although there are a number of ways to accomplish this task, schemas are regularly employed in this context to ensure and enforce continuity among documents within a community or collection.

XML schemas are machine-readable technical descriptions of what constitutes valid XML documents according to the rules described within the schema. The rules might be strict or lax, and they are compiled arbitrarily by document authors or designers.
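To make the idea of a schema rule concrete, the following is a deliberately tiny RELAX NG schema in XML syntax. It is a hypothetical illustration only, not the schema that Roma generates: it assumes the TEI namespace and permits just a handful of elements in a fixed order, whereas a real TEI Lite schema is far larger and more permissive.

```xml
<!-- Hypothetical toy schema: NOT the teilite.rng file produced by Roma. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.tei-c.org/ns/1.0">
  <start>
    <element name="TEI">
      <element name="teiHeader">
        <element name="fileDesc">
          <!-- Children must appear in exactly this order -->
          <element name="titleStmt">
            <element name="title"><text/></element>
          </element>
          <element name="publicationStmt">
            <element name="p"><text/></element>
          </element>
          <element name="sourceDesc">
            <element name="p"><text/></element>
          </element>
        </element>
      </element>
      <element name="text">
        <element name="body">
          <!-- One or more paragraphs; any other element is rejected -->
          <oneOrMore>
            <element name="p"><text/></element>
          </oneOrMore>
        </element>
      </element>
    </element>
  </start>
</grammar>
```

A document that uses an element not named here, or that places the named elements in a different order, is still well-formed XML but is not valid against this schema.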

In the site we are creating, we plan to allow only a certain subset of all the elements available within TEI XML markup. This constraint allows us to prepare the necessary XSL and CSS display aspects more accurately, as we can ensure that there are no unexpected tags or attributes in the materials. TEI P5 XML is already a strictly defined XML application, but we can further streamline the available options for our documents by using a TEI markup validation tool provided by the TEI Consortium: TEI Roma.

Using TEI Roma

TEI Roma initially gives several options as starting points for creating a TEI schema. You can create a customized validator based on some of the most commonly used applications of TEI, as in Figure 8.

Figure 8. Create an XML schema
Screen capture of creating an XML schema based on a customized template in TEI Roma

We use a plain, unadulterated version of TEI Lite, a subset of TEI P5 that contains most of the commonly used elements needed for describing documents in a digital format, though you can use any version of TEI produced by Roma with the XML Content module as described in the text that follows.

One important consideration is that Roma offers multiple output formats for your custom schema. Although the XML Content module handles several formats as well, be sure to select a compatible one when you tell Roma to output your file. We have found that the RELAX NG format in XML syntax (see Figure 9) works well with the XML Content module's validator and is a powerful and portable format if you need to use your schema for other purposes, so we use it for the rest of this tutorial (see Resources).

Figure 9. Select a schema format
Screen capture of selecting a schema format: RELAX NG schema (XML syntax)

After you download the schema file from Roma, upload it to the appropriate location on your website using the same methods that you used to upload Drupal and its modules. We place the file in the XML Content module's expected location in the sites/all/modules/xmlcontent directory.

You now need to update the XML Content module's validator to indicate that you've included a custom schema for it to validate all incoming XML content against. Navigate to the Input Formats section of Site Configuration in your Drupal administration console (Administer > Site configuration > Input formats > TEI XML). As in Figure 10, in the Schema File Path field, type in the filename and extension of the file you just uploaded (as provided by Roma in this case): teilite.rng.

Figure 10. Configure the schema in the XML Content module
Screen capture of configuring the schema in the XML Content module

After you upload the teilite.rng file containing the custom schema you created and have enabled the schema validation in the input filter configuration settings, you are ready to begin uploading or creating TEI-compliant XML content on your site.


Creating TEI XML content in Drupal

Now that you've developed a schema and configured it in your Drupal installation, you add an XML document to your site and validate it against the schema.

Creating TEI XML content

Creating TEI XML content in Drupal can now be undertaken in the usual Drupal manner:

  1. Navigate to Create Content in your navigation links.
  2. Choose TEI XML Document (the custom content type that you created in Figure 5—the page and story content types are Drupal's built-in types).
  3. Give your document a title.
  4. In the Body section of the Document form, create your TEI XML content or paste it in from a different editor.
  5. Below the Body section where you added your XML content, open the Input format display and select the TEI XML format, as indicated in Figure 11.
Figure 11. Select XML as the document input format
Screen capture of TEI XML selected as the document input format
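If you do not have a TEI document at hand, a minimal one such as the following sketch can be pasted into the Body field to try out the workflow. The title and paragraph text are placeholders, and the sketch assumes that your Roma schema is based on TEI Lite, which expects the TEI namespace and a header containing titleStmt, publicationStmt, and sourceDesc.

```xml
<!-- Minimal placeholder document; all text content is hypothetical. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample document</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sample used for testing.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital; no source document.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Hello, TEI.</p>
    </body>
  </text>
</TEI>
```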

The main difference in the content creation process now is that when you preview or submit your content of the TEI XML Document type, the XML Content module uses the XML schema you assigned to the input filter earlier to do the following:

  • Ensure that the content in the Body field is well-formed XML
  • Validate that content against your schema

If well-formedness or validation errors are present, you receive messages that can help you debug the input so that it validates. The validator first checks for well-formedness and reports any fatal XML errors as illustrated in Figure 12. The example lists three fatal errors: mismatched opening and end tags for p and div elements, data that ends prematurely in a div tag, and data that ends prematurely in a pb tag.

Figure 12. Example of well-formedness validation errors
Screen capture of example of well-formedness validation errors

If the document contents are well-formed XML, it runs the validation checks against your uploaded custom schema and reports any errors as in Figure 13. The example lists three validation errors: Expecting element publicationStmt, got sourceDesc; Element fileDesc failed to validate content; and Element teiHeader failed to validate content.

Figure 13. Example of schema validation errors
Screen capture of example of schema validation errors
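To illustrate the difference between the two kinds of errors, consider the following hypothetical header fragment. It is well-formed, because every tag is properly closed and nested, but sourceDesc appears before publicationStmt. Assuming a TEI Lite schema, which expects publicationStmt to come first, a header like this should pass the well-formedness check and then fail schema validation with messages similar to those in Figure 13. The text content is placeholder material.

```xml
<!-- Well-formed but (under TEI Lite) invalid: children are out of order. -->
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Sample document</title>
    </titleStmt>
    <!-- sourceDesc is expected AFTER publicationStmt -->
    <sourceDesc>
      <p>Born digital; no source document.</p>
    </sourceDesc>
    <publicationStmt>
      <p>Unpublished sample used for testing.</p>
    </publicationStmt>
  </fileDesc>
</teiHeader>
```

Swapping the sourceDesc and publicationStmt elements back into the expected order should clear the errors.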

If the content is well-formed and validates against the schema, it is automatically saved in Drupal and is publishable (or published immediately if the publication settings were checked accordingly in the Add Content screen).

Although we are using TEI XML as an input format primarily for a Drupal content type labeled TEI XML Document, there are no restrictions to define how XML format is to be used in a site. After you add and activate the XML Content module, XML is an input format option for every other content type your site is customized to publish. And using the XML filter (Home > Administer > Site configuration > Input formats > TEI XML filter), you can adjust the XML Content module settings to permit, require, or preclude content in XML by content type and by user permission, as in Figure 14. [Roles options for TEI XML include authenticated user (selected) and anonymous user (not selected). Filters options include XML Content XSLT filter (selected) and HTML corrector, HTML filter, Line break converter, and URL filter (all unselected)]

Figure 14. Enable XML input by user type
Screen capture of enabling XML input by user type; shows Roles and Filters options

Although the XML Content module specifies well-formedness and schema validation constraints for XML documents, none of the functionality you normally expect for the Drupal content type you are working with and have permissions for (for example, page content types allow for revision information, comment settings, authoring information, and publishing options) is affected by the addition of the module. For more information about these roles and their effects on content, look at the developerWorks series on Exploring Drupal V6, particularly Part 2 in this case (see Resources).

You just customized Drupal so that an XML document can be added and validated on your site. Now you're ready to consider how to display it using XSLT and CSS.


Rendering XML content

Although you now have content in the system that you know adheres to the subset of elements that you have preselected in your validation schema, you have not yet told Drupal how to display the information. As it is, the content shows on the screen, but elements unknown to any visitors' web browsers are simply ignored. Some of the richer and more interesting metadata in the document (that's why we chose the TEI XML format to begin with, right?) is not represented on-screen in any discernible way to readers, though it is available if one views the source of the page.

CSS and XSLT

To convey as much of the content's metadata to the user as possible, predefined styles are often employed to represent some of the metadata visually to users who view the content in a web browser.

Two kinds of styling are available to provide additional information to clients viewing the documents on your site:

  • XSLT: An XML technology that can take XML content as input and transform it into different XML for output based on custom rules provided in the stylesheet. XSLT can be client-side (that is, the data is transformed on the user's computer in his or her web browser) or server-side (that is, the data is transformed on the web server before it is passed to the client).
  • CSS: A client-side technology that tells the user's web browser how to represent elements in a page's source code that meet certain criteria when it displays the page.

CSS and XSLT are not mutually exclusive and, in fact, are regularly used together. We employ both in our site to represent TEI XML documents more richly than the default installation and configuration allow.
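As a rough sketch of how the two technologies cooperate, the small XSLT 1.0 stylesheet that follows turns a few TEI elements into XHTML elements that carry class attributes. It is not part of the TEI stylesheets used in the rest of this tutorial, and the class names tei-p and tei-name are hypothetical names chosen for illustration.

```xml
<!-- Illustrative only: a hand-written sketch, not the TEI Consortium stylesheets. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Suppress the header so that only the text of the document is displayed -->
  <xsl:template match="tei:teiHeader"/>

  <!-- TEI paragraphs become XHTML paragraphs -->
  <xsl:template match="tei:p">
    <p class="tei-p"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Names become spans that a CSS rule can style -->
  <xsl:template match="tei:name">
    <span class="tei-name"><xsl:apply-templates/></span>
  </xsl:template>

</xsl:stylesheet>
```

With output like this, a CSS rule that selects span.tei-name could, for example, italicize every name, so that metadata encoded in the TEI source becomes visible to readers.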

The XSLT functionality we employ is provided by the XML Content module, which is configurable in a manner analogous to the configuration of its schema validation: Create a custom XSLT stylesheet, upload it to the site, and enable it in the input filter configuration area (Administer > Site configuration > Input formats > TEI XML). Provide the stylesheet filename, myTei.xsl, and path in the XSLT Script File Path field (see Figure 15).

Figure 15. Configure a stylesheet in the XML Content module
Screen capture of configuring a stylesheet in the XML Content module

The TEI stylesheets

You can use any valid XSL stylesheets to render your XML content with the XML Content module. The structure, elements, and attributes defined and allowed by the XML schema you created and implemented with the XML Content module ensure that the document structure is known and predictable. This quality is critical in creating a stylesheet that renders the display version of the document on your site appropriately.

If you aren't familiar with XSL stylesheets, or perhaps even if you are, you might wish to start with a base model for transforming the TEI document using XSLT. The TEI Consortium maintains a rather large set of XSL stylesheets designed specifically for transforming TEI documents into other formats, from different flavors of (X)HTML to Open Office and Microsoft® Word formats. Of course, for the purposes of this tutorial, we focus on transforming the TEI input document to XHTML output that is integrated into a Drupal site.

The TEI stylesheets are, in fact, a collection of numerous, linked individual XSL stylesheets containing hundreds of functions for transforming and rendering most of the expected structure and elements found in standard TEI document implementations.

In addition to providing downloadable versions of the TEI stylesheets, the Consortium also provides and maintains a web interface called Stylebear that creates a customized implementation of the Consortium's TEI stylesheets. It creates a "master" stylesheet for the collection that contains values for variables used throughout the stylesheets.

Creating and customizing the TEI stylesheets

Upon visiting the Stylebear site, you can find a web form with many, many fields. Links above each section point to the section's documentation. At a glance, you likely are able to get a sense of the immense flexibility and comprehensiveness of both TEI and the TEI stylesheets. For the purposes of this tutorial, we do not cover Stylebear in great detail because it is designed to create stand-alone output versions of TEI documents (and we, by contrast, implement a system to integrate and publish many TEI documents in an existing website with a built-in content management system and all its attendant features).

Figure 16 provides a screen capture of the information we entered to create a base set of stylesheets for us to work with. That information includes fields to enter department, homeLabel, homeURL, homeWords, institution, parentURL, parentWords, searchURL, alignNavigationPanel, bottomNavigationPanel option, feedback URL and other details.

Figure 16. Create a stylesheet
Screen capture of Standard page features for the myTei.xsl stylesheet created in Stylebear

Look through the other sections of Stylebear to see the other options afforded by the tool, including options to insert your own JavaScript, CSS, or HTML elements into the output to be created by the Stylebear-modified TEI stylesheets (in Section 14, Hooks). You can also adjust textual content, such as a copyright message (in Section 9, Internationalization), for example.

The output file from Stylebear is named myTei.xsl by default. Keep this name and upload the file to your web server. Once again, the default directory location for our custom files used by the XML Content module (that is, the XML schema and the XSL stylesheet, but not the XML content) is [DRUPAL_HOME]/sites/all/modules/xmlcontent, and this is the location that you use. The myTei.xsl file simply contains default values for variables it supplies to the stylesheets that it imports from the TEI website. For performance, security, and stability reasons, download a copy of the complete sets of stylesheets from the TEI website and install them in your own Drupal instance.

Installing the TEI stylesheets

You can download the TEI stylesheets from the TEI's SourceForge repository. After you download the stylesheets, uncompress them and upload them to your Drupal installation. We recommend that you keep them in the same location as the other XML-related files (such as your schema and custom stylesheet, [DRUPAL_HOME]/sites/all/modules/xmlcontent), and that's the practice we follow here. On our server, we named the directory that contains all of the .xsl files (in their respective nested directories) Stylesheets.

Because the TEI stylesheet files we use reside on our server, we need to modify myTei.xsl with a relative link to the copy of them on our website instead of pointing to the Consortium's site. Although this can be done within the Stylebear tool, we usually defer the change until after we've decided where on our server we'd like to keep them. We also make a couple of other changes to the XML lines noted in the content that follows. The first few lines of the myTei.xsl document produced by Stylebear are provided in Listing 1.

Listing 1. Modify the Stylebear stylesheet
<xsl:stylesheet 
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#default tei xsl"
    version="1.0">
<!-- XSLT stylesheet to generate HTML version of TEI document.
Written by the TEI XSL generator (Sebastian Rahtz, sebastian.rahtz@oucs.ox.ac.uk)
Created on 28 Nov 2010-->
<xsl:import href="http://www.tei-c.org/release/xml/tei/stylesheet/xhtml2/tei.xsl"/>

We change the import element shown in Listing 1 to point to the local copy of the stylesheets that we downloaded. To do so, we modify the href attribute to use a relative path to the master TEI stylesheet, that is, the stylesheet that imports all the other needed stylesheets from the bundle downloaded from the TEI Consortium: <xsl:import href="Stylesheets/xhtml/tei.xsl"/>.

Modifying the XSLT version attribute

At this point, you might have noticed that in addition to changing the file path in the href attribute from remote to local, we also changed xhtml2 to xhtml. If you run the default output of a stylesheet created with Stylebear, you are likely to see many errors in the resulting output page. By default, Stylebear takes advantage of many new features in XSLT 2.0 (thus the xhtml2 moniker for the default set of stylesheets and the version number of 2.0); however, the PHP XML libraries employed by Drupal and the XML Content module do not support them, at least not at the time this tutorial was written. A couple of quick and minor modifications in myTei.xsl, though, can rectify the situation:

  1. Modify the version attribute in the first element (xsl:stylesheet) to read 1.0 instead of 2.0.
  2. As noted, change the path from which the stylesheet imports the rest of the XSL rules from /xhtml2/tei.xsl to /xhtml/tei.xsl. This tells the stylesheet to import the XSLT 1.0 version of the TEI stylesheets (also provided by the TEI Consortium in the downloaded bundle) instead of the default XSLT 2.0 version.
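Taken together, the two changes leave the top of myTei.xsl looking roughly like the following sketch. It assumes that the downloaded TEI stylesheets sit in a Stylesheets directory next to myTei.xsl, as described earlier. The parameter values that Stylebear writes into the file are omitted here and should be left in place in your own copy.

```xml
<xsl:stylesheet
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#default tei xsl"
    version="1.0">
  <!-- Change 1: the version attribute above reads 1.0, not 2.0 -->

  <!-- Change 2: import the local XSLT 1.0 stylesheets (xhtml, not xhtml2),
       using a path relative to myTei.xsl -->
  <xsl:import href="Stylesheets/xhtml/tei.xsl"/>

  <!-- Values generated by Stylebear follow here (omitted in this sketch) -->
</xsl:stylesheet>
```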

After you install, appropriately modify, and enable the stylesheets, all content entered by way of the TEI XML input filter that you defined is transformed from the native TEI XML input into an XHTML version that web browsers can display and interpret more readily.

Getting valid XHTML

Although doing all of the above should rid the resulting output page of any displayed processing errors, and might even render the page without visible problems, it likely does not result in valid XHTML on the page without further modification.

As mentioned, the TEI stylesheets that we have employed are designed to stand alone: given TEI XML input, they output a complete (X)HTML document. In our case, however, most of the page rendered on our website is created by our content management system, Drupal, and only the custom content portion (the node body, in Drupal terminology) is transformed and rendered using the XSLT stylesheets. As a result, a full HTML document appears inline within the HTML document created by Drupal. Because XHTML does not allow an html element to be nested inside another html element, our site page fails validation until we modify our stylesheets to emit only the transformed content, without the surrounding boilerplate required of a stand-alone (X)HTML document.
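
To make the problem concrete, the following fragment sketches what such a page might look like. It is an illustration only: the outer markup stands in for whatever your Drupal theme produces, and the id and class names are hypothetical.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Site page built by Drupal</title>
  </head>
  <body>
    <div id="content">
      <div class="node">
        <!-- Output of the unmodified TEI stylesheets starts here:
             a second, complete document nested inside the first. -->
        <html>
          <head>
            <title>TEI document title</title>
          </head>
          <body>
            <p>Transformed TEI content</p>
          </body>
        </html>
        <!-- Output of the unmodified TEI stylesheets ends here. -->
      </div>
    </div>
  </body>
</html>
```

The fragment is well-formed XML, but it is not valid XHTML, because html, head, and body may not appear inside a div.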

To avoid this issue, you must make some adjustments to the stylesheets, including overriding some templates in the main stylesheet. Most of the HTML elements that the TEI stylesheets place around the content are created when the stylesheet first encounters the root TEI element. Therefore, we override that behavior with a new XSLT template that calls only the child templates that the original called (so the cascading functionality provided by that element is preserved) while suppressing the surrounding tags.

You can find the original template in textstructure.xsl in the xhtml directory of the TEI stylesheets. Listing 2 shows the template we use to override it; we place it after the xsl:import in the myTei.xsl stylesheet.

Listing 2. Override stylesheet functionality
<xsl:template match="tei:TEI">
    <xsl:call-template name="teiStartHook"/>
    <xsl:call-template name="javascriptHook"/>
    <xsl:call-template name="bodyHook"/>
    <xsl:call-template name="bodyJavascriptHook"/>
    <xsl:call-template name="startHook"/>
    <xsl:call-template name="simpleBody"/>
    <xsl:call-template name="stdfooter"/>
    <xsl:call-template name="bodyEndHook"/>
    <xsl:call-template name="teiEndHook"/>
</xsl:template>

Now, when the stylesheet encounters the root TEI element of the TEI document, it simply calls the other XSL templates and hooks defined in the stylesheet bundle and does not create an entirely new HTML document.

You might wish to remove some of the other functions, depending on the documents and TEI features that you work with in your source documents. Although an in-depth discussion of all of the features within the TEI stylesheets is well outside the scope of this tutorial, the TEI stylesheets are well documented internally and on the TEI website.
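
As one possible variation, the sketch below trims the override in Listing 2 by leaving out the two JavaScript hooks and the standard footer. It uses only template names that appear in Listing 2 and assumes that your documents and site need none of the omitted output. Check the stylesheets' internal documentation before you drop a call in your own installation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#default tei xsl"
    version="1.0">

  <!-- Path is an assumption; it matches the local import used earlier. -->
  <xsl:import href="Stylesheets/xhtml/tei.xsl"/>

  <!-- Trimmed override of the root TEI element. Compared with Listing 2,
       javascriptHook, bodyJavascriptHook, and stdfooter are not called. -->
  <xsl:template match="tei:TEI">
    <xsl:call-template name="teiStartHook"/>
    <xsl:call-template name="bodyHook"/>
    <xsl:call-template name="startHook"/>
    <xsl:call-template name="simpleBody"/>
    <xsl:call-template name="bodyEndHook"/>
    <xsl:call-template name="teiEndHook"/>
  </xsl:template>

</xsl:stylesheet>
```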

Using custom XSL stylesheets

Using the TEI Consortium-provided TEI stylesheets is by no means a requirement and, in many circumstances, might not be the best or most expedient method. You might wish to design and implement your own XSLT stylesheets, particularly if the TEI documents you publish contain a small or very predictable subset of TEI elements. To implement custom stylesheets, you can simply create an XSL stylesheet, upload it to your server, and adjust the path of the XML Content module to the stylesheet in the configuration section.
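
A custom stylesheet for a small, predictable subset of TEI might look like the following minimal sketch. The choice of elements (div, head, and p), the h2 heading level, and the tei-document class name are assumptions made for illustration, not a recommendation for any particular project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="tei"
    version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Root element: emit a wrapper div only, never html, head, or body,
       because Drupal supplies the surrounding page. The teiHeader is
       skipped by selecting only the body of the text. -->
  <xsl:template match="tei:TEI">
    <div class="tei-document">
      <xsl:apply-templates select="tei:text/tei:body"/>
    </div>
  </xsl:template>

  <!-- Text divisions become XHTML divs. -->
  <xsl:template match="tei:div">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Division headings become second-level headings. -->
  <xsl:template match="tei:head">
    <h2>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

  <!-- Paragraphs map directly to XHTML paragraphs. -->
  <xsl:template match="tei:p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Elements without a matching template fall through to the built-in XSLT rules, which output their text content without any markup. For a tightly constrained schema that can be acceptable; for a broader one, it is a reason to stay with the TEI stylesheets.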

Advantages of using the TEI stylesheets

As you've seen, it is easy to override the XSL templates contained in the stylesheets bundle without losing the base functionality that years of updates and improvements by the stylesheets' maintainers have produced. In addition to converting TEI-specific elements to loosely corresponding or representative HTML elements, the stylesheets often provide good visual renderings and even assign class attributes to the HTML elements. These attributes give you another way to customize the visible aspects of the underlying TEI markup, using a technology that is usually considered easier to master than XSL and that you might already know: CSS.

Using CSS to style and render the published TEI content

Like most modern website design frameworks, Drupal themes make extensive use of CSS to provide the visible variation in the site, from typography and font matters to page layout and image placement.

CSS accomplishes this by using selectors. Selectors are aspects of elements on the page that the stylesheet uses to identify page items to render in specific ways. They can be element names themselves (such as p or div), or they can be attributes (such as class or id). Although CSS is also outside the scope of this tutorial, it would be well worth the time invested to check out a tutorial devoted to it if you are not already familiar with the technology (see Resources for a link).

The important point here is that the TEI stylesheets provide selectors for many of the common TEI elements by assigning a class attribute to a span element that surrounds the content of a corresponding TEI element. This probably sounds more confusing than it is. Let's look at an example.

The following is a relatively common element and construction in a TEI document: <foreign>sic</foreign>.

The output from the TEI stylesheets results in the following XHTML: <i><span class="foreign">sic</span></i>.

You can see that the output not only provides a basic rendering for the TEI XML <foreign> tag (it italicizes the content using the XHTML <i> tag) but also wraps the content in a <span> whose class attribute contains the name of the TEI element. Although the <span class="foreign"> makes no difference in the browser display by default, it provides a selector that you can use in your CSS stylesheet to control the style of all <foreign> elements within your published TEI document.
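
To show how a class attribute of this kind can come about, here is a small stand-alone XSLT 1.0 sketch that yields markup of the same shape for <foreign>. It is an illustration only and is not the template used in the TEI Consortium's stylesheets.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="tei"
    version="1.0">

  <!-- Illustration only: italicize the content and wrap it in a span
       whose class attribute carries the name of the TEI element. -->
  <xsl:template match="tei:foreign">
    <i>
      <span class="foreign">
        <xsl:apply-templates/>
      </span>
    </i>
  </xsl:template>

</xsl:stylesheet>
```

The template matches only when the source <foreign> element is in the TEI namespace (http://www.tei-c.org/ns/1.0), as it is in a TEI P5 document.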

Your CSS stylesheet can have any number of items added to style the desired TEI items so that you can display any aspect of the metadata contained in the TEI document in a way that you feel is appropriate.

Now that you've considered how to use XSLT and CSS to render your XML documents, you're ready to publish.

Publishing XML documents

At this point, because you have installed and configured the necessary modules in Drupal, created a new TEI XML Document content type, configured TEI XML as an input type, and configured a schema and stylesheet within the XML Content module, you are ready to publish.

Now, when a richly encoded XML document is added in the document body field (as shown in Figure 17) and the document is published (as shown in Figure 18), the data and metadata marked up in the XML document are rendered for the reader.

Figure 17. Add an XML document in the XML document body field
Screen capture of an XML document in the XML document body field

Figure 18 shows the published document.

Figure 18. Image of rendered, human readable text published in Drupal
Screen capture of a rendered, human readable text published in Drupal

Because we created and configured the schema and stylesheet to be used with the XML documents published on our site, we defined the XML elements and attributes accepted by our site. When an XML document is submitted, it is automatically validated for well-formedness and for schema compliance. And when a document is submitted, validated, and accepted for publication, our stylesheet prescribes its display format.
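
For a first submission test, a small document such as the following sketch can be pasted into the body field. It follows the basic structure of a TEI P5 document (a teiHeader followed by a text), and every title and sentence in it is placeholder content. Whether it passes validation depends on the schema that you configured in the XML Content module.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Placeholder title</title>
      </titleStmt>
      <publicationStmt>
        <p>Placeholder publication statement.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Placeholder description of the source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>A short paragraph that contains one
         <foreign xml:lang="la">sic</foreign> element.</p>
    </body>
  </text>
</TEI>
```

Removing a closing tag from the sample is a quick way to confirm that the well-formedness check rejects a broken submission.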

Our Drupal customization also affords us crucial levels of flexibility and granularity in determining which Drupal node types are to be written in TEI XML and which users of our site can (and can't) input in XML.


Summary

This tutorial illustrated how to customize a Drupal installation to support the publication of TEI (or other) XML documents. You should now be prepared to install Drupal and to configure the XML Content and CCK modules and to create content types that support and validate XML input.

Resources

Learn

  • Exploring Drupal V6 (Martin Streicher, developerWorks, August-September 2009): Get an introduction to Drupal in this three-part article series.
  • XSL stylesheets for TEI XML: Explore this set of XSLT 2.0 specifications to transform TEI XML documents to XHTML, to LaTeX, to XSL Formatting Objects, to and from OOXML (docx), to and from OpenOffice (odt), and to ePub format.
  • Drupal.org: Find documentation, add-on modules, and other resources, and connect with others in the Drupal community.
  • Text Encoding Initiative (TEI): Learn about the TEI consortium that collectively develops and maintains a standard for the representation of texts in digital form.
  • Guidelines for Electronic Text Encoding and Interchange: Stay current with the latest information on TEI's markup standard.
  • Display XML with Cascading Stylesheets, Part 1: Using stylesheets to display XML (Uche Ogbuji, developerWorks, November 2004): In this tutorial, explore basic techniques on how to use CSS to present XML in web browsers.
  • Understanding RELAX NG (Nicholas Chase, developerWorks, December 2003): Explore the concepts behind RELAX NG in both its XML and compact forms in this tutorial. RELAX NG uses XML syntax, and enables developers to create most of the same rules as the W3C XML Schema language, but with a greatly simplified syntax.
  • XML Matters: Kicking back with RELAX NG (David Mertz, developerWorks, February-May 2003): Review this three-part series on RELAX NG schemas, including its compact and XML syntax.
  • XML Matters: TEI - the Text Encoding Initiative (David Mertz, developerWorks, September 2003): Look at Text Encoding Initiative, an XML schema devoted to the markup of literary and linguistic texts. TEI allows useful abstractions of typographic features of source documents, but in a manner that enables effective searching, indexing, comparison, and print publication—something not possible with publications archived as mere photographic images.
  • XML area on developerWorks: Get the resources you need to advance your skills in the XML arena.
  • My developerWorks: Personalize your developerWorks experience.
  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks. Also, read more XML tips.
  • developerWorks technical events and webcasts: Stay current with technology in these sessions.
  • developerWorks on Twitter: Join today to follow developerWorks tweets.
  • developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
  • developerWorks on-demand demos: Watch demos ranging from product installation and setup for beginners to advanced functionality for experienced developers.

Get products and technologies

  • Drupal: Download Drupal core files, and extend your site with modules, themes, translations and installation profiles. This tutorial uses Drupal version 6.
  • Drupal XML Content module: Save XML inside the body of any node type and have it display differently with XSL, or when validated against a preconfigured schema.
  • The Drupal Content Construction Kit: Add custom fields to nodes using a web browser.
  • TEI Roma: Try a web-based tool for generating custom XML schemas.
  • TEI's SourceForge repository: Download the TEI stylesheets.
  • The Stylebear: XSL stylesheet maker: Experiment with a web interface that creates a customized implementation of the Consortium's TEI stylesheets.
  • RELAX NG: Get a schema language for XML.
  • FileZilla: Get an open source FTP client program.
  • Apache web server: Get the open source HTTP server that provides HTTP services observing the current HTTP standards and works on modern operating systems including UNIX, Microsoft Windows, Mac OS X, and NetWare.
  • MySQL: Try a popular open source database.
  • DB2 Express-C: Get a free version of the IBM DB2 database server, an excellent foundation for application development for small and medium business.
  • IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
