Developing Drupal publications to support standards-based XML

Customize your Drupal installation to support the publication of TEI (or other) XML documents

Garrick Bodine (garrick.bodine@gmail.com), Information Technology Manager, Penn State University
Garrick Bodine is an Information Technology Manager in the Office of Undergraduate Admissions at Penn State University.
Stephanie Schlitz (sschlitz@gmail.com), Linguistics Professor, Bloomsburg University of Pennsylvania
Stephanie Schlitz is a Linguistics Professor at Bloomsburg University of Pennsylvania. She collaborates on several ongoing XML-based projects.

Summary:  Academic and corporate clients seeking digital journals or other types of web publications regularly require platforms that support standards-based XML. This tutorial explains how to customize a Drupal implementation to develop publications that enable editors, authors, and users to submit and edit content in standards-based XML, where the standard can be enforced using server-side validation settings. For illustrative purposes, the discussion references TEI XML, the markup standard in widespread use in academia.

Date:  08 Feb 2011
Level:  Intermediate


Rendering XML content

Although you now have content in the system that you know adheres to the subset of elements preselected in your validation schema, you have not yet told Drupal how to display it. As it stands, the content appears on the screen, but the browser simply ignores any elements it does not recognize. Some of the richer and more interesting metadata in the document (the reason for choosing the TEI XML format in the first place) is not represented on-screen in any way that readers can discern, although it remains available to anyone who views the source of the page.

CSS and XSLT

To convey as much of the content's metadata to the user as possible, predefined styles are often employed to represent some of the metadata visually to users who view the content in a web browser.

Two kinds of styling are available to provide additional information to clients viewing the documents on your site:

  • XSLT: An XML technology that takes XML content as input and transforms it into different XML output according to rules defined in a stylesheet. XSLT can run client-side (the data is transformed in the user's web browser) or server-side (the data is transformed on the web server before it is passed to the client).
  • CSS: A client-side technology that tells the browser how to display the elements in a page's source code that match certain criteria.

CSS and XSLT are not mutually exclusive and, in fact, are regularly used together. We employ both in our site to represent TEI XML documents more richly than the default installation and configuration allow.

The XSLT functionality we employ is provided by the XML Content module, which is configurable in a manner analogous to the configuration of its schema validation: Create a custom XSLT stylesheet, upload it to the site, and enable it in the input filter configuration area (Administer > Site configuration > Input formats > TEI XML). Provide the stylesheet filename, myTei.xsl, and path in the XSLT Script File Path field (see Figure 15).


Figure 15. Configure a stylesheet in the XML Content module
Screen capture of configuring a stylesheet in the XML Content module

The TEI stylesheets

You can use any valid XSL stylesheets to render your XML content with the XML Content module. The structure, elements, and attributes defined and allowed by the XML schema you created and implemented with the XML Content module ensure that the document structure is known and predictable. This quality is critical in creating a stylesheet that renders the display version of the document on your site appropriately.

If you aren't familiar with XSL stylesheets, or perhaps even if you are, you might wish to start with a base model for transforming the TEI document using XSLT. The TEI Consortium maintains a rather large set of XSL stylesheets designed specifically for transforming TEI documents into other formats, from different flavors of (X)HTML to Open Office and Microsoft® Word formats. Of course, for the purposes of this tutorial, we focus on transforming the TEI input document to XHTML output that is integrated into a Drupal site.

The TEI stylesheets are, in fact, a collection of numerous, linked individual XSL stylesheets containing hundreds of functions for transforming and rendering most of the expected structure and elements found in standard TEI document implementations.

In addition to providing downloadable versions of the TEI stylesheets, the Consortium also provides and maintains a web interface called Stylebear that creates a customized implementation of the Consortium's TEI stylesheets. It creates a "master" stylesheet for the collection that contains values for variables used throughout the stylesheets.


Creating and customizing the TEI stylesheets

The Stylebear site presents a web form with a large number of fields. Links above each section point to that section's documentation. Even at a glance, the form gives a sense of the flexibility and comprehensiveness of both TEI and the TEI stylesheets. For the purposes of this tutorial, we do not cover Stylebear in great detail, because it is designed to create stand-alone output versions of TEI documents. We, by contrast, implement a system that integrates and publishes many TEI documents in an existing website with a built-in content management system and all its attendant features.

Figure 16 provides a screen capture of the information we entered to create a base set of stylesheets for us to work with. That information includes fields to enter department, homeLabel, homeURL, homeWords, institution, parentURL, parentWords, searchURL, alignNavigationPanel, bottomNavigationPanel option, feedback URL and other details.


Figure 16. Create a stylesheet
Screen capture of Standard page features for the myTei.xsl stylesheet created in Stylebear

Look through the other sections of Stylebear to see the other options afforded by the tool, including options to insert your own JavaScript, CSS, or HTML elements into the output to be created by the Stylebear-modified TEI stylesheets (in Section 14, Hooks). You can also adjust textual content, such as a copyright message (in Section 9, Internationalization), for example.

The output file from Stylebear is named myTei.xsl by default. Keep this name and upload the file to your web server. Once again, the default directory location for our custom files used by the XML Content module (that is, the XML schema and the XSL stylesheet, but not the XML content) is [DRUPAL_HOME]/sites/all/modules/xmlcontent, and this is the location that you use. The myTei.xsl file simply contains default values for variables it supplies to the stylesheets that it imports from the TEI website. For performance, security, and stability reasons, download a copy of the complete sets of stylesheets from the TEI website and install them in your own Drupal instance.


Installing the TEI stylesheets

You can download the TEI stylesheets from the TEI's SourceForge repository. After you download the stylesheets, uncompress them and upload them to your Drupal installation. We recommend that you keep them in the same location as the other XML-related files, such as your schema and custom stylesheet ([DRUPAL_HOME]/sites/all/modules/xmlcontent), and that's the practice we follow here. On our server, the directory that holds all of the .xsl files in their nested subdirectories is named Stylesheets.

Because the TEI stylesheets files we use reside on our server, we need to modify myTei.xsl with a relative link to the copy of them on our website instead of pointing to the Consortium's site. Although this can be done within the Stylebear tool, we usually reserve the change until after we've located the place on our server that we'd like to keep them. We also make a couple of other changes to the XML lines noted in the content that follows. The first few lines of the myTei.xsl document produced by Stylebear are provided in Listing 1.


Listing 1. Modify the Stylebear stylesheet

<xsl:stylesheet 
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#default tei xsl"
    version="1.0">
<!-- XSLT stylesheet to generate HTML version of TEI document.
Written by the TEI XSL generator (Sebastian Rahtz, sebastian.rahtz@oucs.ox.ac.uk)
Created on 28 Nov 2010-->
<xsl:import href="http://www.tei-c.org/release/xml/tei/stylesheet/xhtml2/tei.xsl"/>

As shown in Listing 1, the generated stylesheet imports tei.xsl from the Consortium's website. We change the href attribute of the import element to a relative path that points to our local copy of the master TEI stylesheet (that is, the stylesheet that imports all the other needed stylesheets from the bundle downloaded from the TEI Consortium): <xsl:import href="Stylesheets/xhtml/tei.xsl"/>.


Modifying the XSLT version attribute

At this point, you might have noticed that in addition to changing the file path in the href attribute from remote to local, we also changed xhtml2 to xhtml. If you run the default output of a stylesheet created with Stylebear, you are likely to see many errors in the resulting output page. By default, Stylebear takes advantage of many new features in XSLT 2.0 (thus the xhtml2 moniker for the default set of stylesheets and the version number of 2.0); however, the PHP XML libraries employed by Drupal and the XML Content module do not support them, at least not at the time this tutorial was written. Two quick and minor modifications in myTei.xsl can rectify the situation:

  1. Modify the version attribute in the first element (xsl:stylesheet) to read 1.0 instead of 2.0.
  2. As noted, change the path from which the stylesheet imports the rest of the XSL rules from /xhtml2/tei.xsl to /xhtml/tei.xsl. This tells the stylesheet to import the XSLT 1.0 version of the TEI stylesheets (also provided by the TEI Consortium in the downloaded bundle) instead of the default XSLT 2.0 version.
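Taken together, the two changes give myTei.xsl a header along the following lines. This is a minimal sketch, assuming the Stylesheets directory sits in the same directory as myTei.xsl, as described earlier; the parameter values that Stylebear generates are omitted here.

```xml
<xsl:stylesheet
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#default tei xsl"
    version="1.0">
  <!-- Change 1: version reads 1.0, because the PHP XML libraries
       do not support XSLT 2.0 -->

  <!-- Change 2: import the local XSLT 1.0 copy of the TEI stylesheets
       (xhtml, not xhtml2) by relative path -->
  <xsl:import href="Stylesheets/xhtml/tei.xsl"/>

  <!-- Stylebear-generated parameter values and any template
       overrides follow the import -->
</xsl:stylesheet>
```

Note that xsl:import must come before any other top-level element in the stylesheet, so overrides always go after it.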

After you install, modify appropriately, and enable the stylesheets, all of the content that is entered by way of the TEI XML input filter that you defined is transformed from the native TEI XML input into an XHTML version that web browsers can display and interpret more readily.


Getting valid XHTML

Although doing all of the above should rid the resulting output page of any displayed processing errors, and might even render the page without visible problems, it likely does not result in valid XHTML on the page without further modification.

As mentioned, the TEI stylesheets that we have employed are made to be stand-alone, yet we use them to render only a portion of a complete HTML page. In other words, the TEI stylesheets are designed to output a complete (X)HTML document given TEI XML input, yet in our case, most of the page rendered on our website is created by our content management system, Drupal, and only the custom content portion (the node body, in Drupal terminology) within the page is transformed and rendered using the XSLT stylesheets. This approach results in a full HTML document appearing inline within the HTML document created by Drupal. Because XHTML does not allow an html element to be nested inside another html element, our site page fails validation until we modify our stylesheets to generate markup only for our content, not the surrounding boilerplate required of a stand-alone (X)HTML document.
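The problem can be pictured schematically as follows. This is an illustrative sketch only: the class name and titles are hypothetical, and the markup that a real Drupal theme and the TEI stylesheets produce is considerably more elaborate.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Page title supplied by Drupal</title></head>
  <body>
    <!-- Hypothetical wrapper for the node body -->
    <div class="node-content">
      <!-- Output of the unmodified TEI stylesheets:
           a second, complete document inside the first -->
      <html>
        <head><title>Title taken from the TEI document</title></head>
        <body>
          <p>Transformed TEI content</p>
        </body>
      </html>
    </div>
  </body>
</html>
```

The markup is well-formed XML, but it is not valid XHTML, because the inner html, head, and body elements are not permitted inside a div.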

To avoid this issue, you must make some adjustments to the stylesheets, including overriding some templates in the main stylesheet, myTei.xsl. Most of the HTML elements placed around the content by the TEI stylesheets are created when the stylesheet first encounters the main TEI element. Therefore, we override that functionality by providing a new XSLT template that calls only the child templates that the original called (so the cascading functionality provided by that element is still present) but suppresses the surrounding tags.

You can find the original template functionality in textstructure.xsl in the XHTML directory of the TEI stylesheets, but Listing 2 shows what we use to override it by including it after the xsl:import in the myTei.xsl stylesheet.


Listing 2. Override stylesheet functionality

<xsl:template match="tei:TEI">
    <xsl:call-template name="teiStartHook"/>
    <xsl:call-template name="javascriptHook"/>
    <xsl:call-template name="bodyHook"/>
    <xsl:call-template name="bodyJavascriptHook"/>
    <xsl:call-template name="startHook"/>
    <xsl:call-template name="simpleBody"/>
    <xsl:call-template name="stdfooter"/>
    <xsl:call-template name="bodyEndHook"/>
    <xsl:call-template name="teiEndHook"/>
</xsl:template>

Now, when the stylesheet encounters the root TEI element of the TEI document, it simply calls the other XSL templates and hooks linked in the stylesheet bundle and does not create an entirely new HTML document.

You might wish to remove some of the other functions, depending on the documents and TEI features that you work with in your source documents. Although an in-depth discussion of all of the features within the TEI stylesheets is well outside the scope of this tutorial, the TEI stylesheets are well documented internally and on the TEI website.
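As one illustration, a trimmed variant of Listing 2 might look like the following sketch. It assumes that the Drupal theme already supplies the page's scripts and footer, so the javascriptHook, bodyJavascriptHook, and stdfooter calls are dropped. Which calls are safe to remove for your documents is an assumption to verify against the template definitions in the TEI stylesheets. Like Listing 2, the fragment belongs after the xsl:import in myTei.xsl, where the tei and xsl prefixes are already declared.

```xml
<!-- Trimmed override of the root TEI template: renders the
     document body but leaves scripts and footer to Drupal -->
<xsl:template match="tei:TEI">
    <xsl:call-template name="teiStartHook"/>
    <xsl:call-template name="bodyHook"/>
    <xsl:call-template name="startHook"/>
    <xsl:call-template name="simpleBody"/>
    <xsl:call-template name="bodyEndHook"/>
    <xsl:call-template name="teiEndHook"/>
</xsl:template>
```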


Using custom XSL stylesheets

Using the TEI Consortium-provided TEI stylesheets is by no means a requirement and, in many circumstances, might not be the best or most expedient method. You might wish to design and implement your own XSLT stylesheets, particularly if the TEI documents you publish contain a small or very predictable subset of TEI elements. To implement custom stylesheets, you can simply create an XSL stylesheet, upload it to your server, and adjust the path of the XML Content module to the stylesheet in the configuration section.
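For a sense of scale, a custom stylesheet for a small subset of TEI can be quite short. The following XSLT 1.0 sketch assumes that your documents use the TEI namespace (http://www.tei-c.org/ns/1.0) and contain only head, p, and foreign elements inside div elements in the body. The file name myCustom.xsl and the tei-document class are hypothetical. It emits an XHTML fragment rather than a full page, so it avoids the nested-document problem described earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- myCustom.xsl (hypothetical name): minimal TEI-to-XHTML fragment -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Emit a fragment: no XML declaration, no html/head/body wrapper -->
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Root element: wrap the body in a single div and skip the teiHeader -->
  <xsl:template match="/tei:TEI">
    <div class="tei-document">
      <xsl:apply-templates select="tei:text/tei:body"/>
    </div>
  </xsl:template>

  <!-- Section headings -->
  <xsl:template match="tei:head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- Paragraphs -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Foreign-language phrases: italicize and expose a class for CSS -->
  <xsl:template match="tei:foreign">
    <i><span class="foreign"><xsl:apply-templates/></span></i>
  </xsl:template>

  <!-- Elements without a template (tei:body, tei:div) fall through to
       the built-in rules, which process their children and copy text -->
</xsl:stylesheet>
```

The trade-off is that every TEI element you want rendered in a particular way needs its own template, which is practical only when the schema restricts documents to a small, predictable set of elements.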


Advantages of using the TEI stylesheets

As you've seen, you can override individual XSL templates in the stylesheets bundle without losing the base functionality built up through years of updates and improvements by the stylesheets' maintainers. Beyond the basic conversion of TEI-specific elements to loosely corresponding or representative HTML elements, the stylesheets often provide good visual renderings. They also assign class attributes to the HTML elements, which gives you another way to customize the visible aspects of the underlying TEI markup, using a technology that is usually considered easier to master than XSL and that you might already know: CSS.


Using CSS to style and render the published TEI content

Like most modern website design frameworks, Drupal themes make extensive use of CSS to provide the visible variation in the site, from typography and font matters to page layout and image placement.

CSS accomplishes this by using selectors. Selectors are patterns that the stylesheet uses to identify which items on the page to render in specific ways. They can match element names (such as p or div), or they can match attribute values (such as class or id). Although CSS is also outside the scope of this tutorial, it is well worth the time to work through a tutorial devoted to it if you are not already familiar with the technology (see Resources for a link).

The important point here is that the TEI stylesheets provide selectors for many of the common TEI elements by assigning a class attribute to a span element that surrounds the content of a corresponding TEI element. This probably sounds more confusing than it is. Let's look at an example.

The following is a relatively common element and construction in a TEI document: <foreign>sic</foreign>.

The output from the TEI stylesheets results in the following XHTML: <i><span class="foreign">sic</span></i>.

You can see that the stylesheet not only provides a basic rendering for the TEI XML <foreign> tag (it italicizes the content using the XHTML <i> tag), but also wraps the content in a <span> with a class attribute that contains the name of the TEI element. Although by default the <span class="foreign"> makes no difference to the browser display, the construction provides a selector that you can use in your CSS stylesheet to control the style of all <foreign> elements within your published TEI document.

Your CSS stylesheet can have any number of items added to style the desired TEI items so that you can display any aspect of the metadata contained in the TEI document in a way that you feel is appropriate.

Now that you've considered how to use XSLT and CSS to render your XML documents, you're ready to publish.


Publishing XML documents

At this point, because you have installed and configured the necessary modules in Drupal, created a new TEI XML Document content type, configured TEI XML as an input type, and configured a schema and stylesheet within the XML Content module, you are ready to publish.

Now, when a richly encoded XML document is added in the document body field (as shown in Figure 17) and the document is published (as shown in Figure 18), the data and metadata marked up in the XML document are effectively rendered.


Figure 17. Add an XML document in the XML document body field
Screen capture of an XML document in the XML document body field

Figure 18 shows the published document.


Figure 18. Image of rendered, human readable text published in Drupal
Screen capture of a rendered, human readable text published in Drupal

Because we created and configured the schema and stylesheet to be used with the XML documents published on our site, we defined the XML elements and attributes accepted by our site. When an XML document is submitted, it is automatically validated for well-formedness and for schema compliance. And when a document is submitted, validated, and accepted for publication, our stylesheet prescribes its display format.

Our Drupal customization also affords us crucial levels of flexibility and granularity in determining which Drupal node types are to be written in TEI XML and which users of our site can (and can't) input in XML.
