Case-insensitive enumerations

A simple, automated solution for allowing both upper- and lowercase letters

IBM's own XML ace Doug Tidwell offers one curious reader an automated solution for defining a case-insensitive enumeration that's straightforward, standards-compliant, and requires little work on the developer's part. Several code samples are included.


Doug Tidwell, Senior Programmer, IBM

Senior Programmer Doug Tidwell is IBM's evangelist for Web Services. He was a speaker at the first XML conference in 1997, and has been working with markup languages for more than a decade. He holds a bachelor's degree in English from the University of Georgia and a master's degree in computer science from Vanderbilt University. He can be reached at dtidwell@us.ibm.com. You can also see his Web page at ibm.com/developerWorks/speakers/dtidwell/.



01 October 2002


Here at developerWorks, we're always trying to answer your questions and meet your needs. Recently I received the following letter from Tommy Jones of Des Moines, Iowa:


Dear Regis,

Is there any way to do case-insensitive enumerations in XML Schemas? If the valid values for an element are "red," "blue," and "green," we'd like to let our users use any combination of upper- and lowercase letters for those values. We can't find any way in the XML Schema spec that we can define an enumeration that is case-insensitive. Can you help us?

Sincerely,
Tommy Jones of Des Moines, Iowa

Well, Tommy, I've got good news and bad news. The bad news is that XML Schema gives you no way to mark an enumeration as case-insensitive; the good news is that we have an automated solution that's standards-compliant, fairly simple, and shouldn't require any work on your part.

Getting started

First of all, enumeration values are always compared case-sensitively, so you can't do what you want directly. The way around this problem is to convert the enumerations into a regular expression. Let's say that your schema defines the following datatype:

<xsd:element name="favoriteColor">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="red"/>
      <xsd:enumeration value="blue"/>
      <xsd:enumeration value="green"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

To do a case-insensitive comparison, you need to convert this into a regular expression that combines all of the valid values. For the value "blue," for example, you'll create a regular expression that says, "This is an upper- or lowercase B, followed by an upper- or lowercase L, followed by an upper- or lowercase U, followed by an upper- or lowercase E." That means the enumerated datatype above should look like this:

<xsd:element name="favoriteColor">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="((B|b)(L|l)(U|u)(E|e))|((G|g)(R|r)(E|e)(E|e)(N|n))|((R|r)(E|e)(D|d))"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

This regular expression matches "blue," "BlUE," "bLUe," and any other combination of upper- and lowercase letters that spells the word blue. (You could also solve this problem by generating a set of <xsd:enumeration> elements that define all the combinations of upper- and lowercase letters, but that would be much larger than the regular expression, especially if the valid values were long strings.)
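
If you'd like to convince yourself before touching any schemas, here's a minimal sketch in Python that builds the same kind of pattern and compares it with the brute-force alternative. It's an illustration only: the helper names are made up for this example, and Python's re.fullmatch() stands in for the XML Schema rule that a pattern has to match the entire value.

```python
import re
from itertools import product

def letter_group(ch):
    """Return '(A|a)' for a cased letter, or the escaped character otherwise."""
    if ch.upper() != ch.lower():
        return "(" + ch.upper() + "|" + ch.lower() + ")"
    return re.escape(ch)

def word_pattern(word):
    """Build one parenthesized, case-insensitive alternative for a word."""
    return "(" + "".join(letter_group(ch) for ch in word) + ")"

def all_spellings(word):
    """List every upper/lowercase spelling (the brute-force enumeration)."""
    pairs = [(ch.upper(), ch.lower()) for ch in word]
    return ["".join(combo) for combo in product(*pairs)]

# Join the alternatives with vertical bars and no surrounding whitespace.
pattern = "|".join(word_pattern(w) for w in ["red", "blue", "green"])

print(pattern)
print(bool(re.fullmatch(pattern, "BlUE")))
print(len(all_spellings("green")))
```

A five-letter value such as green already needs 32 enumeration elements if you spell out every combination, and the count doubles with each additional letter, while the pattern grows by only one (A|a) group per letter.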


Even better news

Because an XML Schema is itself an XML document, you can write a style sheet that converts the enumeration markup into the regular expression you just looked at. To do this, you need to find all of the <xsd:restriction> elements that are based on the xsd:string datatype and contain <xsd:enumeration> elements. What you want is a style sheet that copies all of the existing schema except the <xsd:restriction> elements you're looking for. You'll then add a rule that defines how to transform the <xsd:enumeration> elements.

Here's a style sheet that defines the basic rule for copying an XML document. This will be the default rule used for everything in the source document; in a minute, you'll add the rule for transforming the <xsd:restriction> elements.

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="*|@*|text()|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()|comment()| 
                                   processing-instruction()" />
    </xsl:copy>
  </xsl:template>

  <!-- Add the stuff that handles the enumerations here. -->

</xsl:stylesheet>

Now you just have to write the template that transforms the <xsd:restriction> elements. Here's the XPath expression that selects the elements:

<xsl:template match="xsd:restriction[@base='xsd:string']
                     [count(xsd:enumeration) > 0]">

If you're not familiar with XPath syntax, this tells the style sheet processor to select all of the <xsd:restriction> elements that have a base='xsd:string' attribute and contain at least one <xsd:enumeration> element. The algorithm you'll follow for each <xsd:enumeration> element inside the <xsd:restriction> element is:

  1. Write a left parenthesis.
  2. Write the upper- and lowercase values of each letter.
  3. Write a right parenthesis.
  4. If this isn't the last <xsd:enumeration>, add a vertical bar.
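
Before you look at the XSLT, the four steps above can be sketched in Python with the standard xml.etree.ElementTree module. Treat this as a rough model rather than a replacement for the style sheet: the function names are hypothetical, the tiny schema is the favoriteColor example from earlier, and the check on base='xsd:string' compares the literal prefix just as the XPath expression does.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

def word_pattern(word):
    # Steps 1-3: left parenthesis, (A|a) for each letter, right parenthesis.
    letters = "".join("(%s|%s)" % (c.upper(), c.lower()) for c in word)
    return "(" + letters + ")"

def enumerations_to_pattern(restriction):
    """Swap the xsd:enumeration children of one restriction for a pattern."""
    enums = restriction.findall("{%s}enumeration" % XSD)
    # Step 4: a vertical bar between alternatives, none after the last one.
    value = "|".join(word_pattern(e.get("value")) for e in enums)
    for e in enums:
        restriction.remove(e)
    ET.SubElement(restriction, "{%s}pattern" % XSD, {"value": value})

schema = ET.fromstring("""
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="favoriteColor">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="red"/>
        <xsd:enumeration value="blue"/>
        <xsd:enumeration value="green"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
""")

# Visit only string-based restrictions that contain at least one enumeration.
for restriction in list(schema.iter("{%s}restriction" % XSD)):
    has_enums = restriction.find("{%s}enumeration" % XSD) is not None
    if restriction.get("base") == "xsd:string" and has_enums:
        enumerations_to_pattern(restriction)

pattern = schema.find(".//{%s}pattern" % XSD).get("value")
print(pattern)
```

Like the style sheet, this sketch copies each character into the pattern as-is, so an enumeration value containing regular-expression metacharacters would need extra escaping.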

Here's how that part of the style sheet looks:

<xsl:template match="xsd:restriction[@base='xsd:string']
                     [count(xsd:enumeration) > 0]">
  <xsd:restriction base="xsd:string">
    <xsd:pattern>
      <xsl:attribute name="value">
        <xsl:for-each select="xsd:enumeration">

          <!-- Step 1. Write a left parenthesis -->
          <xsl:text>(</xsl:text>

          <!-- Step 2. Write the upper- and lowercase letters -->

          <!-- Step 3. Write a right parenthesis -->
          <xsl:text>)</xsl:text>

          <!-- Step 4. If this isn't the last enumeration, write -->
          <!-- a vertical bar -->
          <xsl:if test="not(position()=last())">
            <xsl:text>|</xsl:text>
          </xsl:if>

        </xsl:for-each>
      </xsl:attribute>
    </xsd:pattern>
  </xsd:restriction>
</xsl:template>

You might have noticed that this step skips over the difficult step of writing out the upper- and lowercase values of each letter. You'll use tail recursion and the XSLT translate() function to do this.

Tail recursion is a common technique in XSLT style sheets. You'll use a named template to handle this; the named template will invoke itself until all of the letters in the string have been processed. The template (named case-insensitive-pattern in the example) receives two parameters: the string you're converting to a regular expression, and the position in the string where you should start. Here's how your named template begins:

  <xsl:template name="case-insensitive-pattern">
    <xsl:param name="string"/>
    <xsl:param name="index"/>

For any given string, the correct value is the concatenation of:

  1. The value of the current letter, written in the (A|a) format.
  2. The value of the remaining letters written in the (A|a) format. (If there are no letters left, the value is empty; otherwise, you call the template recursively. To do that, you pass the original string and increment the starting position by one.)

You'll create two variables representing the two values above, then you'll use the <xsl:value-of> element to output their combined values. For the current letter, you output a left parenthesis, the uppercase value of the letter, a vertical bar, the lowercase value of the letter, and a right parenthesis. Here's the markup that calculates the first variable:

<xsl:variable name="current-letter">
  <!-- Write a left parenthesis -->
  <xsl:text>(</xsl:text>

  <!-- Convert the current letter to uppercase -->
  <xsl:value-of select="translate(substring($string, $index, 1), 
                                  'abcdefghijklmnopqrstuvwxyz', 
                                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>

  <!-- Write a vertical bar -->
  <xsl:text>|</xsl:text>

  <!-- Convert the current letter to lowercase -->
  <xsl:value-of select="translate(substring($string, $index, 1), 
                                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 
                                  'abcdefghijklmnopqrstuvwxyz')"/>

  <!-- Write a right parenthesis -->
  <xsl:text>)</xsl:text>
</xsl:variable>

A word about the XSLT translate() function

Before we go on, it's worth noting that the translate() function (defined by XPath and available in every XSLT style sheet) takes three strings. For each character in the first string, any letter that appears in the second string ('abcde...') is replaced by the corresponding letter from the third string ('ABCDE...'). So if the first string is 'bed', the function call translate('bed', 'abcde...', 'ABCDE...') returns BED. If a character in the first string doesn't appear in the second string at all, it isn't changed. That means translate('bed7', 'abcde...', 'ABCDE...') returns BED7. You could extend the strings in the function call to include accented characters used in Western European languages if you wanted. (The XPath spec warns that translate() isn't sufficient to do case conversion in all the world's languages, so be aware of that.)
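
If it helps to see those rules spelled out, here's a small Python model of translate(). The name xpath_translate is hypothetical, and the sketch covers only the behavior the XPath 1.0 recommendation describes: unlisted characters pass through, listed characters are replaced by position, and a listed character with no counterpart in the third string is dropped.

```python
def xpath_translate(text, source, target):
    """Mimic XPath 1.0 translate(text, source, target)."""
    out = []
    for ch in text:
        pos = source.find(ch)          # the first occurrence in source wins
        if pos == -1:
            out.append(ch)             # not listed: left unchanged
        elif pos < len(target):
            out.append(target[pos])    # replaced by the character at the same position
        # otherwise: listed but target is too short, so the character is removed
    return "".join(out)

LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(xpath_translate("bed", LOWER, UPPER))
print(xpath_translate("bed7", LOWER, UPPER))
```

The style sheet never relies on the removal rule, because its two alphabet strings are the same length.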

Now you calculate the value of all the remaining letters, each of them converted to the (A|a) format. If the index of the current letter is less than the length of the string, you invoke your named template again, passing the original string and incrementing the index by 1. If the index of the current letter is equal to the length of the string, this variable is an empty string.

<xsl:variable name="remaining-letters">

  <!-- If $index is less than the length of the string, -->
  <!-- call the template again. -->
  <xsl:if test="$index &lt; string-length($string)">
    <xsl:call-template name="case-insensitive-pattern">

      <!-- The string parameter doesn't change -->
      <xsl:with-param name="string" select="$string"/>

      <!-- Increment the index of the current letter by 1 -->
      <xsl:with-param name="index" select="$index + 1"/>
    </xsl:call-template>
  </xsl:if>
</xsl:variable>

Finally, you output the value of the two variables with the <xsl:value-of> element and the concat() function. This is equivalent to a return statement in other programming languages.

<xsl:value-of select="concat($current-letter, $remaining-letters)"/>
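
To see the recursion in one place, here's a rough Python mirror of the named template. It's a sketch under a few assumptions: the function borrows the template's name and parameters for readability, it expects a non-empty string, and Python's upper() and lower() take the place of the two translate() calls (so, unlike the style sheet, it also converts non-ASCII letters).

```python
def case_insensitive_pattern(string, index):
    """Mirror of the named template; index is 1-based, as in XPath."""
    letter = string[index - 1]

    # The current letter, written in the (A|a) format.
    current_letter = "(" + letter.upper() + "|" + letter.lower() + ")"

    # The remaining letters: recurse while there are letters left,
    # otherwise contribute an empty string.
    if index < len(string):
        remaining_letters = case_insensitive_pattern(string, index + 1)
    else:
        remaining_letters = ""

    # Equivalent to the concat() call in the style sheet.
    return current_letter + remaining_letters

print(case_insensitive_pattern("blue", 1))
```

Starting the call at index 1 processes the whole value; each level of recursion handles exactly one letter and hands the rest of the string to the next level.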

So, if the values blue, red, and green are valid, you can transform your schema with our style sheet to generate a new schema. Using that new schema, the values BLUE, Blue, bLuE, and blUE are all valid.


An example

Here's an example that illustrates how your style sheet works. You'll use a schema that defines enumerations for gender, marital status, and favorite color. Here's a sample instance document:

<?xml version="1.0"?>
<f:friend 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.ibm.com/developerWorks 
                      friend.xsd"
  xmlns:f="http://www.ibm.com/developerWorks">
  <f:name>
    <f:firstName>Jane</f:firstName>
    <f:lastName>Doe</f:lastName>
  </f:name>
  <f:gender>f</f:gender>
  <f:maritalStatus>married</f:maritalStatus>
  <f:favoriteColor>orange</f:favoriteColor>
</f:friend>

As part of this example, we've included a short piece of Java code, XMLValidator.java, that validates an XML document against an XML schema. If you enter java XMLValidator friend.xml, you'll see something like this:

> java XMLValidator friend.xml

Your document contains no errors!

In our sample document, the values f, married, and orange are all case-sensitive; entering F or Married or OrAnGE will cause errors. If you put those illegal values into friend.xml, you'll get messages like this:

Error in friend.xml at line 10, column 25: cvc-type.3.1.3: The value 'F' of 
element 'f:gender' is not valid.
Error in friend.xml at line 11, column 45: cvc-type.3.1.3: The value 'Married' 
of element 'f:maritalStatus' is not valid.
Error in friend.xml at line 12, column 44: cvc-type.3.1.3: The value 'OrAnGE' 
of element 'f:favoriteColor' is not valid.

You can use our XSLT style sheet to convert the original schema into a new schema document.

> java org.apache.xalan.xslt.Process -in friend.xsd -xsl convert-enumerations.xsl 
-out insensitive-friend.xsd

Next, change the root element of the XML document to refer to this new schema file:

<f:friend 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.ibm.com/developerWorks 
                      insensitive-friend.xsd"
  xmlns:f="http://www.ibm.com/developerWorks">

If you run your validation program against the XML document now, you'll once again get the message that your document contains no errors. The file case-insensitive.zip has all the code and samples you need to try it yourself.

Well, Tommy, I hope this answers your question. Our solution is relatively simple, works automatically, and is based on XML standards.

Have questions of your own? Feel free to send 'em to us, and we'll try to answer them in our vast spare time.


Download

Description    Name                           Size
Code sample    x-case/case-insensitive.zip    ---
