The Universal Business Language (UBL) defines a royalty-free library of standard electronic XML business documents such as purchase orders and invoices. UBL is designed to plug directly into existing business, legal, auditing, and records management practices. It helps eliminate re-keying data in existing fax and paper-based supply chains and provides an entry point into electronic commerce for small and medium-sized businesses.
All UBL business documents are formally defined using the W3C XML Schema Definition Language (XSD) and a single collection of components. Each schema defines a class of documents such as an invoice or purchase order. Since UBL is XML, normal XML rules also apply to all UBL documents. For introductory information about UBL, read my companion article (see Resources).
When companies exchange XML business documents, it is not enough to know that a document is valid; you also need to verify the values of certain information items such as country code, currency code, payment means, or part number. The lists of valid country and currency codes are examples of code lists managed internationally, but a supplier's list of valid part numbers used by a customer on a purchase order must be managed locally. In general, trading partners should be able to specify constraints on values and local business rules easily, in a simple and declarative way.
For any UBL business document to be XML it must, as a minimum, be well formed. Among other things, this means that all elements must be properly nested: an element opened inside another element must be closed before the enclosing element is closed. You may also check the structure and lexicon of the instance by validating it against the schema. For example, if a UBL invoice is found to be an instance of the class of documents invoice, then it is also said to be valid.
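The difference between proper and improper nesting can be shown with a minimal Python sketch. It uses only the standard library parser, which checks well-formedness but does not validate against an XSD schema; the element names are hypothetical and chosen only to show nesting.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, not complete UBL documents.
properly_nested = "<Invoice><ID>1</ID></Invoice>"
badly_nested = "<Invoice><ID>1</Invoice></ID>"

print(is_well_formed(properly_nested))
print(is_well_formed(badly_nested))
```

Checking validity against a schema such as UBL-Invoice-2.0.xsd requires a schema-aware processor, which is outside what this sketch covers.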
UBL provides a means for an application to check the values of various elements and to customize this checking for the particular needs of trading partners. See Resources for an article that introduces UBL.
Code lists are sometimes known as controlled vocabularies. The value of an element must be one of a finite list of standard, unique, pre-defined values. Who should be the authority for maintaining these values?
Some values have a global scope (for example, country code) and should be managed globally, while others have only a local scope and should be managed between trading partners or within a community of interest (for example, supplier product codes). This local type of list has no particular relevance outside that community of interest.
For example, a supplier of goods might maintain a standard list of product codes, where a customer can order one of these products by providing the appropriate product code. We cannot guarantee uniqueness of these codes across communities of interest: the same product code may be used in two communities but describe a completely different product in each case.
There are global code lists already managed by an international standards organization. These include lists such as currency codes managed by the International Organization for Standardization (ISO) (ISO 4217), country codes by ISO (ISO 3166-1), and payment means codes by United Nations Economic Commission for Europe (UNECE) (UNECE 4461). UBL makes use of these code lists where they exist, rather than re-inventing them.
It is entirely possible that a community of interest might only require a subset of a code list, or need to provide additional codes not currently in the standard list. UBL provides the means for both sub-setting and extending a standard code list for local purposes.
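As a rough illustration of sub-setting and extending, the following Python sketch treats a code list as a plain set of codes. This is not how UBL itself represents code lists; the function names are hypothetical, the standard list is only a small excerpt of ISO 4217, and the added code PTS is an invented local code.

```python
# Small excerpt of ISO 4217 alphabetic currency codes, for illustration only.
STANDARD_CURRENCIES = {"AED", "AFN", "ALL", "CAD", "EUR", "USD"}


def subset_code_list(standard, allowed):
    """Return a local subset, refusing codes that are not in the standard list."""
    unknown = set(allowed) - set(standard)
    if unknown:
        raise ValueError(f"not in the standard list: {sorted(unknown)}")
    return set(allowed)


def extend_code_list(standard, extra):
    """Return the standard list plus locally agreed additional codes."""
    return set(standard) | set(extra)


# A community that only trades in two currencies.
local_subset = subset_code_list(STANDARD_CURRENCIES, {"CAD", "USD"})

# A community that also accepts a hypothetical loyalty-points code.
local_extension = extend_code_list(STANDARD_CURRENCIES, {"PTS"})

print(sorted(local_subset))
print("PTS" in local_extension)
```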
In traditional programming, structural, lexical, and value constraint checking is the responsibility of individual application programs. This requires a programming staff and a process to create and maintain software to do this.
In an XML implementation, structural and lexical checking has been standardized and precedes the application. An XML document must be well formed, and is then valid if and only if it conforms to a formal schema definition. This means that by the time an application receives the document, it is known to be structurally and lexically correct. This checking can be accomplished with standard components.
In business applications, it is often required to check the values of certain information elements to determine if they are valid. How do we incorporate this value checking in XML?
One way is to encode the values into the schema using schema-declared enumerations. This is commonly done in business-oriented schemas, but such enumerations can be inflexible. For example, Listing 1 is a fragment of the United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT) Core Components Technical Specification (CCTS) Currency Codes currently used in the UBL 2.0 schemas. They are being removed in the UBL 2.1 schemas precisely because they are inflexible.
Listing 1. Fragment of the UN/CEFACT CCTS Currency Codes
<xsd:simpleType name="CurrencyCodeContentType">
  <xsd:restriction base="xsd:token">
    <xsd:enumeration value="AED">
      <xsd:annotation>
        <xsd:documentation>
          <ccts:CodeName>Dirham</ccts:CodeName>
          <ccts:CodeDescription></ccts:CodeDescription>
        </xsd:documentation>
      </xsd:annotation>
    </xsd:enumeration>
    <xsd:enumeration value="AFN">
      <xsd:annotation>
        <xsd:documentation>
          <ccts:CodeName>Afghani</ccts:CodeName>
          <ccts:CodeDescription></ccts:CodeDescription>
        </xsd:documentation>
      </xsd:annotation>
    </xsd:enumeration>
    <xsd:enumeration value="ALL">
    ...
  </xsd:restriction>
</xsd:simpleType>
Here, we enumerate all the possible valid values that the
CurrencyCodeContentType can assume. The
annotation is used to document what the code means.
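To see what a schema-declared enumeration gives an application, the following Python sketch reads the allowed values out of a reduced version of the fragment in Listing 1. The annotations are omitted to keep it short, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Reduced form of Listing 1: annotations removed, three codes only.
SCHEMA_FRAGMENT = """
<xsd:simpleType xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                name="CurrencyCodeContentType">
  <xsd:restriction base="xsd:token">
    <xsd:enumeration value="AED"/>
    <xsd:enumeration value="AFN"/>
    <xsd:enumeration value="ALL"/>
  </xsd:restriction>
</xsd:simpleType>
"""


def enumerated_values(xsd_text):
    """Return the enumeration values declared in an XSD fragment, in order."""
    root = ET.fromstring(xsd_text)
    return [e.get("value") for e in root.iter(f"{{{XSD_NS}}}enumeration")]


print(enumerated_values(SCHEMA_FRAGMENT))
```

Because the values live inside the schema, changing the list means changing the schema itself.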
There are only two other code lists that were implemented as enumerated
lists in the UBL schema. These are MIME encoding identifiers in
BinaryObjectMimeCodeContentType and unit code
UnitCodeContentType. All three
enumerations are imported from UN/CEFACT-standardized sets of coded
values, and were considered complete and stable enough to include through
an enumeration until business requirements for UBL 2.1 revealed that they
were, in fact, not stable enough.
There is a risk involved with changing a code when it is defined in an enumerated list within a schema. This requires the modification of the standard schema to reflect this change. For example, country codes change over time, such as when Czechoslovakia (country code: CS) split into two countries, Czech Republic (CZ) and Slovakia (SK), and then the country code CS was reused for Serbia and Montenegro. In these cases, the receiving application requires instance-level metadata such as a version number to disambiguate the country code value.
In the case of currency codes, the code list at the time of the release of UBL 2.1 used the value TRY for the Turkish Lira, whereas the code list at the time of the release of UBL 2.0 used the value TRL for the same currency. A UBL 2.1 schema must validate all UBL 2.0 instances. If UBL 2.1 continued to use a single enumeration for the currency, a UBL 2.0 instance using Turkish Lira would not validate. Though union constructs are available for simple enumerations, the CCTS specification for associated code list metadata precludes their use.
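The Turkish Lira case can be sketched in Python by treating each version of the currency list as a separate set and accepting a value found in any of the lists offered. The sets are tiny excerpts and the function name is hypothetical.

```python
# Excerpts only: the real lists contain many more codes.
CURRENCY_2_0 = {"TRL", "USD"}  # list in effect at the UBL 2.0 release
CURRENCY_2_1 = {"TRY", "USD"}  # list in effect at the UBL 2.1 release


def is_valid_code(value, *value_lists):
    """Return True if value appears in at least one of the given lists."""
    return any(value in value_list for value_list in value_lists)


# A single list rejects the older code...
print(is_valid_code("TRL", CURRENCY_2_1))
# ...while checking against both versions accepts old and new instances.
print(is_valid_code("TRL", CURRENCY_2_0, CURRENCY_2_1))
print(is_valid_code("TRY", CURRENCY_2_0, CURRENCY_2_1))
```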
These simple real-world business situations revealed the drawback of schema-based enumerations for the maintenance of code lists over time. The UBL 2.0 approach to non-schema-based enumeration code lists was extended in UBL 2.1 for all code lists and the schema enumerations were entirely abandoned.
Enumeration within a schema presents a problem for locally administered code lists. If a local code list was implemented as enumerations in a schema, the standard schema would then have to be modified for each particular use of the schema, leaving us with no standard schema. In addition, a globally defined information item might have different sets of values in different document contexts. A single enumerated set could not meet this requirement.
UBL took the unusual approach of implementing code lists outside the schema, to be applied in a two-step process as shown in Figure 1. The structural and lexical validation is carried out in a standard way using the XSD schema. For example, an invoice might have been constructed to conform to UBL-Invoice-2.0.xsd. If valid, the instance is then validated for values. UBL provides a standard UBL 2.0 defaultCodeList.xsl file that can be used in this stage, but the end user can subset these lists, extend them, and add new code lists. If the instance is valid in this phase, the instance is then passed on to the application for further interpretation and processing. This gives users maximum flexibility in configuring and updating UBL code lists without changing the standard UBL schemas.
Figure 1. Two step validation process
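The shape of the two-step process can be sketched in Python. This is only an approximation under stated assumptions: phase 1 here checks well-formedness with the standard library parser as a stand-in for real XSD validation, phase 2 checks a single element against a short excerpt of UNECE 4461 codes, and the sample instances are simplified fragments rather than complete UBL invoices.

```python
import xml.etree.ElementTree as ET

CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Short excerpt of payment means codes, for illustration only.
PAYMENT_MEANS = {"1", "2", "20"}


def two_phase_validate(xml_text):
    """Return a list of error messages; an empty list means both phases passed."""
    # Phase 1 (stand-in): structural check. A real deployment would
    # validate against the UBL XSD schema with a schema-aware processor.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"phase 1: not well formed ({exc})"]

    # Phase 2: value check against the external code list.
    errors = []
    for element in root.iter(f"{{{CBC}}}PaymentMeansCode"):
        code = (element.text or "").strip()
        if code not in PAYMENT_MEANS:
            errors.append(f"phase 2: unknown payment means code {code!r}")
    return errors


GOOD = f'<Invoice xmlns:cbc="{CBC}"><cbc:PaymentMeansCode>20</cbc:PaymentMeansCode></Invoice>'
BAD_VALUE = f'<Invoice xmlns:cbc="{CBC}"><cbc:PaymentMeansCode>999</cbc:PaymentMeansCode></Invoice>'
NOT_WELL_FORMED = "<Invoice><ID>1</Invoice></ID>"

for sample in (GOOD, BAD_VALUE, NOT_WELL_FORMED):
    print(two_phase_validate(sample))
```

Only an instance that returns an empty list would be handed on to the application.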
UBL is designed to handle business documents in a global sense. If you examine trading partners across the globe you will find a myriad of requirements. You will also find that there can be vast differences between the requirements from one trading partner to another. There can be no standardization of a set of values representing business concepts that are particular to trading partners. Trading partners need to specify and implement their local requirements, but globally we need to keep everything under a UBL umbrella.
Trading partners also need to implement business rules. For example, one trading partner relationship might not allow orders greater than a certain monetary amount, or credit cards cannot be used for certain products. Again, partners need to implement business rules, but within an overall UBL umbrella. Schematron is one way to implement these business rules (see Resources for more information).
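A business rule such as the monetary limit mentioned above can be pictured as a simple assertion over the document. The Python sketch below is a stand-in for a Schematron rule, not Schematron itself; the limit, the function name, and the simplified order fragment are all hypothetical.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Hypothetical limit agreed between two trading partners.
MAX_ORDER_AMOUNT = Decimal("10000.00")


def check_order_limit(xml_text, limit=MAX_ORDER_AMOUNT):
    """Return a message for every payable amount that exceeds the limit."""
    root = ET.fromstring(xml_text)
    failures = []
    for amount in root.iter(f"{{{CBC}}}PayableAmount"):
        value = Decimal(amount.text.strip())
        if value > limit:
            failures.append(f"PayableAmount {value} exceeds limit {limit}")
    return failures


# Simplified fragments, not complete UBL orders.
SMALL_ORDER = f'<Order xmlns:cbc="{CBC}"><cbc:PayableAmount currencyID="USD">250.00</cbc:PayableAmount></Order>'
LARGE_ORDER = f'<Order xmlns:cbc="{CBC}"><cbc:PayableAmount currencyID="USD">12500.00</cbc:PayableAmount></Order>'

print(check_order_limit(SMALL_ORDER))
print(check_order_limit(LARGE_ORDER))
```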
How might an application implement this two phase validation of UBL documents?
There is an out-of-the-box collection of open source tools that can be used to perform default validations of UBL 2.0 documents included in the val directory of the UBL release package. The default validations assume a Linux or Microsoft Windows® XP system with no currently installed XML or XSLT processing software. There are also test instances including a valid document, a document with a mis-spelled element name, and an instance that is structurally and lexically valid, but has an illegal code list value. The UBL 2.0 release package can be downloaded as a single zip archive (see Resources for a link).
The specification of code lists and the specification of the association of these code lists to a specific UBL document are accomplished via XML documents. Code lists are handled by genericode, while the association files are handled by context/value associations (CVAs).
The OASIS Code List Representation Technical Committee has produced OASIS Genericode 1.0, which describes genericode as follows:
- A standard model and XML representation for the contents of a code list.
- A standard model and XML representation for data associated with items in a code list.
CVA files allow for local customization within trading partners. Some of the things trading partners might want to do based on their local business needs are:
- Add information entities.
- Omit optional information entities.
- Refine the meaning of an information entity.
- Combine or re-combine and assemble information entities.
- Subset the document model — restricting the number of information entities in a document.
- Constrain the document content — restrict the possible values an information entity can have.
- Create constraints on possible values for information entities such as coded lists.
- Add business rules.
See the Resources section for a link to the entire set of OASIS Guidelines for accomplishing such customizations. If you refer to Figure 1, you see that value validation is carried out with an XSLT process. In UBL, the stylesheet required to accomplish this task is generated programmatically.
Listing 2. Genericode version of the currency code list
<?xml version="1.0" encoding="ISO-8859-1"?>
<gc:CodeList xmlns:gc="http://genericode.org/2006/ns/CodeList/0.4/">
  <Identification>
    <ShortName>CurrencyCode</ShortName>
    <LongName xml:lang="en">Currency</LongName>
    <LongName Identifier="listID">ISO 4217 Alpha</LongName>
    <Version>2001</Version>
    <CanonicalUri>urn:un:unece:uncefact:codelist:specification:54217</CanonicalUri>
    <CanonicalVersionUri>urn:un:unece:uncefact:codelist:specification:54217:2001</CanonicalVersionUri>
    <LocationUri>http://docs.oasis-open.org/ubl/os-ubl-2.0/cl/gc/cefact/CurrencyCode-2.0.gc</LocationUri>
    <Agency>
      <LongName xml:lang="en">United Nations Economic Commission for Europe</LongName>
      <Identifier>6</Identifier>
    </Agency>
  </Identification>
  <ColumnSet>
    <Column Id="code" Use="required">
      <ShortName>Code</ShortName>
      <Data Type="normalizedString" xml:lang="en"/>
    </Column>
    <Column Id="name" Use="optional">
      <ShortName>Name</ShortName>
      <Data Type="string" xml:lang="en"/>
    </Column>
    <Key Id="codeKey">
      <ShortName>CodeKey</ShortName>
      <ColumnRef Ref="code"/>
    </Key>
  </ColumnSet>
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="code">
        <SimpleValue>AED</SimpleValue>
      </Value>
      <Value ColumnRef="name">
        <SimpleValue>Dirham</SimpleValue>
      </Value>
    </Row>
    <Row>
      <Value ColumnRef="code">
        <SimpleValue>AFN</SimpleValue>
      </Value>
      <Value ColumnRef="name">
        <SimpleValue>Afghani</SimpleValue>
      </Value>
    </Row>
    <Row>
      <Value ColumnRef="code">
        <SimpleValue>ALL</SimpleValue>
      </Value>
      <Value ColumnRef="name">
        <SimpleValue>Lek</SimpleValue>
      </Value>
    </Row>
    ....
In Listing 2, Identification gives some metadata about this code list. ColumnSet defines two columns, code and name. Key indicates that the column whose ID is code is to be used as a key value into the code list. SimpleCodeList then defines the content of the code list, where each row contains the code value and its corresponding name.
Note that the definition of this code list is an XML document. It has a W3C Schema, which means that this code list file can be authored and validated like any other XML file.
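Because a genericode file is ordinary XML, an application can read it with any XML parser. The Python sketch below loads the rows of a reduced version of Listing 2 into a dictionary keyed by code. The function name is hypothetical, and the sketch assumes the two column IDs code and name used in that listing.

```python
import xml.etree.ElementTree as ET

# Reduced form of Listing 2: identification and column set omitted.
GENERICODE = """<gc:CodeList xmlns:gc="http://genericode.org/2006/ns/CodeList/0.4/">
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="code"><SimpleValue>AED</SimpleValue></Value>
      <Value ColumnRef="name"><SimpleValue>Dirham</SimpleValue></Value>
    </Row>
    <Row>
      <Value ColumnRef="code"><SimpleValue>AFN</SimpleValue></Value>
      <Value ColumnRef="name"><SimpleValue>Afghani</SimpleValue></Value>
    </Row>
    <Row>
      <Value ColumnRef="code"><SimpleValue>ALL</SimpleValue></Value>
      <Value ColumnRef="name"><SimpleValue>Lek</SimpleValue></Value>
    </Row>
  </SimpleCodeList>
</gc:CodeList>"""


def load_code_list(gc_text):
    """Return a dict mapping each code to its name (None if no name is given)."""
    root = ET.fromstring(gc_text)
    codes = {}
    for row in root.iter("Row"):
        # Collect the cells of this row by the column they refer to.
        cells = {
            value.get("ColumnRef"): value.findtext("SimpleValue")
            for value in row.findall("Value")
        }
        codes[cells["code"]] = cells.get("name")
    return codes


print(load_code_list(GENERICODE))
```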
OASIS provides a specification for a default CVA file:
"This Committee Specification 01 specifies an XML vocabulary used to express the relationship between information items found in a structured hierarchy (such as the XML instance of a business document) and two kinds of constraints imposed on those items. One kind of constraint is that the value is a member of a set of controlled vocabularies of enumerated values (such as code lists or identifiers). Another kind of constraint is an arbitrary evaluation of a boolean query expression (such as a nonenumerable code list value check (say, the checksum calculation of an ISBN number), a business rule or a superimposed lexical constraint such as a maximum string value length)." (See the link to the OASIS Committee Specification 01 in Resources).
Listing 3 is a code snippet from a CVA file to
illustrate the various features of a CVA file for
PaymentMeansCode. Note that it is an XML file
with a schema, so it can be authored and validated like any other XML
file. In this file, we associate the various occurrences of
PaymentMeansCode in a UBL document with the
correct code list to validate its value.
Listing 3. Various features of a CVA file
<?xml version="1.0" encoding="UTF-8"?>
<cva:ContextValueAssociation
    xmlns:cva="http://docs.oasis-open.org/codelist/ns/ContextValueAssociation/1.0/"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:x="http://www.w3.org/1999/xhtml"
    name="UBL-QualifiedDataTypes-2.0"
    version="2010-08-30 20:21:05(UTC)"
    id="urn:oasis:names:specification:ubl:cva:UBL-Qualified-Data-Types:2">
  <Annotation>
    <Description>
      <x:p>
        This describes all of the qualified supplementary components and business information entities for the OASIS Universal Business Language (UBL) 2.0 vocabulary.
      </x:p>
      <x:p>
        In UBL 2.1 qualified data types are anonymous (in that they are not expressly named or identified) and UBL information items with qualifications are addressed in this document and associated with their respective value qualifications.
      </x:p>
      <x:p>
        At this time all value qualifications are Code Lists, which are a type of value list (the other type being identifier lists). Instance metadata for UBL is described by the UN/CEFACT Core Component Technical Specification (CCTS) Version 2.01. The document contexts are all of the supplementary components and business entities in a UBL instance that have qualified values.
      </x:p>
    </Description>
  </Annotation>
  <Title>
    UBL 2.0 and UBL 2.1 qualified information items
  </Title>
  <ValueLists>
    <Annotation>
      <Description>
        <x:p>
          These list all of the genericode files of Code Lists used by UBL information items whose values are qualified by Code Lists.
        </x:p>
        <x:p>
          The unique identifier <x:samp>xml:id=</x:samp> is used later in this file when describing the context for each entity that has a qualified value.
        </x:p>
        <x:p>
          The URI value points to the genericode file associated with the identifier. The URI is hyperlinked in this report to a rendering of the contents of the genericode file that follows the rendering of the contexts.
        </x:p>
      </Description>
    </Annotation>
    ...
    <ValueList xml:id="PaymentMeans-2.0"
               uri="../../os-UBL-2.0/cl/gc/default/PaymentMeansCode-2.0.gc"/>
    <ValueList xml:id="PaymentMeans-2.1"
               uri="../cl/gc/default/PaymentMeansCode-2.1.gc"/>
    ...
  </ValueLists>
  <Contexts>
    <Annotation>
      <Description>
        <x:p>
          The contexts in which various items with a qualified value are specified.
        </x:p>
        <x:p>
          The <x:samp>address=</x:samp> attribute is an XPath address that satisfies all of the UBL items in the instance that are qualified with the indicated value lists using the indicated associated set of instance-level metadata items.
        </x:p>
        <x:p>
          Each context identifies the collection of supplementary components (by its identifier) where instance-level metadata is found, then all value lists (by their identifier) that apply to the item being addressed.
        </x:p>
        <x:p>
          The contexts are listed first for attribute items and then for element items.
        </x:p>
      </Description>
    </Annotation>
    ...
    <Context values="PaymentMeans-2.0 PaymentMeans-2.1"
             metadata="cctsV2.01-code"
             address="cbc:PaymentMeansCode"/>
    ...
    <Context values="Currency-2.0 Currency-2.1"
             metadata="cctsV2.01-code"
             address="cbc:DocumentCurrencyCode | cbc:TaxCurrencyCode | cbc:PricingCurrencyCode | cbc:PaymentCurrencyCode | cbc:PaymentAlternativeCurrencyCode | cbc:RequestedInvoiceCurrencyCode | cbc:SourceCurrencyCode | cbc:TargetCurrencyCode | cbc:CurrencyCode"/>
    ...
  </Contexts>
</cva:ContextValueAssociation>
Note that, in the
ValueLists section, there are
two entries for
PaymentMeans, each with a
different version number (2.0 and 2.1). This allows a reference to
information based on the versions of UBL.
In the Contexts section, the address cbc:PaymentMeansCode means that this reference is for PaymentMeansCode anywhere it appears in the UBL document, and the values attribute associates it with both value lists (2.0 and 2.1).
It is possible to refer to specific locations in the UBL document or to multiple locations. For example, in Listing 3 the currency value lists are associated with the union of several currency code elements, such as cbc:DocumentCurrencyCode and cbc:TaxCurrencyCode.
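To make the association concrete, the following Python sketch reads the Context entries from a reduced version of Listing 3 and builds a lookup from each addressed item to the identifiers of its value lists. It assumes that every address is a simple union of element names separated by the | character, which holds for the entries shown but not for arbitrary XPath expressions; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reduced form of Listing 3: annotations and value lists omitted,
# and the currency address shortened to two of its alternatives.
CVA = """<cva:ContextValueAssociation
    xmlns:cva="http://docs.oasis-open.org/codelist/ns/ContextValueAssociation/1.0/">
  <Contexts>
    <Context values="PaymentMeans-2.0 PaymentMeans-2.1"
             metadata="cctsV2.01-code"
             address="cbc:PaymentMeansCode"/>
    <Context values="Currency-2.0 Currency-2.1"
             metadata="cctsV2.01-code"
             address="cbc:DocumentCurrencyCode | cbc:TaxCurrencyCode"/>
  </Contexts>
</cva:ContextValueAssociation>"""


def context_map(cva_text):
    """Map each addressed element name to the list of value list identifiers."""
    root = ET.fromstring(cva_text)
    mapping = {}
    for context in root.iter("Context"):
        value_lists = context.get("values").split()
        # Assumption: the address is a plain union of names, so splitting
        # on "|" is enough. A general XPath needs a real XPath engine.
        for address in context.get("address").split("|"):
            mapping[address.strip()] = value_lists
    return mapping


for name, lists in context_map(CVA).items():
    print(name, lists)
```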
Listing 4. List of codes for PaymentMeansCode
<?xml version="1.0" encoding="UTF-8"?>
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification>
    <ShortName>PaymentMeansCode</ShortName>
    <LongName xml:lang="en">Payment Means</LongName>
    <LongName Identifier="listID">UN/ECE 4461</LongName>
    <Version>D03A</Version>
    <CanonicalUri>urn:oasis:names:specification:ubl:codelist:gc:PaymentMeansCode</CanonicalUri>
    <CanonicalVersionUri>urn:oasis:names:specification:ubl:codelist:gc:PaymentMeansCode-2.0-update</CanonicalVersionUri>
    <LocationUri>http://docs.oasis-open.org/ubl/os-UBL-2.0-update/cl/gc/default/PaymentMeansCode-2.0.gc</LocationUri>
    <Agency>
      <LongName xml:lang="en">United Nations Economic Commission for Europe</LongName>
      <Identifier>6</Identifier>
    </Agency>
  </Identification>
  <ColumnSet>
    <Column Id="code" Use="required">
      <ShortName>Code</ShortName>
      <Data Type="normalizedString"/>
    </Column>
    <Column Id="name" Use="optional">
      <ShortName>Name</ShortName>
      <Data Type="string"/>
    </Column>
    <Key Id="codeKey">
      <ShortName>CodeKey</ShortName>
      <ColumnRef Ref="code"/>
    </Key>
  </ColumnSet>
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="code">
        <SimpleValue>1</SimpleValue>
      </Value>
      <Value ColumnRef="name">
        <SimpleValue>Instrument not defined</SimpleValue>
      </Value>
    </Row>
    <Row>
      <Value ColumnRef="code">
        <SimpleValue>2</SimpleValue>
      </Value>
      <Value ColumnRef="name">
        <SimpleValue>Automated clearing house credit</SimpleValue>
      </Value>
    </Row>
    ...
    <Row>
      <Value ColumnRef="code">
        <SimpleValue>20</SimpleValue>
      </Value>
      <Value ColumnRef="name">
        <SimpleValue>Cheque</SimpleValue>
      </Value>
    </Row>
    ...
  </SimpleCodeList>
</gc:CodeList>
If you examine this list in detail, you will find a wide array of payment means. Since UBL is designed to be used globally for both local and international trade, it has to cover all possible business requirements. Again, it is possible to subset this list for local business requirements. Not everyone is comfortable with these angle bracket files. Fortunately, they are all XML files, which means we could use XSLT to transform them into HTML, or use XSL-FO to transform them into formal print or PDF files (see Resources).
For example, using an XSLT stylesheet, you could transform the genericode file above into a readable HTML report that lists all of the XML elements, attributes, and values along with the associated XML files, such as the code lists. To see the HTML output of this transformation for the genericode file above, see the "Contexts" entry in Resources. In this report, you will see that PaymentMeansCode shows up as in Listing 5.
Listing 5. Output
address="cbc:PaymentMeansCode"
instance metadata set: cctsV2.01-code
value list: PaymentMeans-2.0 (detail)
value list: PaymentMeans-2.1 (detail)
The (detail) entries are hot links to the appropriate part of the report, allowing quick exploration of the report. Further into the report, there is an HTML table detailing the 100 ways of making a payment in UBL. The stylesheet used by the committee to create this report is a free download from Crane Softwrights Ltd. (see Resources).
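The idea of rendering a code list for readers who prefer not to look at angle brackets can be sketched without XSLT. The Python function below is a hypothetical stand-in for the committee's stylesheet, not a reproduction of it: it turns a code-to-name mapping into a small HTML table, escaping the text so that it is safe to embed.

```python
from html import escape


def code_list_to_html(title, codes):
    """Render a mapping of code -> name as a simple HTML table."""
    rows = "\n".join(
        f"  <tr><td>{escape(code)}</td><td>{escape(name)}</td></tr>"
        for code, name in codes.items()
    )
    return (
        "<table>\n"
        f"  <caption>{escape(title)}</caption>\n"
        "  <tr><th>Code</th><th>Name</th></tr>\n"
        f"{rows}\n"
        "</table>"
    )


# Three of the payment means codes shown in Listing 4.
payment_means = {
    "1": "Instrument not defined",
    "2": "Automated clearing house credit",
    "20": "Cheque",
}

print(code_list_to_html("Payment Means", payment_means))
```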
As previously mentioned, the stylesheet used in the Phase 2 validation of values is generated programmatically. In Figure 2, the assertion validation stylesheet (2) can be constructed from the context/value associations (CVA), the external value list expressions (GC), and the business rules (SCH). All of these are standard XML documents and can be processed via standard XML processing software. In fact, we can generate a standard XSLT stylesheet which, when processed against an XML document, can validate that the content of the document is valid and meets business rules.
Figure 2. Value validation
In Figure 2, there are three types of inputs representing the business needs of the trading partners:
- Context/value associations (3), expressed in CVA files, allow value validation based on the context of an information entity in the source XML document. The specific information entity is expressed via an XPath expression.
- External value list expressions (4), expressed in genericode, define the controlled vocabularies.
- Business rules (5), expressed in Schematron.
Many trading partners would like to establish local conditions that must hold true in specified situations. These are the business rules that apply to those specific trading partners.
In UBL, these can be specified using an ISO/IEC 19757-3 Schematron deployment. A Schematron file is simply a set of assertions about a UBL document that can be tested for validity. Schematron is expressed using an XML Schematron vocabulary. (See Resources for links to more Schematron information.)
Schematron is usually implemented using XSLT, although there is a Python implementation available. With the XSLT implementation, the XSLT stylesheet used in the second-phase value validation is generated programmatically using the genericode files, CVA files, and the Schematron files. This generated stylesheet, when applied against a valid UBL document, will report any value errors in the UBL file.
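The notion of generating a validator from declarative inputs can be sketched in Python. This is an analogy to the generated XSLT stylesheet, not the UBL tooling itself. The contexts and value lists below are small in-memory stand-ins for parsed CVA and genericode files, the function name is hypothetical, and the sample document is a simplified fragment.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
}

# Stand-ins for parsed genericode files (excerpts only).
VALUE_LISTS = {
    "PaymentMeans-2.0": {"1", "2", "20"},
    "PaymentMeans-2.1": {"1", "2", "20"},
    "Currency-2.0": {"TRL", "USD"},
    "Currency-2.1": {"TRY", "USD"},
}

# Stand-in for parsed CVA contexts: element name -> value list identifiers.
CONTEXTS = {
    "cbc:PaymentMeansCode": ["PaymentMeans-2.0", "PaymentMeans-2.1"],
    "cbc:DocumentCurrencyCode": ["Currency-2.0", "Currency-2.1"],
}


def build_validator(contexts, value_lists, namespaces):
    """Return a function that reports value errors for a parsed document."""

    def validate(root):
        errors = []
        for address, list_ids in contexts.items():
            # A value is acceptable if any associated list contains it.
            allowed = set().union(*(value_lists[i] for i in list_ids))
            for element in root.findall(f".//{address}", namespaces):
                value = (element.text or "").strip()
                if value not in allowed:
                    errors.append(f"{address}: {value!r} is not an allowed value")
        return errors

    return validate


DOCUMENT = f"""<Invoice xmlns:cbc="{NAMESPACES['cbc']}">
  <cbc:DocumentCurrencyCode>TRL</cbc:DocumentCurrencyCode>
  <cbc:PaymentMeansCode>999</cbc:PaymentMeansCode>
</Invoice>"""

validate = build_validator(CONTEXTS, VALUE_LISTS, NAMESPACES)
errors = validate(ET.fromstring(DOCUMENT))
print(errors)
```

Changing a trading partner's rules then means changing the data handed to the generator, not the checking code.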
You have completed an introductory overview of the elements of code lists in XML documents using OASIS UBL as an example vocabulary and implementation. UBL document validation includes not only the structural and lexical validation of the UBL document using W3C XSD, but also the value checking of the content of various information elements in the document. Although the UBL release packages contain default code lists that can be used, code list technology provides resources and guidelines to allow trading partners to customize these code lists for their particular local needs and implement business rules as a layer on top of this. Although we have illustrated the code list technologies with UBL, they can be used in multiple domains with any XML documents, particularly XML business documents, where UBL is but one example. The New Zealand Ministry of Education and FpML are two examples of the use of code list technologies.
- UBL and Disruptive Innovation: Out-of-the-box business XML for multiple industries (Hugh Chatfield, developerWorks, November 2010): Read the companion article for a good introduction to UBL, its value for creating business documents, some examples of its use, and an explanation of why it is a disruptive innovation.
- UBL 2 Guidelines for Customization, First Edition: Find the entire set of OASIS Guidelines for customization.
- Context/value association using genericode 1.0: Committee Specification 1.0: Read the entire OASIS specification for context/value association.
- Hands-on introduction to Schematron (Uche Ogbuji, developerWorks, September 2004): Read this article for a good overview of Schematron.
- OASIS Universal Business Language (UBL) Technical Committee
- OASIS Code List Representation Technical Committee
- Currency Code List: See the schema for CurrencyCodeContentType including the code list values.
- Appendix E. UBL 2.0 Code Lists and Two-phase Validation (Informative): Read a short introduction to UBL two-phase validation.
- CVA file, HTML rendition: This is the default HTML rendition of UBL 2.0 and UBL 2.1 qualified information items.
- CVA for UBL: The complete CVA file for UBL.
- Genericode implementations: Review a list of known implementations of genericode.
- Learn from the Frequently Asked Questions for UBL.
- PaymentMeansCode: Find the full list of codes for PaymentMeansCode specified in a genericode file.
- Contexts: Learn more from an example of an XML file transformed into HTML.
- (2003) Core Components Technical Specification, Part 8 of the ebXML Framework: Review information to guide in the interpretation or implementation of ebXML concepts.
- Cover Pages: Learn more about UBL from Cover Pages, a comprehensive, online reference collection supporting the XML family of markup language standards.
- Crane Softwrights, UBL 2.0 Model Summary Reports: Access sets of 61 HTML files (each approximately 3MB compressed) summarizing the information items and document model spreadsheets. Crane Softwrights Ltd. is a consultancy delivering Computer Systems Analysis and training services worldwide since April 1997.
- OASIS UBL TC, UBL 2 Guidelines for Customization: Access practical guidance in creating UBL-conformant and UBL-compatible document
- UBL 2.0 sample UBL documents: Learn more from these examples of UBL documents such as invoice, order, and order
W. Hugh Chatfield is an Information Systems Professional and principal of CyberSpace Industries 2000 Inc., providing both XML training and consulting and multimedia production services. He holds an honors degree in Physics and an honors certificate in Documentary Production. With over 40 years of experience in Information Technology, he has spent the last 17 years in the general markup languages domain.