Does your industry provide a set of best practices for XML Schema to streamline industrywide data integration? If not, perhaps it should follow retail's lead. Since 1993, the Association for Retail Technology Standards (ARTS) of the National Retail Federation (NRF) has been developing a standard data model to help retailers integrate applications and interface point-of-sale (POS) data more easily.
The International XML Retail Cooperative (IXRetail) is the ARTS committee that is standardizing XML messages for exchange between IT systems that support retail stores. IXRetail has adapted names and definitions from the ARTS Data Model standard for use in XML messages. IXRetail has also worked on standardizing other aspects of XML technology across the retail industry and among its vendors.
This article was initially prepared for publication in two installments in NRF's STORES Magazine. Links to the online versions of those article installments are in the Resources section. The material from those installments has been collected here and revised for developerWorks readers.
XML provides the format for identifying information that applications need, but does not assure that the information needed by the recipient is provided. However, XML provides formatting structures that help obtain this assurance. The XML Schema language elaborates on XML and related specifications to provide a flexible way to describe a shared vocabulary of names that can be used to mark up XML documents. By using a shared schema, applications can use validating parsers to assure that appropriate information is sent or received.
IXRetail has chosen XML because of the universal applicability of XML to structured document and data exchange on the Web. XML and the XML Schema language were designed as general solutions that include facilities for almost every need. This has made XML Schema too general for use without additional constraints. For example, there are multiple ways to describe values from a set that may vary: xs:choice, element substitution groups, abstract elements, abstract types, and many more. Each of these alternatives has different characteristics. In some cases, one alternative may be clearly the most appropriate. However, in many cases, several different alternatives could be chosen. This is not desirable: Arbitrary selection among suitable features of XML Schema can hide similarities in related messages or among similar types and can cause people who maintain or extend the message system to waste effort preserving what was arbitrarily chosen. In many cases, IXRetail adopted a "Best Practice" guideline to state a preference for using a specific feature of XML or the XML Schema language for specified situations.
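As a rough illustration of how two of these alternatives can describe the same idea, the following sketch declares a tender amount that is either cash or credit card, once with xs:choice and once with a substitution group. The namespace URI and every element name here are hypothetical; they are not taken from any IXRetail schema, and the sketch is not meant to follow the guidelines that come later.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: the namespace and all names are illustrative only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/retail"
           targetNamespace="http://example.com/retail"
           elementFormDefault="qualified">

  <!-- Alternative 1: xs:choice lists the permitted children in place. -->
  <xs:element name="TenderByChoice">
    <xs:complexType>
      <xs:choice>
        <xs:element name="Cash" type="xs:decimal"/>
        <xs:element name="CreditCard" type="xs:decimal"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!-- Alternative 2: an abstract head element plus a substitution group.
       The head cannot appear in a message; one of its members must. -->
  <xs:element name="TenderAmount" type="xs:decimal" abstract="true"/>
  <xs:element name="CashAmount" type="xs:decimal"
              substitutionGroup="TenderAmount"/>
  <xs:element name="CreditCardAmount" type="xs:decimal"
              substitutionGroup="TenderAmount"/>

  <xs:element name="TenderBySubstitution">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="TenderAmount"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Both declarations accept exactly one amount of one kind, yet they look unrelated in the schema and extend differently: the choice must be edited to add a new kind, while the substitution group can gain members declared elsewhere. A schema that mixes the two styles without a stated preference shows the kind of hidden similarity described above.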
The level of assurance that a schema can provide to an application depends on how well the schema translates application requirements into requirements on XML messages. The XML Schema language can describe most common constraints on components of XML messages. In some cases, you can have an optimal schema that validates every "good" message and rejects every "bad" message.
However, even when such an optimal schema is possible, you may want to let the application deal with some bad messages (such as when valid values change too dynamically for enumeration in a standardized schema). Further, some constraints cannot be described in the XML Schema language. As a result, validation of messages with XML schemas does not eliminate the need for applications to verify that input data is acceptable.
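As a minimal sketch of a constraint that falls outside the schema, consider a promotion period with a start date and an end date. The element names are hypothetical and are not taken from any IXRetail schema. Assuming XML Schema 1.0, the schema can require that both values are dates, but it offers no construct for requiring that one element's value not precede another's:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: it checks structure and datatypes only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PromotionPeriod">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="StartDate" type="xs:date"/>
        <xs:element name="EndDate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The following instance is valid against that schema even though the period ends before it starts, so the receiving application would still need to compare the dates itself:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Schema-valid, but unacceptable to the application. -->
<PromotionPeriod>
  <StartDate>2002-12-01</StartDate>
  <EndDate>2002-11-01</EndDate>
</PromotionPeriod>
```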
Just as messages may be good or bad depending on whether the application finds them acceptable, schemas may be considered good if acceptable by some criteria. The question is, which criteria?
Some criteria seem obvious. Tools that conform to the XML Schema language can process schemas written in that language, which suggests that using the W3C specification for XML schemas should be a guideline. Things were not always so obvious, however, because the W3C formally adopted the XML Schema language only recently. Many people predicted the language would fail because it was complex from the start and was growing more complex. This complexity, however, was necessary to deal with the wide range of different uses.
Criteria for "good" XML and XML Schema specifications may vary widely depending on how those specifications are used. Even when there is agreement to use the W3C specification for XML Schema, there may be no single set of criteria for "good" application of the standards to specifications; determining the "best" criteria can be even more uncertain. However, when you can restrict the particular environment of application, you can hope to determine -- by well-informed consensus -- the set of criteria that will lead to the best practice of using XML and related standards in that environment. In this case, the particular environment is data interchange between and among information technology applications that either support the operation of retail stores or integrate retail stores with the retail enterprise.
The best practices guidelines listed below have been approved by the IXRetail Technical Committee. The guidelines are listed in the order approved by the Best Practices Subcommittee of IXRetail; the order has no other significance. Each guideline is shown in bold and is followed by comments describing rationale for the guideline or related remarks from the authors of this article. However, only the guidelines were approved; IXRetail has not approved the commentary. The guidelines may evolve during use and additional guidelines are anticipated. ARTS developed these practices to guide its development of standardized XML schemas. ARTS published these practices to guide developers of retail applications until they are able to use the forthcoming ARTS standards. If you develop other kinds of applications, many of these guidelines can improve the consistency of your XML schema (just substitute the name of your specification approver for IXRetail).
- Use "UCC camel case" with no spaces or hyphens between words for all XML names assigned. This kind of camel case results in the capitalization of the first letter of each word of a compound name including capitalization of the initial letter of the name. This ensures that names are both legal for XML (no spaces) and more readable than single-case text. An example of a UCC camel case name is InventoryControlDocument. (Some organizations have adopted naming standards that use LCC camel case, such as initialLowerCase, for some kinds of names, typically attribute names. IXRetail decided against this. The decision to use UCC or LCC or both is not driven by XML Schema, which has distinct name spaces for elements, attributes, and types.)
- Readability is more important than tag length. Although IXRetail remains concerned that long tags will make XML documents impracticably long, it is important to help users choose the correct tag. For example, POSDepartmentID is preferred over ID_DPT_POS. (It is also anticipated that "messaging infrastructure" will provide message compression.)
- With a few exceptions, abbreviations and acronyms should not be used in element, attribute, and type names. The exceptions are GTIN, ID, and POS for, respectively, Global Trade Item Number, Identifier, and Point of Sale. The logical view of the ARTS Data Model avoids abbreviations. Only the exceptions listed could be justified. This guideline is also considered desirable for names used for components of XML messages. (Clearly, the exceptions listed here are specific to the retail industry; other industries are likely to permit other abbreviations.)
- Remove entity names from attribute names where possible. In the ARTS Data Model, the entity name is often used as a prefix for attribute names. This makes importing foreign keys easier in relational database models. However, the hierarchical structure of an XML message eliminates ambiguity and makes repeating the entity name unnecessary. Thus, repetition of entity names needlessly increases tag lengths. Although this guideline uses the entity and attribute terminology of the ARTS Data Model, it applies to both element and attribute names of XML; Data Model attributes can correspond to either elements or attributes in XML messages. (Guideline 8 is related and is a generalized version of this guideline.)
- Use W3C specification for XML schemas instead of DTDs or alternative schema languages. XML Schema allows local element names, but DTDs require that all element names be globally unique. (The potential for automated parsing with open-source validating parsers also influenced adoption of this guideline.)
- Enumeration values should use names only (not numbers) and the names used for enumeration values must conform to the guidelines for element or attribute names. If suitable names already exist, they should be used (instead of IXRetail creating new names). Prefer ISO standards to national standards or consortium specifications. Names composed of natural language words can suggest the meaning of the value. Numbered enumerations invite nonstandard extensions that do not interoperate. (A criticism of this guideline is that the requirement to use names forces a choice of natural language. The language chosen for these names should be the one most helpful to those who maintain and extend the messaging system. However, these names should be limited to differentiating information handled differently by the information technology system; users should always be presented with messages in each user's chosen language. This guideline is not an excuse to avoid good user interfaces.)
- Enumeration values should use names consisting of English words. Names based on a natural language can suggest meanings by appropriate selection of words, but numeric values do not. It is helpful to be consistent with names derived from the ARTS Data Model, which uses English words. The words displayed to end users for elements, attributes, and enumerated values need to be chosen for usability by end users and will probably need to be translated even for English speakers. Only programmers who do system debugging should be expected to deal directly with XML messages. (Some industries may not need a guideline such as this or may choose a different language. However, not making a choice may lead to use of cryptic "words" that are no more helpful than arbitrary numbers.)
- Names should not include a repetition of the names of containing structures. The container provides adequate context; using its name in component names is redundant and needlessly lengthens component tags. For example, a <Customer> element could contain a <Name> element, but should not contain a <CustomerName> element, which would repeat the containing structure name (Customer). (Guideline 4 is related but uses data modeling terminology. Using repetition consistently was also considered, but led to an obviously undesirable practice.)
- All schemas specified by IXRetail shall put the global names they declare in one namespace; this shall be the IXRetail namespace, which is http://www.nrf-arts.org/IXRetail/namespace/. Putting IXRetail names in a namespace avoids name collisions with schema specifications from other sources that our users may need. By avoiding multiple namespaces, IXRetail can better limit occurrences of unintended equivalent declarations or definitions. IXRetail can check that each global name within this namespace has a unique declaration or definition. This guideline does not restrict importing schema documents that use other namespaces. (By limiting itself to one namespace, IXRetail has committed itself to carefully reviewing each addition it makes to its namespace. It is anticipated that this namespace will grow as IXRetail standardizes additional message schemas. Other guidelines limit the use of global names and reduce the difficulty in following this guideline. Because only IXRetail can approve specifications using its namespace URI, each other specification approver must adopt its own namespace URI. The slash or solidus character ["/"] that terminates this URI has also been discussed. Many standard namespace names do not end with this character. The registrar that provided this URI was asked to provide an identifier that would not be used for a specific file because the identified resource would change over time; the registrar provided a URI for a directory. However, a namespace is a conceptual resource and its URI is used to name it and not to locate it. The namespace is neither a file nor a directory; whether its URI ends in a solidus is not significant.)
- Each XML instance document produced by IXRetail should specify a default namespace, which should be the IXRetail namespace. The use of a default namespace avoids the need to explicitly prefix names from that namespace. This shortens the tags that use names from the IXRetail namespace. Specifying a default namespace also provides the appropriate example to users of the XML schema documents specified by IXRetail. (It is important to note that this guideline applies to "XML instance documents" and not to "XML schema documents." It is intended that documents that specify particular messages, such as example interaction scenarios, be distinct from the documents that specify schemas for standardized message types. This distinction is made to clearly differentiate what is being standardized and what are examples of application of the standard. XML messages that only reference a schema and do not add new declarations are XML instance documents in this sense.)
- Each XML schema document produced by IXRetail should specify a default namespace and a target namespace, both of which should be the IXRetail namespace. This provides consistent references to names from the IXRetail namespace. Although this requires that names from W3C's XML Schema specification be explicitly prefixed, it only increases the length of the schema, not the length of instance documents. It also makes the handling of all names defined in XML Schema and related XML standards consistent with each other: W3C standardized names are always prefixed. (As with the preceding guideline, schema documents and instance documents are distinguished. For this distinction, an XML schema document is an XML document that adds new attribute or element declarations.)
- Where domain experts believe a type is likely to be reused, either a simpleType or a complexType should be defined globally in the namespace instead of being defined anonymously in the Element declaration. Because they are not usually used in tags, type names can be concatenated from sufficient roots and modifiers to identify the appropriate domain without necessarily causing long tags. This differs from element names, which are always used in tags. As a result, the alternative of global element names would lengthen tags. (Sometimes a type name is used in an instance document, such as when the type of a concrete element is specified with xsi:type. In such cases, the length of type names does affect message length.)
- Schemas should use nested elements that use the type attribute or an inline type definition (simpleType or complexType) instead of the ref attribute that references a global element. Whenever possible, local element naming should be used so that names can be kept short. The global part of the IXRetail namespace should be reserved for names with well-defined meanings. These global names should be constructed with sufficient roots and modifiers to identify their domain of use. (Guideline 12 applies when reuse of declarations or definitions is suggested. Guideline 12 states a preference for global types over global elements. The outermost element of a message will appropriately have a global name, which will distinguish that message from all other messages. Elements contained within a message always have the context of the containing message.)
- Each version of a schema produced by IXRetail must have its own URI value for the schemaLocation attribute that is different from the URI value of every other version of every other schema; the URI must be in a hierarchy agreed with ARTS-NRF (each schemaLocation will be the URI for a UTF-8 file subordinate to http://www.nrf-arts.org/IXRetail/schemaLocation/). The version attribute of the <xs:schema> opening tag should be specified and should have a value that is the same string as the schemaLocation URI value. The assignment of values for version should be tied to schema approval, establishment, and release. These values should also include release date, following the pattern used by W3C. Even the initial versions of IXRetail schemas should use some version-control mechanism. It is desirable to use a version mechanism that parallels the schema-discovery mechanism standardized for validating parsers. An example of the W3C pattern for including release date in a name is: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502. (XML requires that all conforming XML processors support UTF-8. UTF-8 can also be browsed or read by almost all text-processing tools, many of which would have problems with UTF-16. The development procedures assumed by this guideline may not be appropriate for all organizations, and some organizations may already have established conventions for version identification that do not permit the schemaLocation hint provided by using version as suggested here.)
- Use names from ARTS XML Dictionary when appropriate, instead of inventing new names. The ARTS XML Dictionary is a list of names initially derived from entity and attribute names of the logical view of the ARTS Data Model. The context of the ARTS Data Model provides significant semantics to these names. The names must still be selected and used consistently with all the other guidelines. Names from IXRetail schemas will also be added to the ARTS XML Dictionary. (This guideline is intended to make use of XML technology to extend the database efforts that preceded it. IXRetail and ARTS staff have expended much effort on converting data dictionaries and related Data Model specifications. Although these conversions were a significant effort, they ensured that the XML specifications were closely related to information flows and processes already widely deployed in the retail industry. Without these conversions, much more requirements validation would have been required. Further, requirements gathering and validation is perhaps the slowest part of the standardization process since the industry's leaders are unwilling to tell their competitors about high priority requirements.)
- When choosing a name that is global within a namespace, use compound names that describe the specific meaning of the thing being declared or defined. The purpose of this guideline is to avoid inappropriate use of a general term with a specific meaning. If global names are simple, users will tend to think of them as having a general utility, even when the type was chosen to meet the requirement of only a limited domain, industry segment, or geographic region. For example, a LineItem global concrete type should not be defined because the information components differ significantly between sales line items and tender line items. (This guideline does not apply to local names, which also have the context of use to describe their meaning and which do not prevent other uses of the same name with different context and different meaning.)
- Use consistent prefixes for names from namespaces that differ from the IXRetail namespace. Use no prefix for the IXRetail namespace. Use only these prefixes and definitions:
  - xml (defined in XML standard)
  - xmlns (defined in Namespaces in XML standard)
  Keeping default namespaces and prefixes consistent helps make included schemas have the same meaning as inline textual inclusion, which ensures that people come to the same conclusions as to meaning that validating parsers do. (Guidelines 10 and 11 specify that the IXRetail namespace be specified as the default namespace; as a result, no prefix is needed for its global names. Additional prefixes would be added to this guideline as their use is approved.)
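To make several of the guidelines above concrete, here is a minimal sketch of a schema document written in their spirit. It is not an IXRetail schema. Because only IXRetail can approve specifications in its own namespace, the sketch uses a hypothetical http://example.com/retail/ hierarchy in its place, and every element name, type name, and file name other than POSDepartmentID is invented for illustration. The elementFormDefault="qualified" setting is our assumption: it is one way to let local element names appear unprefixed in instance documents that declare a default namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema document; names and URIs are illustrative only. -->
<!-- The default and target namespaces are the same, so only W3C names
     carry a prefix. The version value repeats the (assumed) dated
     schemaLocation URI of this file. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/retail/namespace/"
           targetNamespace="http://example.com/retail/namespace/"
           elementFormDefault="qualified"
           version="http://example.com/retail/schemaLocation/2002/ExampleSaleMessage-20020101.xsd">

  <!-- Enumeration values are English names in UCC camel case, not numbers. -->
  <xs:simpleType name="SaleTenderKindType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Cash"/>
      <xs:enumeration value="CreditCard"/>
      <xs:enumeration value="Check"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A type expected to be reused is global and has a compound name. -->
  <xs:complexType name="SaleLineItemType">
    <xs:sequence>
      <xs:element name="POSDepartmentID" type="xs:string"/>
      <xs:element name="Quantity" type="xs:decimal"/>
      <xs:element name="ExtendedAmount" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Only the outermost element of the message is declared globally. -->
  <xs:element name="ExampleSaleTransaction">
    <xs:complexType>
      <xs:sequence>
        <!-- Local elements: Name, not CustomerName, inside Customer. -->
        <xs:element name="Customer" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- Short local names refer to global types by the type attribute. -->
        <xs:element name="LineItem" type="SaleLineItemType"
                    maxOccurs="unbounded"/>
        <xs:element name="TenderKind" type="SaleTenderKindType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance document for this hypothetical schema could then declare the same namespace as its default, so that none of its own tags need a prefix:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document for the sketch above. -->
<ExampleSaleTransaction
    xmlns="http://example.com/retail/namespace/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.com/retail/namespace/
      http://example.com/retail/schemaLocation/2002/ExampleSaleMessage-20020101.xsd">
  <Customer>
    <Name>Pat Example</Name>
  </Customer>
  <LineItem>
    <POSDepartmentID>Grocery</POSDepartmentID>
    <Quantity>2</Quantity>
    <ExtendedAmount>5.98</ExtendedAmount>
  </LineItem>
  <TenderKind>Cash</TenderKind>
</ExampleSaleTransaction>
```

Note that LineItem is acceptable here as a local element name because its container supplies the context, whereas the global type behind it carries the more specific name SaleLineItemType.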
The goal of these guidelines is to assist development of standardized XML schemas. Fundamental features include choosing names for descriptive value and continuity with prior industry standards, using local naming to keep message sizes reasonable, and planning for change. We hope that you find some of our results applicable to your needs.
- For more information on the Association for Retail Technology Standards (ARTS) of the National Retail Federation, go to the ARTS home page. There you'll also find links to IXRetail and ARTS' XML Dictionary.
- Learn how to use XML Schema in The Basics of using XML Schema to define elements.
- For a quick understanding of XML Schema, read the W3C's XML Schema Part 0: Primer. Follow up with the complete language description XML Schema Part 1: Structures and XML Schema Part 2: Datatypes.
- For a view of the extent of W3C's specifications related to XML, refer to W3C Extensible Markup Language.
- For tools such as validating parsers, refer to Apache Software Foundation.
- For another view of the human issues involved in designing successful XML schemas, see Sean McGrath's essay in the XML Journal.
- XML and WebSphere Studio Application Developer, Part 1: Developing XML Schema covers the essentials of using the XML Schema Editor, a visual tool for building XML Schema that conform to the XML Schema Recommendation Specification.
During the period that IXRetail developed its "XML Best Practices" document, Paul Golick was editor of that document, Secretary of IXRetail, and the representative to IXRetail and to the ARTS Data Model Committee for IBM Retail Store Solutions. You can contact Paul at email@example.com.
Richard Mader is Executive Director of ARTS and Administrator of IXRetail. You can contact Richard at Maderr@nrf.com.