Note: This tip assumes you have a basic knowledge of schema documents; there are a number of links to schema documentation and a tutorial in Resources.
Although many parsers and tools use schemas to validate or analyze XML documents, tools for querying and manipulating schema documents themselves are still being built. The Schema Infoset Model (also known as the IBM Java Library for Schema Components, or just "the library") provides a rich API that models schemas: both their concrete representation (perhaps in a schema.xsd file) and the abstract components of a schema as defined by the specification. As anyone who has read the schema specs knows, they are quite detailed, and this model strives to expose every detail of any schema. That lets you manage your schema collection efficiently and supports higher-level schema tools, perhaps schema-aware parsers and transformers.
For an interface listing of the library showing all the schema objects modeled, please see Schema Infoset Model UML diagrams. The library also includes the UML diagrams used in building the library interfaces themselves; these diagrams show the relationships between the library objects, which very closely mimic the concepts in the schema specifications.
Example: Analyzing your schemas
In this example, you check a schema for integer-derived types that fail to specify bounds. This could be useful for ensuring that all order quantities in purchase orders are bounded. Here the schemas must be very specific, so you require every simple type that derives from integer to have both a minimum facet (minInclusive or minExclusive) and a maximum facet (maxInclusive or maxExclusive). A facet inherited from a type that this type derives from is still sufficient.
While you can use XSLT or XPath to query a schema's concrete representation in an .xsd file or inside some other .xml content, it is much more difficult to discover the type derivations and interrelationships that schema components actually have. Since the Schema Infoset Model library models both the concrete representation and the abstract concept of the schema, it can easily be used to collect details about its components, even when the schema may have deep type hierarchies or be defined in multiple schema files.
In this simple schema, you will find some types that meet the criteria of having max/min facets, and some that do not. (You can find the full schema in FindTypesMissingFacets.xsd included in the zip file.)
Listing 1. Sample schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.research.ibm.com/XML/NS/xsd"
    xmlns="http://www.research.ibm.com/XML/NS/xsd">
  <!-- SimpleType missing both max/min facets -->
  <xsd:simpleType name="integer-noFacets">
    <xsd:restriction base="xsd:integer"/>
  </xsd:simpleType>
  <!-- Derived type has inherited min facet but missing max facet -->
  <xsd:simpleType name="positiveInteger-inheritedMinFacet">
    <xsd:restriction base="xsd:positiveInteger"/>
  </xsd:simpleType>
  <!-- Derived type with both effective max/min facets -->
  <xsd:simpleType name="positiveInteger-bothFacets">
    <xsd:restriction base="positiveInteger-inheritedMinFacet">
      <xsd:maxExclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- etc... -->
</xsd:schema>
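To give a feel for the manual work involved when you query only the concrete representation, here is a minimal sketch that uses nothing but the JDK's DOM parser to follow the restriction chain of the Listing 1 types. It is not part of the Schema Infoset Model; the class name ManualDerivationWalk, its methods, and the hard-coded INTEGER_FAMILY table are assumptions made for illustration. The sketch handles only named types in a single file and skips anonymous types, includes, imports, and redefines.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the Schema Infoset Model library.
class ManualDerivationWalk {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";
    static final String TNS = "http://www.research.ibm.com/XML/NS/xsd";

    // The three types from Listing 1, inlined so the sketch is self-contained.
    static final String SCHEMA =
          "<xsd:schema xmlns:xsd='" + XSD_NS + "' targetNamespace='" + TNS
        + "' xmlns='" + TNS + "'>"
        + "<xsd:simpleType name='integer-noFacets'>"
        + "<xsd:restriction base='xsd:integer'/></xsd:simpleType>"
        + "<xsd:simpleType name='positiveInteger-inheritedMinFacet'>"
        + "<xsd:restriction base='xsd:positiveInteger'/></xsd:simpleType>"
        + "<xsd:simpleType name='positiveInteger-bothFacets'>"
        + "<xsd:restriction base='positiveInteger-inheritedMinFacet'>"
        + "<xsd:maxExclusive value='100'/></xsd:restriction></xsd:simpleType>"
        + "</xsd:schema>";

    // Knowledge the .xsd file does not contain: xsd:integer and the
    // built-in types that the datatypes spec derives from it.
    static final Set<String> INTEGER_FAMILY = new HashSet<>(Arrays.asList(
        "integer", "nonPositiveInteger", "negativeInteger", "long", "int",
        "short", "byte", "nonNegativeInteger", "unsignedLong", "unsignedInt",
        "unsignedShort", "unsignedByte", "positiveInteger"));

    // Maps each named simpleType to the "{namespace}localName" of its restriction base.
    static Map<String, String> readBases(String schemaText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaText)));
            String targetNs = doc.getDocumentElement().getAttribute("targetNamespace");
            Map<String, String> bases = new HashMap<>();
            NodeList types = doc.getElementsByTagNameNS(XSD_NS, "simpleType");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                NodeList restrictions = type.getElementsByTagNameNS(XSD_NS, "restriction");
                if (!type.hasAttribute("name") || restrictions.getLength() == 0) {
                    continue; // anonymous types and types without a restriction are skipped
                }
                Element restriction = (Element) restrictions.item(0);
                String base = restriction.getAttribute("base");
                int colon = base.indexOf(':');
                String prefix = (colon < 0) ? null : base.substring(0, colon);
                // Resolve the prefix (or the default namespace) in scope on this element.
                String namespace = restriction.lookupNamespaceURI(prefix);
                bases.put("{" + targetNs + "}" + type.getAttribute("name"),
                          "{" + namespace + "}" + base.substring(colon + 1));
            }
            return bases;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read schema", e);
        }
    }

    // Follows base links from typeName until a type with no recorded base is reached.
    static List<String> derivationChain(Map<String, String> bases, String typeName) {
        List<String> chain = new ArrayList<>();
        for (String t = typeName; t != null && !chain.contains(t); t = bases.get(t)) {
            chain.add(t);
        }
        return chain;
    }

    // True if the chain ends at a built-in type listed in INTEGER_FAMILY.
    static boolean isDerivedFromInteger(Map<String, String> bases, String typeName) {
        List<String> chain = derivationChain(bases, typeName);
        String root = chain.get(chain.size() - 1);
        String prefix = "{" + XSD_NS + "}";
        return root.startsWith(prefix)
            && INTEGER_FAMILY.contains(root.substring(prefix.length()));
    }

    public static void main(String[] args) {
        Map<String, String> bases = readBases(SCHEMA);
        for (String name : bases.keySet()) {
            System.out.println(derivationChain(bases, name)
                + " integer-derived: " + isDerivedFromInteger(bases, name));
        }
    }
}
```

Even this small walk has to resolve namespace prefixes by hand and carry its own table of built-in types, because the schema file itself never states that positiveInteger derives from integer. That bookkeeping is what the library's abstract model is meant to take off your hands.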
Loading schemas into the library
The library can read and write schema objects from a variety of sources. I'll show it using the IBM WebSphere EMF ResourceSet framework to easily load sets of schemas; you can also build and emit schemas directly from or to a DOM object that you manage yourself. The library provides a custom XSDResourceSet implementation that can intelligently and automatically load sets of schemas related by includes, imports, and redefines. The abstract relationship between related schemas is also modeled in the library.
Listing 2. Loading a schema
// String variable schemaURL is "FindTypesMissingFacets.xsd" or the URL to your schema
// Create a resource set and load the main schema file into it.
ResourceSet resourceSet = new ResourceSetImpl();
XSDResourceImpl xsdSchemaResource = (XSDResourceImpl)resourceSet.load(schemaURL);

// getResources() returns all the resources in the set: the main resource
// plus any that have been included, imported, or redefined.
for (Iterator resources = resourceSet.getResources().iterator();
        resources.hasNext(); /* no-op */)
{
    // Return the first schema object found, which is the main schema
    // loaded from the provided schemaURL
    Resource resource = (Resource)resources.next();
    if (resource instanceof XSDResourceImpl)
    {
        XSDResourceImpl xsdResource = (XSDResourceImpl)resource;
        // This returns a com.ibm.etools.xsd.XSDSchema object
        return xsdResource.getSchema();
    }
}
Now that you have an XSDSchema object, you need to query it to find any types that are missing max/min facets. First, you'll use some convenient library methods to quickly find all of its simpleTypeDefinitions that derive from the built-in integer type. Since the library provides a complete model of the abstract meaning of a schema, this turns out to be very straightforward. You can query the XSDSchema for its getTypeDefinitions() listing, and then filter for XSDSimpleTypeDefinitions that actually inherit from the base integer type.
Listing 3. Getting a list of specific types
// A handy convenience method quickly gets all
// typeDefinitions within the schema
List allTypes = schema.getTypeDefinitions();
ArrayList allIntegerTypes = new ArrayList();
for (Iterator iter = allTypes.iterator();
        iter.hasNext(); /* no-op */)
{
    XSDTypeDefinition typedef = (XSDTypeDefinition)iter.next();
    // Keep only simpleTypes...
    if ((typedef instanceof XSDSimpleTypeDefinition)
        // ... that derive from the built-in integer type.
        // Use a worker method in the very handy sample
        // program com.ibm.etools.xsd.util.XSDSchemaQueryTools
        && XSDSchemaQueryTools.isTypeDerivedFrom(typedef,
               schema.getSchemaForSchemaNamespace(), "integer"))
    {
        // The filter found one; save it and continue.
        allIntegerTypes.add(typedef);
    }
}
Every component defined in the W3C schema specifications is modeled in detail in the library. Now that you have a list of all XSDSimpleTypeDefinitions that derive from an integer, you can query this list for ones that are missing either their max or min facets, and produce a report. Note that the library can conveniently group the effective max/minExclusive or max/minInclusive facets together for quick searching; it also provides detailed access to each type, including the actual lexical values if needed.
Listing 4. Querying XSDSimpleType components
for (Iterator iter = allIntegerTypes.iterator();
        iter.hasNext(); /* no-op */)
{
    XSDSimpleTypeDefinition simpleType = (XSDSimpleTypeDefinition)iter.next();

    // First, exclude any UNION or LIST types, since
    // the schema spec says they can't have min/max facets:
    // Part 2: Datatypes in:
    // '4.1.5 Constraints on Simple Type Definition Schema Components'
    if ((XSDVariety.LIST == simpleType.getValueVariety())
        || (XSDVariety.UNION == simpleType.getValueVariety()))
    {
        // Unions and lists cannot have min/max facets at all,
        // so there's no need to report them
        continue;
    }

    // Get the effective max/min facets for each type -
    // this includes ones declared in this type or
    // ones that are inherited, and so forth
    XSDMaxFacet maxFacet = simpleType.getEffectiveMaxFacet();
    XSDMinFacet minFacet = simpleType.getEffectiveMinFacet();

    // If either one is missing, report the error.
    if ((null == maxFacet) || (null == minFacet))
    {
        if (simpleType.isSetName())
        {
            // A component's URI in the library is effectively
            // its <target namespace>#<name>
            System.out.println("Schema named component: " + simpleType.getURI() );
        }
        else
        {
            // It's an anonymous type, so ask the library
            // to construct a default 'alias' for it
            System.out.println("Schema anonymous component: " + simpleType.getAliasURI() );
        }
        System.out.print(" is missing these required facets: ");
        if (null == maxFacet)
        {
            System.out.print(" XSDMaxFacet (either inclusive or exclusive) ");
        }
        if (null == minFacet)
        {
            System.out.print(" XSDMinFacet (either inclusive or exclusive) ");
        }
        // End the report line for this type
        System.out.println();
        // You could also report on the facets this type does have, like:
        // if ((null != minFacet) && minFacet.isExclusive()) {
        //     System.out.println("minFacet.getValue=" + minFacet.getValue());
        // }
    }
}
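To make the idea of an "effective" facet concrete, here is a toy model in plain Java. It is only a sketch: the SimpleType class and its fields are hypothetical stand-ins, not the library's XSDSimpleTypeDefinition, and the facets are modeled as simple strings. It assumes that the nearest declaration on the way up the base chain is the effective one, which holds for restriction because a derived type may only tighten an inherited bound.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; these classes are hypothetical and are not the library's API.
class EffectiveFacetSketch {

    static final class SimpleType {
        final String name;
        final SimpleType base;   // null for the root of the chain
        final String minFacet;   // facet declared on this type itself, or null
        final String maxFacet;   // facet declared on this type itself, or null

        SimpleType(String name, SimpleType base, String minFacet, String maxFacet) {
            this.name = name;
            this.base = base;
            this.minFacet = minFacet;
            this.maxFacet = maxFacet;
        }

        // Nearest min facet found on this type or any type it derives from.
        String effectiveMin() {
            for (SimpleType t = this; t != null; t = t.base) {
                if (t.minFacet != null) {
                    return t.minFacet;
                }
            }
            return null;
        }

        // Nearest max facet found on this type or any type it derives from.
        String effectiveMax() {
            for (SimpleType t = this; t != null; t = t.base) {
                if (t.maxFacet != null) {
                    return t.maxFacet;
                }
            }
            return null;
        }
    }

    // Built-in chain: positiveInteger -> nonNegativeInteger -> integer
    static final SimpleType INTEGER =
        new SimpleType("integer", null, null, null);
    static final SimpleType NON_NEGATIVE =
        new SimpleType("nonNegativeInteger", INTEGER, "minInclusive=0", null);
    static final SimpleType POSITIVE =
        new SimpleType("positiveInteger", NON_NEGATIVE, "minInclusive=1", null);

    // The three user-defined types from Listing 1
    static final SimpleType NO_FACETS =
        new SimpleType("integer-noFacets", INTEGER, null, null);
    static final SimpleType INHERITED_MIN =
        new SimpleType("positiveInteger-inheritedMinFacet", POSITIVE, null, null);
    static final SimpleType BOTH_FACETS =
        new SimpleType("positiveInteger-bothFacets", INHERITED_MIN, null, "maxExclusive=100");

    // Lists which of the two required bounds have no effective facet.
    static List<String> missingFacets(SimpleType type) {
        List<String> missing = new ArrayList<>();
        if (type.effectiveMin() == null) {
            missing.add("min");
        }
        if (type.effectiveMax() == null) {
            missing.add("max");
        }
        return missing;
    }

    public static void main(String[] args) {
        for (SimpleType t : new SimpleType[] {NO_FACETS, INHERITED_MIN, BOTH_FACETS}) {
            System.out.println(t.name + " is missing: " + missingFacets(t));
        }
    }
}
```

In this model, integer-noFacets is missing both bounds, positiveInteger-inheritedMinFacet is missing only its max bound because it inherits minInclusive=1 from positiveInteger, and positiveInteger-bothFacets is missing nothing.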
Your report: Types missing max/min facets
With just a little bit of code, you've discovered some fairly detailed information about the schema. If you download the sample code and run it against the provided schema file, you should see a listing like this:
Listing 5. The output report
Schema missing max/min facet report on: FindTypesMissingFacets.xsd
Schema named component: http://www.research.ibm.com/XML/NS/xsd#integer-minFacet
is missing these required facets: XSDMaxFacet (either inclusive or exclusive)
Schema named component: http://www.research.ibm.com/XML/NS/xsd#integer-noFacets
is missing these required facets: XSDMaxFacet (either inclusive or exclusive)
XSDMinFacet (either inclusive or exclusive)
Schema named component: http://www.research.ibm.com/XML/NS/xsd#positiveInteger-inheritedMinFacet
is missing these required facets: XSDMaxFacet (either inclusive or exclusive)
Although this is a contrived example, it does show how the library's detailed representation of a schema makes it easy to find exactly the parts of a schema you need. The library provides setter methods for the properties of schema components, so it is easy to update your sample to automatically fix any found types by adding any missing facets. And since the library models the concrete representation of the schema as well, you can write your updated schema back out to an .xsd file.
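This tip does not show the library's setter methods, so the following is a library-independent sketch of the same idea using only the JDK's DOM and Transformer classes: add a missing facet to a named type's restriction in the concrete representation, then write the schema back out. The class name AddMissingFacetSketch, the addFacet method, and the bound of 100 are assumptions chosen for illustration; a real fix would need a bound that makes sense for your data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; it does not use the library's setter API.
class AddMissingFacetSketch {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // A one-type schema, inlined so the sketch is self-contained.
    static final String SCHEMA =
          "<xsd:schema xmlns:xsd='" + XSD_NS + "'>"
        + "<xsd:simpleType name='integer-noFacets'>"
        + "<xsd:restriction base='xsd:integer'/></xsd:simpleType>"
        + "</xsd:schema>";

    // Appends <facetName value="..."/> to the restriction of the named type
    // and returns the serialized schema. Unknown type names leave it unchanged.
    static String addFacet(String schemaText, String typeName,
                           String facetName, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaText)));

            NodeList types = doc.getElementsByTagNameNS(XSD_NS, "simpleType");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                NodeList restrictions = type.getElementsByTagNameNS(XSD_NS, "restriction");
                if (typeName.equals(type.getAttribute("name"))
                        && restrictions.getLength() > 0) {
                    Element restriction = (Element) restrictions.item(0);
                    // Reuse the prefix the schema already uses for its own elements.
                    String prefix = restriction.getPrefix();
                    String qName = (prefix == null) ? facetName : prefix + ":" + facetName;
                    Element facet = doc.createElementNS(XSD_NS, qName);
                    facet.setAttribute("value", value);
                    restriction.appendChild(facet);
                }
            }

            // Write the updated concrete representation back out as text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update schema", e);
        }
    }

    public static void main(String[] args) {
        String withMax = addFacet(SCHEMA, "integer-noFacets", "maxInclusive", "100");
        System.out.println(addFacet(withMax, "integer-noFacets", "minInclusive", "0"));
    }
}
```

Working at the DOM level like this means you must already know which types need fixing, including those whose facets are inherited; the report from Listing 4 is what would supply that list.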
A sample program, XSDFindTypesMissingFacets.java, shows the example in this article. It uses a schema document FindTypesMissingFacets.xsd which has a number of types with and without max/min facets.
You can download the sample program and the following sample .java files in a zip file.
Copies of several other sample .java files normally shipped with the Schema Infoset Model are also attached. These include:
- XSDSchemaQueryTools.java showcases a number of other ways to perform advanced queries on schema objects.
- XSDSchemaBuildingTools.java, with convenience methods for building schemas programmatically.
- XSDPrototypicalSchema.java uses the library to build the ever-popular schema primer PurchaseOrder sample.
| Description | Name | Size | Download method |
|---|---|---|---|
| Code sample | x-schemimj.zip | 35KB | HTTP |
Resources
- Participate in the discussion forum.
- See a full schema library class listing.
- Read some of IBM's thoughts about what makes a good schema API.
- Start with an Introduction to XML Schemas by Eric van der Vlist.
- See W3C's schema specifications (primer, datatypes, and structures).
- Download IBM product evaluation versions or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2, Lotus®, Rational®, Tivoli®, and WebSphere.
- Download Apache's Xerces-J parser, which includes basic schema validation tools.
- Want us to send you useful XML tips like this every week? Sign up for the developerWorks XML Tips newsletter.
- Find plenty more XML resources on the developerWorks XML zone.
Shane Curcuru has been a developer and quality engineer at Lotus and IBM for 12 years and is a member of the Apache Software Foundation. He has worked on such diverse projects as Lotus 1-2-3, Lotus eSuite, Apache's Xalan-J XSLT processor, and a variety of XML Schema tools. Questions about this article or about automated testing can be sent to him at shane_curcuru@us.ibm.com.