Validating documents and schemas (DataStage)

XML Output provides an option to run two XML validation checks at run time:

  • Checks for well-formed XML.
  • Checks that elements and attributes conform to any XML schema that is referenced in the document.

If you select this option, both checks are performed. Otherwise, no validation occurs.
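The first check, well-formedness, can be illustrated outside DataStage. The following is a minimal Python sketch, not a description of how the XML Output stage works internally. It uses the standard-library ElementTree parser, which is non-validating, so it covers only the well-formedness check. Checking conformance to an XML schema needs a validating parser and is not shown. The function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    The standard-library parser checks well-formedness only; it does not
    validate the document against any referenced XML schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<order><item>pen</item></order>"))  # True
print(is_well_formed("<order><item>pen</order>"))         # False: mismatched tag
```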

To refer to a schema in your XML document, use the attribute schemaLocation within the root element tag.
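In the XML Schema specification, schemaLocation belongs to the XML Schema instance namespace (`http://www.w3.org/2001/XMLSchema-instance`, conventionally bound to the `xsi` prefix) and holds whitespace-separated pairs of a namespace URI and a schema location. The specification also defines `noNamespaceSchemaLocation` for schemas without a target namespace. The Python sketch below only reads these hints from the root element; it does not validate anything. The namespace `http://example.com/orders`, the file `orders.xsd`, and the function `schema_hints` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace that defines schemaLocation and noNamespaceSchemaLocation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document whose root element references a schema.
doc = """<order xmlns="http://example.com/orders"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.com/orders orders.xsd">
  <item>pen</item>
</order>"""


def schema_hints(xml_text: str) -> dict:
    """Map each target namespace to the schema location declared on the root element."""
    root = ET.fromstring(xml_text)
    hints = {}
    # xsi:schemaLocation: whitespace-separated (namespace, location) pairs.
    pairs = root.get(f"{{{XSI}}}schemaLocation", "").split()
    for namespace, location in zip(pairs[0::2], pairs[1::2]):
        hints[namespace] = location
    # xsi:noNamespaceSchemaLocation: one schema for elements in no namespace.
    no_ns = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_ns:
        hints[""] = no_ns
    return hints


print(schema_hints(doc))  # {'http://example.com/orders': 'orders.xsd'}
```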

For information about activating validation against a schema, see Stage properties.

Setting XML schema options

To enable validation, use the Strict option. To disable validation, use the Default option.

The XML parser compiles the validating schema to create a schema grammar. While it validates the grammar, the parser can also apply extra steps, called full schema constraint checking, which can be time-consuming and memory-intensive.

If your job produces two or more XML documents that use the same schema, you can avoid recompiling the schema by caching the grammar.
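The benefit of grammar caching can be sketched by counting compilations. In this minimal Python sketch, `SchemaCompiler` and `process_documents` are hypothetical stand-ins: the "grammar" is a placeholder tuple, and no real schema compilation or validation takes place. With caching, each distinct schema is compiled once and reused for later documents; without it, every document triggers a new compilation.

```python
class SchemaCompiler:
    """Hypothetical stand-in for the expensive schema-compilation step."""

    def __init__(self):
        self.compilations = 0

    def compile(self, location: str) -> tuple:
        self.compilations += 1
        return ("grammar", location)  # placeholder for a compiled grammar


def process_documents(schema_locations, enable_grammar_caching: bool) -> int:
    """Process one document per entry; return how many times a schema was compiled."""
    compiler = SchemaCompiler()
    cache = {}  # schema location -> compiled grammar
    for location in schema_locations:
        if enable_grammar_caching:
            if location not in cache:
                cache[location] = compiler.compile(location)
            grammar = cache[location]
        else:
            grammar = compiler.compile(location)
        # ...validate the current document against `grammar` here...
    return compiler.compilations


print(process_documents(["orders.xsd"] * 3, enable_grammar_caching=True))   # 1
print(process_documents(["orders.xsd"] * 3, enable_grammar_caching=False))  # 3
```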

For information about activating schema validation, see Stage properties.

Validation settings for XML schemas in DataStage

Under Properties, set the options for validating XML data, logging errors, and caching grammars. The XML stage checks that the XML is well-formed and can optionally check that the data conforms to a referenced XML schema. To use the default values that a schema defines for elements and attributes in the output data, you must reference that schema from your XML input data by using the schemaLocation attribute in the root element tag.

The XML parser compiles the validating schema to create a schema grammar. While it validates the grammar, the parser can also apply extra steps, called full schema constraint checking, which can be time-consuming and memory-intensive. To enable these steps, select the Strict option under XML validation level. To disable them, select the Default option.

If your job processes two or more XML documents that use the same schema, you can avoid recompiling the schema by caching the grammar. Select Enable grammar caching to cache the grammar.

Define mappings between the message types reported by the Xalan XSLT processor and DataStage error levels to determine how parsing messages and faulty XML documents are processed. Assign fatal errors, non-fatal errors, and warnings to one of the error levels in the following table.

DataStage error level | Result
Reject | Faulty document rows and messages can be written to a Reject link, if one exists. You can also send the messages to the job log.
Fatal | The job terminates, and the messages are written to the job log.
Warning | A warning message is written to the job log.
Info | An information message is written to the job log.
Trace | Debug and monitoring information is written to the job log.
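The effect of each level in the preceding table can be modeled as a small routing function. This is a minimal Python sketch under stated assumptions: `route`, `EXAMPLE_MAPPING`, and `JobTerminated` are hypothetical names, not a DataStage API, and the condition names `fatal`, `error`, and `warning` stand for fatal errors, non-fatal errors, and warnings. In DataStage you make the equivalent choices in the stage properties. When a rejected row has no Reject link, this sketch simply drops it, which is an assumption.

```python
from enum import Enum


class Level(Enum):
    """DataStage error levels."""
    REJECT = "Reject"
    FATAL = "Fatal"
    WARNING = "Warning"
    INFO = "Info"
    TRACE = "Trace"


class JobTerminated(Exception):
    """Stands in for a job that stops after logging a Fatal message."""


# Hypothetical example mapping from parser condition to DataStage level.
EXAMPLE_MAPPING = {
    "fatal": Level.FATAL,
    "error": Level.REJECT,
    "warning": Level.WARNING,
}


def route(condition, message, row, mapping, has_reject_link, log_rejects=False):
    """Apply a condition-to-level mapping to one parser message.

    Returns the rows sent to the Reject link and the entries written to the job log.
    """
    level = mapping[condition]
    result = {"reject": [], "log": []}
    if level is Level.FATAL:
        # The message is written to the job log and the job stops.
        raise JobTerminated(message)
    if level is Level.REJECT:
        if has_reject_link:
            result["reject"].append((row, message))
        if log_rejects:
            result["log"].append((level.value, message))
    else:
        # Warning, Info, and Trace only write to the job log.
        result["log"].append((level.value, message))
    return result


print(route("error", "root element not declared", "<order/>",
            EXAMPLE_MAPPING, has_reject_link=True))
# {'reject': [('<order/>', 'root element not declared')], 'log': []}
```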

Mapping validation errors to DataStage errors

The XML parser reports three types of conditions: fatal errors, non-fatal errors, and warnings.

  • Fatal errors are thrown when the XML is not well-formed.
  • Non-fatal errors are thrown when the XML violates a validity constraint. For example, the document's root element is not declared in the validating XML schema.
  • Warnings may be thrown when the schema has duplicate definitions.
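These three condition types mirror the `warning`, `error`, and `fatalError` callbacks of the standard SAX `ErrorHandler` interface, which Python's `xml.sax` module also provides. The sketch below records each condition by type; it does not imply that DataStage uses Python or this interface. Python's built-in expat parser is non-validating, so in this sketch only fatal (well-formedness) errors actually occur; `error` and `warning` are shown for completeness. `RecordingErrorHandler` and `parse_conditions` are hypothetical names.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Records each reported condition as (type, message)."""

    def __init__(self):
        super().__init__()
        self.conditions = []

    def warning(self, exception):
        self.conditions.append(("warning", str(exception)))

    def error(self, exception):
        # Validity-constraint violations; reported only by validating parsers.
        self.conditions.append(("error", str(exception)))

    def fatalError(self, exception):
        # Well-formedness violations: record the condition, then stop parsing.
        self.conditions.append(("fatal", str(exception)))
        raise exception


def parse_conditions(xml_text: str) -> list:
    """Parse xml_text and return the conditions the parser reported."""
    handler = RecordingErrorHandler()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), ContentHandler(), handler)
    except xml.sax.SAXParseException:
        pass  # already recorded by fatalError
    return handler.conditions


print(parse_conditions("<order><item>pen</item></order>"))  # []
print(parse_conditions("<order><item>pen</order>")[0][0])   # fatal
```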

For more information about these conditions, consult the XML and XML Schema specifications on the World Wide Web Consortium (W3C) website.

By mapping parsing messages to DataStage® error levels, you decide how parsing messages and faulty XML documents are processed.

The following table describes how each DataStage error level is processed.

DataStage error level | Result
Reject | Faulty document rows and messages can be written to a Reject link, if one exists. You can also send the messages to the job log. For more information about processing messages and documents, see Using Reject links.
Fatal | The server job terminates, and the messages are written to the job log.
Warning | A warning message is written to the job log.
Info | An information message is written to the job log.
Trace | If the job runs with tracing set on, debug and monitoring information is written to the job log.

For more information about mapping errors and logging them, see Stage properties.