Validating documents and schemas (DataStage)
XML Output provides an option to run two XML validation checks at run time:
- Checks for well-formed XML.
- Checks that elements and attributes conform to any XML schema that is referenced in the document.
If you decide to use the option, both validations are performed. Otherwise, no validation occurs.
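As a rough illustration of what a well-formedness check means, the following minimal sketch uses Python's standard `xml.etree.ElementTree` parser. It is not the parser that the XML Output stage uses, and the function name `is_well_formed` is a hypothetical helper. It only shows that a document with mismatched tags fails to parse.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser raises ParseError for problems such as mismatched tags.
        return False
    return True


print(is_well_formed("<order><id>1</id></order>"))  # True
print(is_well_formed("<order><id>1</order>"))       # False: <id> is never closed
```

This check says nothing about schema conformance. A document can be well-formed and still violate the schema that it references.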
To refer to a schema in your XML document, use the schemaLocation attribute within the root element tag.
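The sketch below shows one way a root element might carry the schemaLocation attribute, built here with Python's standard library purely for illustration. The namespace `http://example.com/orders` and the file name `orders.xsd` are assumptions, not values from this documentation. Only the XML Schema instance namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, conventionally bound to "xsi".
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI_NS)

# Hypothetical target namespace and schema file, for illustration only.
TARGET_NS = "http://example.com/orders"
SCHEMA_FILE = "orders.xsd"

# Create the root element and attach schemaLocation to it.
root = ET.Element(f"{{{TARGET_NS}}}orders")
# The attribute value is a pair: namespace URI, then the schema location.
root.set(f"{{{XSI_NS}}}schemaLocation", f"{TARGET_NS} {SCHEMA_FILE}")

document = ET.tostring(root, encoding="unicode")
print(document)
```

The attribute value pairs each namespace with the location of the schema that describes it, which is how a validating parser can find the schema to check the document against.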
For information about activating validation against a schema, see Stage properties.
Setting XML schema options
To enable validation, use the Strict option. To disable validation, use the Default option.
The XML parser compiles the validating schema to create a schema grammar. While validating the grammar, the parser can apply extra steps called full schema constraint checking, which may increase processing time and be memory-intensive.
If your job produces two or more XML documents that use the same schema, you can avoid recompiling the schema by caching the grammar.
For information about activating schema validation, see Stage properties.
Validation settings for XML schemas in DataStage
Under Properties, set properties for validating the XML data, logging errors, and grammar caching. The XML stage checks for well-formed XML and optionally checks that the data conforms to a referenced XML schema. To use default values for elements and attributes in the output data, you must specify a schema in your XML input data via the schemaLocation attribute within the root element tag.
The XML parser compiles the validating schema to create a schema grammar. While validating the grammar, the parser can apply extra steps called full schema constraint checking, which may increase processing time and be memory-intensive. To enable this checking, select the Strict option under XML validation level. To disable this type of validation, use the Default option.
If your job processes two or more XML documents that use the same schema, you can avoid recompiling the schema by caching the grammar. Select Enable grammar caching to cache the grammar.
Define mappings between the Xalan XSLT processor and DataStage to determine how parsing messages and faulty XML documents are processed. Assign fatal errors, non-fatal errors, and warnings to one of the error levels in the following table.
DataStage error level | Result |
---|---|
Reject | Faulty document rows and messages can be written to a Reject link, if one exists. You can also send the messages to the job log. |
Fatal | The job terminates, and the messages are written to the job log. |
Warning | A warning message is written to the job log. |
Info | An information message is written to the job log. |
Trace | Debug and monitoring information is written to the job log. |
Mapping validation errors to DataStage errors
The XML parser reports three types of conditions: fatal, error, and warning.
- Fatal errors are thrown when the XML is not well-formed.
- Non-fatal errors are thrown when the XML violates a validity constraint. For example, the root element in the document is not found in the validating XML schema.
- Warnings may be thrown when the schema has duplicate definitions.
For more information about these conditions, consult the XML and XML Schema specifications on the World Wide Web Consortium (W3C) website.
By mapping parsing messages to DataStage® error levels, you decide how parsing messages and faulty XML documents are processed.
The following table describes how each DataStage error level is processed.
DataStage error level | Result |
---|---|
Reject | Faulty document rows and messages can be written to a Reject link, if one exists. You can also send the messages to the job log. For more information about processing messages and documents, see Using Reject links. |
Fatal | The server job terminates, and the messages are written to the job log. |
Warning | A warning message is written to the job log. |
Info | An information message is written to the job log. |
Trace | If the job runs with tracing set on, debug and monitoring information is written to the job log. |
For more information about mapping errors and logging them, see Stage properties.
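The mapping described in this section can be pictured as a simple lookup from parser condition to DataStage error level. The sketch below is a hedged illustration in Python, not product code. The enum and function names are hypothetical, and the particular assignments in `mapping` are one example choice, not defaults taken from this documentation.

```python
from enum import Enum


class ParserCondition(Enum):
    """The three condition types that the XML parser reports."""
    FATAL = "fatal"
    ERROR = "error"
    WARNING = "warning"


class ErrorLevel(Enum):
    """The DataStage error levels listed in the table above."""
    REJECT = "Reject"
    FATAL = "Fatal"
    WARNING = "Warning"
    INFO = "Info"
    TRACE = "Trace"


# Example assignment only; each condition can be mapped to any level.
mapping = {
    ParserCondition.FATAL: ErrorLevel.FATAL,
    ParserCondition.ERROR: ErrorLevel.REJECT,
    ParserCondition.WARNING: ErrorLevel.WARNING,
}


def resolve(condition: ParserCondition) -> ErrorLevel:
    """Return the error level assigned to a parser condition."""
    return mapping[condition]


for condition in ParserCondition:
    print(condition.value, "->", resolve(condition).value)
```

Changing one entry in `mapping`, for example assigning warnings to Info, changes how every message of that type is handled without touching the rest of the logic.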
Using Reject links
XML Output supports one Reject link, which can store rejection messages and rejected rows.
Writing rejection messages to the link
To write rejection messages to a Reject link:
- Add a column on the Reject link.
- Using the General page of the Output Link properties, identify the column as the target for rejection messages.
Writing rejected rows to the link
To write rejected rows to a Reject link:
Add a column on the Reject link that has the same name as the target column on the output link.
Column names for this operation are case-sensitive.
For information about setting up a Reject link, see Output link properties.
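Because column names are compared case-sensitively, a name that differs only in case does not count as a match. The following Python sketch illustrates that comparison only. The function and all column names (`OrderId`, `XmlDoc`, `RejectMessage`) are hypothetical, and the code does not reproduce how the stage itself resolves columns.

```python
def matching_reject_columns(source_columns, reject_link_columns):
    """Return Reject link columns whose names match a source column exactly.

    The comparison is case-sensitive, so "orderid" does not match "OrderId".
    """
    source_names = set(source_columns)
    return [name for name in reject_link_columns if name in source_names]


# Hypothetical column names, for illustration only.
source = ["OrderId", "XmlDoc"]
reject_link = ["orderid", "XmlDoc", "RejectMessage"]

print(matching_reject_columns(source, reject_link))  # ['XmlDoc']
```

In this example, `orderid` is ignored because of its case, and `RejectMessage` has no counterpart because it is intended for the rejection message, not for row data.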
Writing rejection messages to the job log
To write rejection messages to the job log:
On the Validation Settings page of the Stage properties, select the Log Reject Errors box.
For more information about the General page, see Stage properties.