How to read XML files directly from disk in a DataStage parallel job

Question & Answer

Question

XML documents are meant to be processed in their entirety by the XML Input stage within a DataStage job. The traditional way to access a file, the Sequential File stage, is designed to parse the file into records and columns. This is problematic for XML documents, which are hierarchical data structures and are not organized in tabular form. Attempting to configure the Sequential File stage to read the file without parsing or altering its content is error-prone and will lead to unpredictable results. Given these issues, what is a good solution for reading XML files from disk in a parallel job?
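The problem described in the question can be illustrated outside DataStage with a minimal Python sketch. It is not the DataStage solution and uses no DataStage API; the sample document and the function names `parse_whole` and `count_parseable_lines` are hypothetical, introduced only to show why record-oriented reading breaks a hierarchical document while whole-document reading does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE_XML = """<orders>
  <order id="1">
    <item>widget</item>
  </order>
</orders>
"""


def parse_whole(text):
    """Parse the document as a single unit and return the root element's tag."""
    return ET.fromstring(text).tag


def count_parseable_lines(text):
    """Treat each line as a separate record and count the lines that are
    well-formed XML on their own."""
    parseable = 0
    for line in text.splitlines():
        try:
            ET.fromstring(line.strip())
            parseable += 1
        except ET.ParseError:
            # An opening or closing tag without its partner is not well-formed.
            pass
    return parseable


if __name__ == "__main__":
    print("root tag when read whole:", parse_whole(SAMPLE_XML))
    print("lines parseable alone:", count_parseable_lines(SAMPLE_XML),
          "of", len(SAMPLE_XML.splitlines()))
```

Read whole, the document parses cleanly. Split into line records, only the single self-contained `<item>` line is well-formed, and the nesting that gives the data its meaning is lost.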

Document Information

More support for:
IBM InfoSphere DataStage

Software version:
8.0.1, 8.0, 7.5, 8.1

Operating system(s):
AIX, HP-UX, Linux, Solaris, Windows

Document number:
622427

Modified date:
25 April 2025

UID

swg21377877
