DataStage Manual Inputs
A successful analysis requires exports of your DataStage jobs. It may also be necessary to provide IBM Automatic Data Lineage with additional parameter files and indirect files. Put these files in the folders defined in the configuration properties, as described in the Source System Requirements section of DataStage Resource Configuration.
Input Folder Structure for InfoSphere DataStage
To analyze InfoSphere DataStage job files, create a directory structure as follows.
- input/datastage/${datastage.extractor.server}
  - *.xml: the parallel DataStage jobs you want to analyze, together with the parameter sets those jobs use, exported as a single XML file from the Repository Export window of the Designer Client (see details below)
  - DSParams: file with parameters; see InfoSphere DataStage Scanner Guide
  - datastageParameterOverride.txt: file with parameter set overrides; see InfoSphere DataStage Scanner Guide, Parameter Sets
  - sql_files: folder with all the SQL files (optional; needed only if your DataStage jobs reference external SQL files)
  - datastageManualMapping.csv: file with component lineage overrides; see InfoSphere DataStage Scanner Guide, Step 6: Overriding Component Data Lineage (Optional)
  - omd_files: optional folder for operational metadata files, as described in InfoSphere DataStage Scanner Guide
    - *.xml: individual files with operational metadata
  - connection_definition/odbcConnectionDefinition.ini: file with connection definitions, as described in InfoSphere DataStage Scanner Guide
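The folder structure above can be prepared with a short script. The following Python sketch is a hypothetical helper, not part of Automatic Data Lineage. It assumes that the root parameter is the directory that contains the input folder configured for your installation and that the server parameter is your value of datastage.extractor.server. It only creates folders; you still copy the XML export, DSParams, and any override, SQL, or operational metadata files into them yourself.

```python
from pathlib import Path

# Optional subfolders named in the folder structure above.
OPTIONAL_DIRS = ("sql_files", "omd_files", "connection_definition")


def create_classic_layout(root, server, optional_dirs=()):
    """Create <root>/input/datastage/<server> plus any requested optional subfolders.

    root and server are placeholders (assumptions): root stands for the directory
    holding the input folder, server for the value of datastage.extractor.server.
    """
    server_dir = Path(root) / "input" / "datastage" / server
    # parents=True creates the intermediate folders; exist_ok=True makes reruns safe.
    server_dir.mkdir(parents=True, exist_ok=True)
    for name in optional_dirs:
        # Refuse names that are not in the documented structure.
        if name not in OPTIONAL_DIRS:
            raise ValueError(f"unexpected subfolder: {name}")
        (server_dir / name).mkdir(exist_ok=True)
    return server_dir


# Hypothetical usage:
# create_classic_layout("/opt/manta", "ds_prod", optional_dirs=("sql_files",))
```

Creating only the optional folders you actually need keeps the input folder aligned with the structure described above.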
Exporting jobs from InfoSphere DataStage
To perform the required exports for Classic DataStage:
- Log in to the DataStage Designer Client on Windows.
- From the toolbar, choose Export > DataStage Components.
- In the dialog that appears, use the Add option to select the jobs, parameter sets, containers, and other assets you need. We recommend that you also include dependent items. You must select XML as the export type. Choose the path and file name for the export, and add a .xml suffix to the file name, because the suffix is not added by default.
- Click the Export button.
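Because the export must be XML and the Designer Client does not add a .xml suffix, a quick check of the saved file can catch both mistakes before scanning. The sketch below is a hypothetical helper (check_classic_export is not part of any IBM tool). It assumes only that a correct export is well-formed XML and does not validate any DataStage-specific content. It parses the file first and adds the suffix only if parsing succeeds, so a non-XML export is never renamed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_classic_export(path):
    """Verify that an export parses as XML and return its path with a .xml suffix."""
    path = Path(path)
    # A non-XML export (for example, one saved in DSX format) fails to parse here.
    try:
        ET.parse(path)
    except ET.ParseError as exc:
        raise ValueError(f"{path} is not well-formed XML; re-export with XML selected") from exc
    # The Designer Client does not add the suffix by default, so add it if missing.
    if path.suffix.lower() != ".xml":
        path = path.rename(path.with_name(path.name + ".xml"))
    return path
```

For very large exports, parsing the whole file loads it into memory; that is acceptable for a one-off check.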
Input Folder Structure for NextGen DataStage
To analyze NextGen DataStage job files, create a directory structure as follows.
- input/datastage/${datastage.extractor.server}
  - *.zip: the ZIP files with the "Project UI export" of NextGen DataStage (see details below)
  - datastageParameterOverride.txt: file with parameter set overrides; see https://manta-io.atlassian.net/wiki/spaces/MTKB/pages/3703472129/InfoSphere+DataStage+Scanner+Guide#Parameter-Sets
  - sql_files: folder with all the SQL files (optional; needed only if your DataStage jobs reference external SQL files)
  - datastageManualMapping.csv: file with component lineage overrides; see https://manta-io.atlassian.net/wiki/spaces/MTKB/pages/3703472129/InfoSphere+DataStage+Scanner+Guide#Step-6%3A-Overriding-Component-Data-Lineage-(Optional)
Provide the folder above to Automatic Data Lineage for execution, as described in Manta Flow Usage: Preparing Scanner Inputs.
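Staging the NextGen exports amounts to copying the ZIP files into the server folder. The following sketch is a hypothetical helper (stage_nextgen_exports is not part of Automatic Data Lineage). It assumes you pass the folder input/datastage/${datastage.extractor.server} already resolved to a real path. It rejects files that are not ZIP archives but does not check whether a ZIP is the right kind of export.

```python
import shutil
import zipfile
from pathlib import Path


def stage_nextgen_exports(exports, server_dir):
    """Copy NextGen project export ZIPs into the server folder and return the copies."""
    server_dir = Path(server_dir)
    server_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for export in exports:
        # Reject anything that is not a ZIP archive before copying it.
        if not zipfile.is_zipfile(export):
            raise ValueError(f"{export} is not a ZIP archive")
        # copy2 keeps file timestamps and returns the destination path.
        staged.append(Path(shutil.copy2(export, server_dir)))
    return staged
```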
Exporting jobs from DataStage NextGen
- Open the project and select Export project in the top-right corner of the screen.
- Select the items you want to export, or export the whole project. Exporting the whole project is recommended because it ensures that all dependencies are included.
NextGen DataStage also provides another export method, which is not suitable for Automatic Data Lineage; do not use it.
How to Detect If the Wrong Input Format Has Been Used for NextGen DataStage
There are multiple ways to export DataStage objects from NextGen DataStage, and Automatic Data Lineage supports only one of them as scanner input. If your scanned export does not produce lineage, compare it against the following description of the input file to make sure that the format is compatible with Automatic Data Lineage.
<datastagenextgen>.zip
- assets: mandatory folder
  - .METADATA: mandatory folder
    - data_intg_flow.*.json: mandatory files containing information about flows
    - connection.*.json: optional files containing information about connections
    - parameter_set.*.json: optional files containing information about parameter sets
    - job.*.json: optional files containing information about jobs
    - job_run.*.json: optional files containing information about particular executions of the job
  - data_intg_flow: mandatory folder
    - At least one file not ending in px_executables. Open any such file in a text editor and search for the string "schemas":[{ (including the quotation marks). There should be at least one occurrence of this string.
- assettypes: mandatory folder
- project.json: mandatory file (There may be multiple instances of this file as a result of ZIP decompression, which is expected.)
If the exported objects meet these requirements and scanning them still does not produce lineage, contact support or open a ticket.
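The manual checks above can be scripted. The Python sketch below is a hypothetical helper, not an official validator: it applies each checklist item to a NextGen ZIP export and returns a list of problems. Because the exact nesting of folders inside the archive is an assumption here, it looks for each mandatory folder name anywhere in an entry path rather than at a fixed depth. It searches flow files for the exact byte string "schemas":[{ and does not tolerate extra whitespace.

```python
import fnmatch
import zipfile
from pathlib import PurePosixPath

MANDATORY_FOLDERS = ("assets", ".METADATA", "data_intg_flow", "assettypes")
# String that must occur in at least one flow file, per the checklist above.
SCHEMA_MARKER = b'"schemas":[{'


def check_nextgen_export(zip_file):
    """Return a list of problems; an empty list means every check passed.

    zip_file may be a path or a binary file object. Folder names are matched
    anywhere in an entry's path (assumption), so nesting depth is not enforced.
    """
    problems = []
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
        files = [n for n in names if not n.endswith("/")]

        # Collect every folder name that appears in any entry path (ZIP paths use "/").
        folders = set()
        for n in names:
            parts = PurePosixPath(n).parts
            folders.update(parts if n.endswith("/") else parts[:-1])

        def files_under(folder):
            # Files with `folder` somewhere among their parent directories.
            return [n for n in files if folder in PurePosixPath(n).parts[:-1]]

        for folder in MANDATORY_FOLDERS:
            if folder not in folders:
                problems.append(f"missing mandatory folder {folder}")

        if not any(fnmatch.fnmatchcase(PurePosixPath(n).name, "data_intg_flow.*.json")
                   for n in files_under(".METADATA")):
            problems.append("no data_intg_flow.*.json file in .METADATA")

        flows = [n for n in files_under("data_intg_flow")
                 if not n.endswith("px_executables")]
        if not flows:
            problems.append("no file in data_intg_flow that does not end in px_executables")
        elif not any(SCHEMA_MARKER in archive.read(n) for n in flows):
            problems.append('no flow file contains "schemas":[{')

        if not any(PurePosixPath(n).name == "project.json" for n in files):
            problems.append("missing project.json")
    return problems


# Hypothetical usage:
# print(check_nextgen_export("my_project_export.zip") or "all checks passed")
```

Passing these checks does not guarantee lineage; it only rules out the format mistakes described above.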