DataStage Resource Configuration
Before you configure your scanner, make sure you meet the prerequisites. Read our guide on DataStage integration requirements to double-check.
Source System Properties
This configuration can be set up by creating a new connection on the Admin UI > Connections tab, or by editing an existing connection under Admin UI / Connections / Data Integration Tools / DataStage / the specific connection. A new connection can also be created via the Manta Orchestration API.
One IBM Manta Data Lineage connection for DataStage corresponds to one DataStage server that will be analyzed.
| Property name | Description | Example |
| --- | --- | --- |
| `datastage.extractor.server` | Enter the name of the script input folder. | datastage |
| `datastage.edition` | DataStage edition; must be either "Standalone DataStage" or "DataStage for Cloud Pak for Data". | Standalone DataStage |
| `datastage.input.encoding` | Enter the encoding of the extracted DataStage projects. See Encodings for applicable values. | UTF-8 |
| `datastage.value.files` | Enter the names of the value files that should be used in the parameter sets, ordered by priority. | DEV1,PROD,TEST |
| `datastage.design.time.analysis.jobs.included` | Enter a comma-separated list of jobs to include in design-time analysis. Each element of the list is evaluated as a regular expression. | Job1,Job2,Job3 |
| `datastage.design.time.analysis.jobs.excluded` | Enter a comma-separated list of jobs to exclude from design-time analysis. Each element of the list is evaluated as a regular expression. | Job1,Job2,Job3 |
| `datastage.analyze.job.runs` (as of Manta Flow R41) | Indicates whether job runs should be analyzed. Only relevant for DataStage for Cloud Pak for Data. | true |
| `datastage.analyze.job.runs.since` (as of Manta Flow R41) | Only runs that happened after this date will be analyzed. If empty, all runs will be analyzed. Only relevant for DataStage for Cloud Pak for Data. | 1970/12/31 23:59:59.999 |
| `datastage.analyze.jobs.separately` (as of Manta Flow R41) | Indicates whether jobs should be analyzed separately even if there are runs associated with them. Only relevant for DataStage for Cloud Pak for Data. | true |
| `datastage.analyze.flows.without.jobs` (as of Manta Flow R41) | Indicates whether flows without jobs should be analyzed. Only relevant for DataStage for Cloud Pak for Data. | true |
| `datastage.extraction.host.name` (as of R42.7) | Enter the host name of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.host.port` (as of R42.7) | Enter the port of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | 1234 |
| `datastage.extraction.username` (as of R42.7) | Enter a username for the connection to the DataStage instance. Use this option with an on-premises Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.password` (as of R42.7) | Enter a password for the connection to the DataStage instance. Use this option in combination with the username with an on-premises Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.tls.certificate` (as of R42.7) | Enter the TLS/SSL certificate of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.api.key` (as of R42.7) | Enter an API key for the connection to the DataStage instance. Use this option with a SaaS Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.filter.include` (as of R42.7) | Enter a comma-separated list of projects and flows to extract, in the form projectName/flowName. Each part is evaluated as a regular expression. The flow part may be omitted along with the slash. Names are case-insensitive. Leave blank to extract all projects and flows. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | TestProject/TestFlow,TestProject2/TestFlow2,TestProject3 |
| `datastage.extraction.filter.exclude` (as of R42.7) | Enter a list of projects and flows to exclude from extraction; the syntax is the same as for the include list. If left blank, all projects and flows matching the include list will be extracted. If the include list is blank, all projects and flows not matching this list will be extracted. If both are specified, only projects and flows matching the include list and not matching this list will be extracted. If both are left blank, all projects and flows will be extracted. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
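The include/exclude filter semantics can be illustrated with a short Python sketch. This is only an illustration of the documented matching rules (regular expressions, case-insensitive names, optional flow part, exclude wins over include) under stated assumptions; the function names are hypothetical and this is not Manta's actual implementation:

```python
import re

def matches(filters, project, flow):
    """Return True if project/flow matches any pattern in the
    comma-separated filter list. A pattern without a slash is a
    project-only pattern and matches all flows in that project."""
    name = f"{project}/{flow}"
    for pattern in filters.split(","):
        if "/" not in pattern:
            pattern += "/.*"  # project-only pattern covers every flow
        if re.fullmatch(pattern, name, re.IGNORECASE):
            return True
    return False

def extracted(include, exclude, project, flow):
    """A blank include list means everything; exclusions always win."""
    ok = not include or matches(include, project, flow)
    return ok and not (exclude and matches(exclude, project, flow))
```

For example, with the include list `TestProject/TestFlow,TestProject3` and a blank exclude list, every flow in `TestProject3` is extracted regardless of letter case.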
Common Scanner Properties
This configuration is common to all DataStage source systems and all DataStage scenarios, and is configured in Admin UI / Configuration / CLI / DataStage / DataStage Common. It can be overridden at the individual connection level.
| Property name | Description | Example |
| --- | --- | --- |
| `datastage.input.dir` | Enter the directory with the exported DataStage XML files (jobs). That is, if the script input folder is set to "datastage" (the default), place all needed XML files in this directory. | ${manta.dir.input}/datastage/${datastage.extractor.server} |
| `filepath.lowercase` | Whether paths to files should be lowercased (false for case-sensitive file systems, true otherwise) | true |
| `datastage.dsparams.file` | Path to the DSParams file for correcting project-level environment parameters | ${manta.dir.input}/datastage/${datastage.extractor.server}/DSParams |
| `datastage.parameter.override.file` | Path to the optional TXT file with definitions for overriding parameters | ${manta.dir.input}/datastage/${datastage.extractor.server}/datastageParameterOverride.txt |
| `datastage.parameter.override.helper.file` | Path to the helper TXT file that will be generated when unresolved parameters are found | ${manta.dir.temp}/datastage/${datastage.extractor.server}/datastageParameterOverride.txt |
| `datastage.manual.mapping.file` | Path to the optional CSV file with manual mappings for unsupported stages | ${manta.dir.input}/datastage/${datastage.extractor.server}/datastageManualMapping.csv |
| `datastage.omd.files.directory` | Path to the optional directory with operational metadata (OMD) files | ${manta.dir.input}/datastage/${datastage.extractor.server}/omd_files |
| `datastage.sql.files.directory` | Path to the directory with SQL files used in database connectors | ${manta.dir.input}/datastage/${datastage.extractor.server}/sql_files |
| `datastage.odbc.connection.definition.file` | Path to the optional ODBC connection definition file | ${manta.dir.input}/datastage/${datastage.extractor.server}/connection_definition/odbcConnectionDefinition.ini |
| `datastage.odbc.connection.definition.encoding` | Encoding of the ODBC connection definition file. See Encodings for applicable values. | UTF-8 |
| `datastage.unsupported.stages.connect.all` | Whether all input (output) columns without a corresponding output (input) column should be connected to all output (input) columns in unsupported stages | true |
| `datastage.enable.oracle.proxy.users` | Whether Oracle proxy user authentication should be supported (true to change Oracle usernames in the "USERNAME[SCHEMA_OWNER]" format to "SCHEMA_OWNER"; false otherwise). See https://docs.oracle.com/en/database/oracle/oracle-database/21/jjdbc/proxy-authentication.html#GUID-07E0AF7F-2C9A-42E9-8B99-F2716DC3B746 for more details. | true |
| `datastage.analysis.amazon.s3.addressing.model` | Amazon S3 addressing model used during Amazon S3 stage analysis. The possible values correspond to path-style addressing (https://s3.Region.amazonaws.com/bucket-name/key-name) and virtual-hosted-style addressing (https://bucket-name.s3.Region.amazonaws.com/key-name). | VIRTUAL_HOSTED |
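The example paths above use `${...}` placeholders that resolve to other properties, such as `${manta.dir.input}` and `${datastage.extractor.server}`. As a rough illustration of that substitution syntax only (a sketch assuming simple recursive expansion, not Manta's actual property resolver):

```python
import re

def expand(value, props):
    """Recursively expand ${name} placeholders from a property map.

    A missing property raises KeyError, which mirrors the fact that
    the placeholder names must refer to defined properties.
    """
    def repl(match):
        return expand(props[match.group(1)], props)
    return re.sub(r"\$\{([^}]+)\}", repl, value)

# Hypothetical property values for illustration:
props = {
    "manta.dir.input": "/opt/manta/input",
    "datastage.extractor.server": "datastage",
}
path = expand("${manta.dir.input}/datastage/${datastage.extractor.server}", props)
```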
Manual Mapping File
The file is optional and is expected in the path defined by the common property `datastage.manual.mapping.file`. The format of the CSV is:

```
"Full path to Stage";"Input Link name";"Input Column name";"Output Link name";"Output Column name";"Edge Type (DIRECT | FILTER)";"Description (optional)"
"manual_mapping_job/Generic_3";"DSLink2";"a";"DSLink5";"b";"DIRECT";""
```

The path to the stage is in the format `Job/[Shared and Local containers optional]/Stage`.
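A quick way to sanity-check a manual mapping file before running an analysis is to parse it with Python's `csv` module, using `;` as the delimiter. This is a hedged sketch based only on the format shown above; the field names are hypothetical labels, not Manta identifiers:

```python
import csv
import io

# One row in the documented format (the sample mapping from above):
SAMPLE = '"manual_mapping_job/Generic_3";"DSLink2";"a";"DSLink5";"b";"DIRECT";""\n'

# Hypothetical labels for the seven documented columns:
FIELDS = ["stage_path", "input_link", "input_column",
          "output_link", "output_column", "edge_type", "description"]

def read_manual_mappings(text):
    """Parse the semicolon-delimited mapping CSV into dicts and
    reject edge types other than DIRECT or FILTER."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    rows = []
    for record in reader:
        row = dict(zip(FIELDS, record))
        if row["edge_type"] not in ("DIRECT", "FILTER"):
            raise ValueError(f"unexpected edge type: {row['edge_type']}")
        rows.append(row)
    return rows
```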
Parameter Override File
The file is optional and is expected in the path defined by the common property `datastage.parameter.override.file`. This file overrides previously loaded parameter values, so you can set them manually or add new ones.

The `datastageParameterOverride.txt` File Format in Standalone DataStage
```
[ENVIRONMENT]
PARAM1_NAME = "param1_value"
PARAM2_NAME = "param2_value"
PARAM3_NAME = "param3_value"

[PARAMETER_SET/parameter_set_name]
param4_name = "default_param4_value"
param5_name = "default_param5_value"
$PARAM3_NAME = "$PROJDEF"

[VALUE_FILE/parameter_set_name/value_file1_name]
param4_name = "some_param4_value"
param5_name = "some_param5_value"
$PARAM3_NAME = "some_param3_value"

[VALUE_FILE/parameter_set_name/value_file2_name]
param4_name = "other_param4_value"
param5_name = "other_param5_value"
$PARAM3_NAME = "other_param3_value"

[JOB/job1_name]
param6_name = "param6_value"
param7_name = "param7_value"

[JOB/job2_name]
param8_name = "param8_value"
```
Four scopes of parameters can be added to the file.

- Project/environment-level parameters — Add any number of them under the `[ENVIRONMENT]` heading. Note that the values from the DSParams file can be overridden.
- Parameter set default parameters — Add any number of them under the `[PARAMETER_SET/parameter_set_name]` heading, specifying the name of the parameter set. You can also refer to default environment parameters, as in the example above, by using the value `$PROJDEF`.
- Parameter set value file parameters — Add any number of them under the `[VALUE_FILE/parameter_set_name/value_file_name]` heading, specifying the names of the parameter set and value file. Referring to the value `$PROJDEF` is also allowed here. Note: Don't forget to enter the names of the value files in the `datastage.value.files` property.
- Job parameters — Add any number of them under the `[JOB/job_name]` heading, specifying the name of the job. Referring to the value `$PROJDEF` is allowed.

Use the correct file format: spaces and tabs are allowed, but each scope definition and each parameter entry must be on a separate line.
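To make the documented rules concrete (double-quoted values, one scope heading or parameter entry per line), here is a small parser sketch. It illustrates the syntax above under those stated assumptions; it is not Manta's loader, and the function name is hypothetical:

```python
import re
from collections import defaultdict

def parse_override_file(text):
    """Parse the standalone-format override file into
    {scope_heading: {param_name: value}}. Values must be
    enclosed in double quotes, one entry per line."""
    scopes = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # e.g. ENVIRONMENT or JOB/job1_name
            continue
        match = re.fullmatch(r'(\S+)\s*=\s*"(.*)"', line)
        if match is None or current is None:
            raise ValueError(f"malformed line: {raw!r}")
        scopes[current][match.group(1)] = match.group(2)
    return dict(scopes)
```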
The `datastageParameterOverride.txt` File Format in DataStage for Cloud Pak for Data

```
parameter_1 = value
parameter_2 = value
$environment_variable_1 = value
$environment_variable_2 = value
parameter_set_name.parameter_1 = value
parameter_set_name.parameter_2 = value
```

The order of the parameters in the file is not important. Spaces around `=` are optional.
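The flat Cloud Pak for Data format distinguishes three kinds of entries by name alone: plain parameters, `$`-prefixed environment variables, and `parameter_set_name.parameter` names. A hedged Python sketch of those documented naming rules (an illustration only, not Manta's parser):

```python
def parse_cpd_overrides(text):
    """Split Cloud Pak for Data-style override lines into plain
    parameters, $-prefixed environment variables, and parameters
    qualified by a parameter set name."""
    params, env_vars, sets = {}, {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()  # spaces around = are optional
        if name.startswith("$"):
            env_vars[name] = value
        elif "." in name:
            set_name, param = name.split(".", 1)
            sets.setdefault(set_name, {})[param] = value
        else:
            params[name] = value
    return params, env_vars, sets
```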
Parameter Override Helper File
If any unresolved parameters are found during the analysis, a "helper" file, `datastageParameterOverride.txt`, is generated in the location defined by the `datastage.parameter.override.helper.file` property. The hint that the helper file gives can also be found in the log file and in the Admin UI Log Viewer logs relevant to the current DataStage dataflow analysis. After that, perform the following steps.

1. Copy the helper file to the location defined by the `datastage.parameter.override.file` property. If such a file already exists in the destination folder, you can safely replace it. Alternatively, copy the contents of the helper file from the log and create the file in that location; if the file already exists, simply replace its contents.
2. Open the file and fill in the values of all unresolved parameters. Such parameters are empty and may look like this: `my_param = ""`. The parameter value must be enclosed in double quotes; for example, `my_param = "my_value"`.
3. Save and close the file.
4. Run the analysis again.
ODBC Connection Definition Settings
The connection settings for IBM DataStage are exported automatically by Manta Data Lineage during lineage analysis, but to resolve the ODBC connector, Manta Data Lineage needs more information about the relevant source/destination systems. These can be configured manually as follows.

1. Create or open the file referenced by the `datastage.odbc.connection.definition.file` property (e.g., `<MANTA_DIR_HOME>/input/datastage/${datastage.extractor.server}/connection_definition/odbcConnectionDefinition.ini`).
2. Follow the instructions in Manually Define a Database Connection.