
DataStage Resource Configuration

Before you configure the scanner, make sure that you meet the prerequisites described in the DataStage integration requirements guide.

Source System Properties

This configuration can be set up by creating a new connection on the Admin UI > Connections tab or by editing an existing connection in Admin UI / Connections / Data Integration Tools / DataStage / specific connection. A new connection can also be created via the Manta Orchestration API.

One IBM Manta Data Lineage connection for DataStage corresponds to one DataStage server that will be analyzed.

datastage.extractor.server
Enter the name of the script input folder.
Example: datastage

datastage.edition
The DataStage edition; must be either "Standalone DataStage" or "DataStage for Cloud Pak for Data".
Example: Standalone DataStage

datastage.input.encoding
Enter the encoding of the extracted DataStage projects. See Encodings for applicable values.
Example: UTF-8

datastage.value.files
Enter the names of the value files that should be used in the parameter sets, ordered by priority.
Example: DEV1,PROD,TEST

datastage.design.time.analysis.jobs.included
Enter a comma-separated list of jobs to analyze at design time. Each element of the list is evaluated as a regular expression.
Example: Job1,Job2,Job3

datastage.design.time.analysis.jobs.excluded
Enter a comma-separated list of jobs to exclude from design-time analysis. Each element of the list is evaluated as a regular expression. The two lists combine as follows (see the example after this entry):

  • If left blank, all jobs matching the datastage.design.time.analysis.jobs.included list will be analyzed.

  • If the datastage.design.time.analysis.jobs.included list is blank, all jobs not matching this list will be analyzed.

  • If both are specified, only jobs matching the datastage.design.time.analysis.jobs.included list and not matching this list will be analyzed at design time.

  • If both are left blank, all jobs will be analyzed.

Example: Job1,Job2,Job3
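
For instance, with the following hypothetical values, every job whose name starts with "Load" is analyzed at design time, except jobs whose names end in "_TEST":

datastage.design.time.analysis.jobs.included = Load.*
datastage.design.time.analysis.jobs.excluded = .*_TEST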

datastage.analyze.job.runs (as of Manta Flow R41)
Indicates whether job runs should be analyzed. Only relevant for DataStage for Cloud Pak for Data.
Example: true

datastage.analyze.job.runs.since (as of Manta Flow R41)
Only runs that occurred after this date will be analyzed. If empty, all runs will be analyzed. Only relevant for DataStage for Cloud Pak for Data.
Example: 1970/12/31 23:59:59.999

datastage.analyze.jobs.separately (as of Manta Flow R41)
Indicates whether jobs should be analyzed separately even if there are runs associated with them. Only relevant for DataStage for Cloud Pak for Data.
Example: true

datastage.analyze.flows.without.jobs (as of Manta Flow R41)
Indicates whether flows without jobs should be analyzed. Only relevant for DataStage for Cloud Pak for Data.
Example: true

datastage.extraction.host.name (as of R42.7)
Enter the host name of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.
Example: cpd-wkc.apps.rainbowdash.cp.fyre.ibm.com

datastage.extraction.host.port (as of R42.7)
Enter the port of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.
Example: 1234

datastage.extraction.username (as of R42.7)
Enter a username for the connection to the DataStage instance. Use this option with an on-premises Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.

datastage.extraction.password (as of R42.7)
Enter a password for the connection to the DataStage instance. Use this option in combination with the username with an on-premises Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.

datastage.extraction.tls.certificate (as of R42.7)
Enter the TLS/SSL certificate of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.

datastage.extraction.api.key (as of R42.7)
Enter an API key for the connection to the DataStage instance. Use this option with a SaaS Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.

datastage.extraction.filter.include (as of R42.7)
Enter a comma-separated list of projects and flows to extract, each in the form projectName/flowName. Each part is evaluated as a regular expression. The flow part may be omitted along with the slash. Names are case-insensitive. Leave blank to extract all projects and flows. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.
Example: TestProject/TestFlow,TestProject2/TestFlow2,TestProject3

datastage.extraction.filter.exclude (as of R42.7)
Enter a comma-separated list of projects and flows to exclude from extraction, using the same syntax as datastage.extraction.filter.include. If left blank, all projects and flows matching the include list will be extracted. If the include list is blank, all projects and flows not matching this list will be extracted. If both are specified, only projects and flows matching the include list and not matching this list will be extracted. If both are left blank, all projects and flows will be extracted. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected.
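
For instance, with the following hypothetical values, every flow in projects whose names start with "Sales" is extracted, except flows whose names start with "Scratch" (the include entry omits the flow part, so it matches all flows in the matching projects):

datastage.extraction.filter.include = Sales.*
datastage.extraction.filter.exclude = Sales.*/Scratch.*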

Common Scanner Properties

This configuration is common to all DataStage source systems and all DataStage scenarios, and is configured in Admin UI / Configuration / CLI / DataStage / DataStage Common. It can be overridden at the individual connection level.

datastage.input.dir
Enter the directory with the exported DataStage XML files (jobs).
Example: ${manta.dir.input}/datastage/${datastage.extractor.server}
That is, if the script input folder is set to "datastage" (the default), place all needed XML files in the directory C:\mantaflow\cli\input\datastage\datastage.

filepath.lowercase
Whether paths to files should be lowercased (false for case-sensitive file systems, true otherwise).
Examples: true, false

datastage.dsparams.file
Path to the DSParams file for correcting project-level environment parameters.
Example: ${manta.dir.input}/datastage/${datastage.extractor.server}/DSParams

datastage.parameter.override.file
Path to the optional TXT file with definitions for overriding parameters.
Example: ${manta.dir.input}/datastage/${datastage.extractor.server}/datastageParameterOverride.txt

datastage.parameter.override.helper.file
Path to the helper TXT file that will be generated when unresolved parameters are found.
Example: ${manta.dir.temp}/datastage/${datastage.extractor.server}/datastageParameterOverride.txt

datastage.manual.mapping.file
Path to the optional CSV file with manual mappings for unsupported stages.
Example: ${manta.dir.input}/datastage/${datastage.extractor.server}/datastageManualMapping.csv

datastage.omd.files.directory
Path to the optional directory with operational metadata (OMD) files.
Example: ${manta.dir.input}/datastage/${datastage.extractor.server}/omd_files

datastage.sql.files.directory
Path to the directory with SQL files used in database connectors.
Example: ${manta.dir.input}/datastage/${datastage.extractor.server}/sql_files

datastage.odbc.connection.definition.file
Path to the optional ODBC connection definition file.
Example: ${manta.dir.input}/datastage/${datastage.extractor.server}/connection_definition/odbcConnectionDefinition.ini

datastage.odbc.connection.definition.encoding
Encoding of the ODBC connection definition file. See Encodings for applicable values.
Example: UTF-8

datastage.unsupported.stages.connect.all
Whether all input (output) columns without a corresponding output (input) column should be connected to all output (input) columns in unsupported stages.
Examples: true, false

datastage.enable.oracle.proxy.users
Whether Oracle proxy user authentication should be supported (true to change Oracle usernames in the "USERNAME[SCHEMA_OWNER]" format to "SCHEMA_OWNER"; false otherwise). See https://docs.oracle.com/en/database/oracle/oracle-database/21/jjdbc/proxy-authentication.html#GUID-07E0AF7F-2C9A-42E9-8B99-F2716DC3B746 for more details.
Examples: true, false

datastage.analysis.amazon.s3.addressing.model
The Amazon S3 addressing model used during Amazon S3 stage analysis. Possible values:

  • PATH — the deprecated Amazon S3 addressing model, still in use for backward compatibility. Format: https://s3.Region.amazonaws.com/bucket-name/key-name

  • VIRTUAL_HOSTED — the modern Amazon S3 addressing model. Format: https://bucket-name.s3.Region.amazonaws.com/key-name

Example: VIRTUAL_HOSTED

Manual Mapping File

The file is optional and is expected in the path defined by the common property datastage.manual.mapping.file. The format of the CSV is:

"Full path to Stage";"Input Link name";"Input Column name";"Output Link name";"Output Column name";"Edge Type (DIRECT | FILTER)";"Description (optional)"
"manual_mapping_job/Generic_3";"DSLink2";"a";"DSLink5";"b";"DIRECT";""

The path to the stage has the format Job/[shared and local containers, if any]/Stage.
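
For example, a mapping for a stage nested inside a container (all names here are hypothetical) would look like this:

"manual_mapping_job/SharedContainer1/Generic_5";"DSLink7";"colA";"DSLink9";"colB";"DIRECT";"stage inside a shared container"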

Parameter Override File

The file is optional and is expected in the path defined by the common property datastage.parameter.override.file. This file overrides previously loaded parameter values, so you can set them manually or add new ones.

The datastageParameterOverride.txt File Format in Standalone DataStage

[ENVIRONMENT]
PARAM1_NAME = "param1_value"
PARAM2_NAME = "param2_value"
PARAM3_NAME = "param3_value"

[PARAMETER_SET/parameter_set_name]
param4_name  = "default_param4_value"
param5_name  = "default_param5_value"
$PARAM3_NAME = "$PROJDEF"

[VALUE_FILE/parameter_set_name/value_file1_name]
param4_name  = "some_param4_value"
param5_name  = "some_param5_value"
$PARAM3_NAME = "some_param3_value"

[VALUE_FILE/parameter_set_name/value_file2_name]
param4_name  = "other_param4_value"
param5_name  = "other_param5_value"
$PARAM3_NAME = "other_param3_value"

[JOB/job1_name]
param6_name = "param6_value"
param7_name = "param7_value"

[JOB/job2_name]
param8_name = "param8_value"

Four scopes of parameters can be added to the file.

  1. Project/environment-level parameters — You can add any number of them under the [ENVIRONMENT] heading. Note that the values from the DSParams file can be overridden.

  2. Parameter set default parameters — You can add any number of them under the [PARAMETER_SET/parameter_set_name] heading by specifying the name of the parameter set. You can also refer to default environment parameters, as in the example above, by using the value $PROJDEF.

  3. Parameter set value file parameters — You can add any number of them under the [VALUE_FILE/parameter_set_name/value_file_name] heading by specifying the name of the parameter set and value file. Referring to the value $PROJDEF is also allowed here. Note: Don't forget to enter the names of the value files in the datastage.value.files property.

  4. Job parameters — You can add any number of them under the [JOB/job_name] heading by specifying the name of the job. Referring to the value $PROJDEF is allowed.

Use the correct file format: spaces and tabs are allowed, but each scope definition and each parameter entry must be on a separate line.

The datastageParameterOverride.txt File Format in DataStage for Cloud Pak for Data

parameter_1 = value
parameter_2 = value
$environment_variable_1 = value
$environment_variable_2 = value
parameter_set_name.parameter_1 = value
parameter_set_name.parameter_2 = value

The order of the parameters in the file is not important. Spaces around = are optional.
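
For instance, a hypothetical override file for DataStage for Cloud Pak for Data might look like this (the parameter, environment variable, and parameter set names are illustrative only):

SOURCE_SCHEMA = SALES
$APT_CONFIG_FILE = /opt/ibm/config/default.apt
db_params.DB_NAME = WAREHOUSE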

Parameter Override Helper File

If any unresolved parameters are found during the analysis, a "helper" file, datastageParameterOverride.txt, is generated in the location defined by the datastage.parameter.override.helper.file property. The hints that the helper file gives can also be found in the log file and in the Admin UI Log Viewer logs relevant to the current DataStage dataflow analysis. Review the generated file, supply values for the unresolved parameters, and place the file in the location defined by datastage.parameter.override.file so that it is applied during the next analysis.

ODBC Connection Definition Settings

The connection settings for IBM DataStage are exported automatically by Manta Data Lineage during lineage analysis. However, to resolve the ODBC connector, Manta Data Lineage needs more information about the relevant source/destination systems. They can be configured manually as follows.