DataStage Resource Configuration
Before you configure your scanner, make sure you meet the prerequisites. Read our guide on DataStage integration requirements to double-check.
Source System Properties
This configuration can be set up by creating a new connection on the Admin UI > Connections tab, or by editing an existing connection under Admin UI / Connections / Data Integration Tools / DataStage / the specific connection. A new connection can also be created via the Manta Orchestration API.
One IBM Manta Data Lineage connection for DataStage corresponds to one DataStage server that will be analyzed.
| Property name | Description | Example |
| --- | --- | --- |
| `datastage.extractor.server` | Enter the name of the script input folder. | datastage |
| `datastage.edition` | DataStage edition; must be either "Standalone DataStage" or "DataStage for Cloud Pak for Data". | Standalone DataStage |
| `datastage.input.encoding` | Enter the encoding of the extracted DataStage projects. See Encodings for applicable values. | UTF-8 |
| `datastage.value.files` | Enter the names of the value files that should be used in the parameter sets, ordered by priority. | DEV1,PROD,TEST |
| `datastage.design.time.analysis.jobs.included` | Enter a comma-separated list of jobs to include in design-time analysis. Each element of the list is evaluated as a regular expression. | Job1,Job2,Job3 |
| `datastage.design.time.analysis.jobs.excluded` | Enter a comma-separated list of jobs to exclude from design-time analysis. Each element of the list is evaluated as a regular expression. | Job1,Job2,Job3 |
| `datastage.analyze.job.runs` (as of Manta Flow R41) | Indicates whether job runs should be analyzed. Only relevant for DataStage for Cloud Pak for Data. | true |
| `datastage.analyze.job.runs.since` (as of Manta Flow R41) | Only runs that happened after this date will be analyzed. If empty, all runs will be analyzed. Only relevant for DataStage for Cloud Pak for Data. | 1970/12/31 23:59:59.999 |
| `datastage.analyze.jobs.separately` (as of Manta Flow R41) | Indicates whether jobs should be analyzed separately even if there are runs associated with them. Only relevant for DataStage for Cloud Pak for Data. | true |
| `datastage.analyze.flows.without.jobs` (as of Manta Flow R41) | Indicates whether flows without jobs should be analyzed. Only relevant for DataStage for Cloud Pak for Data. | true |
| `datastage.extraction.host.name` (as of R42.7) | Enter the host name of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.host.port` (as of R42.7) | Enter the port of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | 1234 |
| `datastage.extraction.username` (as of R42.7) | Enter a username for the connection to the DataStage instance. Use this option with an on-premises Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.password` (as of R42.7) | Enter a password for the connection to the DataStage instance. Use this option in combination with the username with an on-premises Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.tls.certificate` (as of R42.7) | Enter the TLS/SSL certificate of the DataStage Cloud Pak for Data instance. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.api.key` (as of R42.7) | Enter an API key for the connection to the DataStage instance. Use this option with a SaaS Cloud Pak for Data solution. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
| `datastage.extraction.filter.include` (as of R42.7) | Enter a comma-separated list of projects and flows to extract, in the form projectName/flowName. Each part is evaluated as a regular expression. The flow part may be omitted along with the slash. Names are case-insensitive. Leave blank to extract all projects and flows. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | TestProject/TestFlow,TestProject2/TestFlow2,TestProject3 |
| `datastage.extraction.filter.exclude` (as of R42.7) | Enter a list of projects and flows to exclude from extraction; the syntax is the same as for the include list. If left blank, all projects and flows matching the include list will be extracted. If the include list is blank, all projects and flows not matching this list will be extracted. If both are specified, only projects and flows matching the include list and not matching this list will be extracted. If both are left blank, all projects and flows will be extracted. Available only when the DataStage for Cloud Pak for Data automatic extraction option is selected. | |
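The include/exclude filter semantics can be illustrated with a short Python sketch. This is only an illustration of the documented matching rules (regular expressions, case-insensitive names, optional flow part, exclude wins over include) under stated assumptions; the function names are hypothetical and this is not Manta's actual implementation:

```python
import re

def matches(filters, project, flow):
    """Return True if project/flow matches any pattern in the
    comma-separated filter list. A pattern without a slash is a
    project-only pattern and matches all flows in that project."""
    name = f"{project}/{flow}"
    for pattern in filters.split(","):
        if "/" not in pattern:
            pattern += "/.*"  # project-only pattern covers every flow
        if re.fullmatch(pattern, name, re.IGNORECASE):
            return True
    return False

def extracted(include, exclude, project, flow):
    """A blank include list means everything; exclusions always win."""
    ok = not include or matches(include, project, flow)
    return ok and not (exclude and matches(exclude, project, flow))
```

For example, with the include list `TestProject/TestFlow,TestProject3` and a blank exclude list, every flow in `TestProject3` is extracted regardless of letter case.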
Common Scanner Properties
This configuration is common to all DataStage source systems and all DataStage scenarios, and is configured in Admin UI / Configuration / CLI / DataStage / DataStage Common. It can be overridden at the individual connection level.
| Property name | Description | Example |
| --- | --- | --- |
| `datastage.input.dir` | Enter the directory with the exported DataStage XML files (jobs). That is, if the script input folder is set to "datastage" (the default), place all needed XML files in this directory. | ${manta.dir.input}/datastage/${datastage.extractor.server} |
| `filepath.lowercase` | Whether paths to files should be lowercased (false for case-sensitive file systems, true otherwise) | true |
| `datastage.dsparams.file` | Path to the DSParams file for correcting project-level environment parameters | ${manta.dir.input}/datastage/${datastage.extractor.server}/DSParams |
| `datastage.parameter.override.file` | Path to the optional TXT file with definitions for overriding parameters | ${manta.dir.input}/datastage/${datastage.extractor.server}/datastageParameterOverride.txt |
| `datastage.parameter.override.helper.file` | Path to the helper TXT file that will be generated when unresolved parameters are found | ${manta.dir.temp}/datastage/${datastage.extractor.server}/datastageParameterOverride.txt |
| `datastage.manual.mapping.file` | Path to the optional CSV file with manual mappings for unsupported stages | ${manta.dir.input}/datastage/${datastage.extractor.server}/datastageManualMapping.csv |
| `datastage.omd.files.directory` | Path to the optional directory with operational metadata (OMD) files | ${manta.dir.input}/datastage/${datastage.extractor.server}/omd_files |
| `datastage.sql.files.directory` | Path to the directory with SQL files used in database connectors | ${manta.dir.input}/datastage/${datastage.extractor.server}/sql_files |
| `datastage.odbc.connection.definition.file` | Path to the optional ODBC connection definition file | ${manta.dir.input}/datastage/${datastage.extractor.server}/connection_definition/odbcConnectionDefinition.ini |
| `datastage.odbc.connection.definition.encoding` | Encoding of the ODBC connection definition file. See Encodings for applicable values. | UTF-8 |
| `datastage.unsupported.stages.connect.all` | Whether all input (output) columns without a corresponding output (input) column should be connected to all output (input) columns in unsupported stages | true |
| `datastage.enable.oracle.proxy.users` | Whether Oracle proxy user authentication should be supported (true to change Oracle usernames in the "USERNAME[SCHEMA_OWNER]" format to "SCHEMA_OWNER"; false otherwise). See https://docs.oracle.com/en/database/oracle/oracle-database/21/jjdbc/proxy-authentication.html#GUID-07E0AF7F-2C9A-42E9-8B99-F2716DC3B746 for more details. | true |
| `datastage.analysis.amazon.s3.addressing.model` | Amazon S3 addressing model used during Amazon S3 stage analysis. The possible values correspond to path-style addressing (https://s3.Region.amazonaws.com/bucket-name/key-name) and virtual-hosted-style addressing (https://bucket-name.s3.Region.amazonaws.com/key-name). | VIRTUAL_HOSTED |
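The example paths above use `${...}` placeholders that resolve to other properties, such as `${manta.dir.input}` and `${datastage.extractor.server}`. As a rough illustration of that substitution syntax only (a sketch assuming simple recursive expansion, not Manta's actual property resolver):

```python
import re

def expand(value, props):
    """Recursively expand ${name} placeholders from a property map.

    A missing property raises KeyError, which mirrors the fact that
    the placeholder names must refer to defined properties.
    """
    def repl(match):
        return expand(props[match.group(1)], props)
    return re.sub(r"\$\{([^}]+)\}", repl, value)

# Hypothetical property values for illustration:
props = {
    "manta.dir.input": "/opt/manta/input",
    "datastage.extractor.server": "datastage",
}
path = expand("${manta.dir.input}/datastage/${datastage.extractor.server}", props)
```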
Manual Mapping File
The file is optional and is expected in the path defined by the common property `datastage.manual.mapping.file`. The format of the CSV is:

```
"Full path to Stage";"Input Link name";"Input Column name";"Output Link name";"Output Column name";"Edge Type (DIRECT | FILTER)";"Description (optional)"
"manual_mapping_job/Generic_3";"DSLink2";"a";"DSLink5";"b";"DIRECT";""
```

The path to the stage is in the format `Job/[Shared and Local containers optional]/Stage`.
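A quick way to sanity-check a manual mapping file before running an analysis is to parse it with Python's `csv` module, using `;` as the delimiter. This is a hedged sketch based only on the format shown above; the field names are hypothetical labels, not Manta identifiers:

```python
import csv
import io

# One row in the documented format (the sample mapping from above):
SAMPLE = '"manual_mapping_job/Generic_3";"DSLink2";"a";"DSLink5";"b";"DIRECT";""\n'

# Hypothetical labels for the seven documented columns:
FIELDS = ["stage_path", "input_link", "input_column",
          "output_link", "output_column", "edge_type", "description"]

def read_manual_mappings(text):
    """Parse the semicolon-delimited mapping CSV into dicts and
    reject edge types other than DIRECT or FILTER."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    rows = []
    for record in reader:
        row = dict(zip(FIELDS, record))
        if row["edge_type"] not in ("DIRECT", "FILTER"):
            raise ValueError(f"unexpected edge type: {row['edge_type']}")
        rows.append(row)
    return rows
```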
Parameter Override File
The file is optional and is expected in the path defined by the common property `datastage.parameter.override.file`. This file overrides previously loaded parameter values, so you can set them manually or add new ones.

The `datastageParameterOverride.txt` File Format in Standalone DataStage
```
[ENVIRONMENT]
PARAM1_NAME = "param1_value"
PARAM2_NAME = "param2_value"
PARAM3_NAME = "param3_value"

[PARAMETER_SET/parameter_set_name]
param4_name = "default_param4_value"
param5_name = "default_param5_value"
$PARAM3_NAME = "$PROJDEF"

[VALUE_FILE/parameter_set_name/value_file1_name]
param4_name = "some_param4_value"
param5_name = "some_param5_value"
$PARAM3_NAME = "some_param3_value"

[VALUE_FILE/parameter_set_name/value_file2_name]
param4_name = "other_param4_value"
param5_name = "other_param5_value"
$PARAM3_NAME = "other_param3_value"

[JOB/job1_name]
param6_name = "param6_value"
param7_name = "param7_value"

[JOB/job2_name]
param8_name = "param8_value"
```
Four scopes of parameters can be added to the file.

- Project/environment-level parameters — Add any number of them under the `[ENVIRONMENT]` heading. Note that the values from the DSParams file can be overridden.
- Parameter set default parameters — Add any number of them under the `[PARAMETER_SET/parameter_set_name]` heading, specifying the name of the parameter set. You can also refer to default environment parameters, as in the example above, by using the value `$PROJDEF`.
- Parameter set value file parameters — Add any number of them under the `[VALUE_FILE/parameter_set_name/value_file_name]` heading, specifying the names of the parameter set and value file. Referring to the value `$PROJDEF` is also allowed here. Note: Don't forget to enter the names of the value files in the `datastage.value.files` property.
- Job parameters — Add any number of them under the `[JOB/job_name]` heading, specifying the name of the job. Referring to the value `$PROJDEF` is allowed.

Use the correct file format: spaces and tabs are allowed, but each scope definition and each parameter entry must be on a separate line.
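To make the documented rules concrete (double-quoted values, one scope heading or parameter entry per line), here is a small parser sketch. It illustrates the syntax above under those stated assumptions; it is not Manta's loader, and the function name is hypothetical:

```python
import re
from collections import defaultdict

def parse_override_file(text):
    """Parse the standalone-format override file into
    {scope_heading: {param_name: value}}. Values must be
    enclosed in double quotes, one entry per line."""
    scopes = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # e.g. ENVIRONMENT or JOB/job1_name
            continue
        match = re.fullmatch(r'(\S+)\s*=\s*"(.*)"', line)
        if match is None or current is None:
            raise ValueError(f"malformed line: {raw!r}")
        scopes[current][match.group(1)] = match.group(2)
    return dict(scopes)
```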
The `datastageParameterOverride.txt` File Format in DataStage for Cloud Pak for Data

```
parameter_1 = value
parameter_2 = value
$environment_variable_1 = value
$environment_variable_2 = value
parameter_set_name.parameter_1 = value
parameter_set_name.parameter_2 = value
```

The order of the parameters in the file is not important. Spaces around `=` are optional.
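The flat Cloud Pak for Data format distinguishes three kinds of entries by name alone: plain parameters, `$`-prefixed environment variables, and `parameter_set_name.parameter` names. A hedged Python sketch of those documented naming rules (an illustration only, not Manta's parser):

```python
def parse_cpd_overrides(text):
    """Split Cloud Pak for Data-style override lines into plain
    parameters, $-prefixed environment variables, and parameters
    qualified by a parameter set name."""
    params, env_vars, sets = {}, {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()  # spaces around = are optional
        if name.startswith("$"):
            env_vars[name] = value
        elif "." in name:
            set_name, param = name.split(".", 1)
            sets.setdefault(set_name, {})[param] = value
        else:
            params[name] = value
    return params, env_vars, sets
```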
Parameter Override Helper File
If any unresolved parameters are found during the analysis, a "helper" file, `datastageParameterOverride.txt`, is generated in the location defined by the `datastage.parameter.override.helper.file` property. The hint that the helper file gives can also be found in the log file and in the Admin UI Log Viewer logs relevant to the current DataStage dataflow analysis. After that, perform the following steps.

1. Copy the helper file to the location defined by the `datastage.parameter.override.file` property. If such a file already exists in the destination folder, you can safely replace it. Alternatively, copy the contents of the helper file from the log and create the file in that location; if the file already exists, simply replace its contents.
2. Open the file and fill in the values of all unresolved parameters. Such parameters are empty and may look like this: `my_param = ""`. The parameter value must be enclosed in double quotes; for example, `my_param = "my_value"`.
3. Save and close the file.
4. Run the analysis again.
ODBC Connection Definition Settings
The connection settings for IBM DataStage are exported automatically by Manta Data Lineage during lineage analysis, but to resolve the ODBC connector, Manta Data Lineage needs more information about the relevant source/destination systems. These can be configured manually as follows.

1. Create or open the file referenced by the `datastage.odbc.connection.definition.file` property (e.g., `<MANTA_DIR_HOME>/input/datastage/${datastage.extractor.server}/connection_definition/odbcConnectionDefinition.ini`).
2. Follow the instructions in Manually Define a Database Connection.