Manta Flow Scanner (Client) Configuration
Manta Flow Client Configuration
(Optional) Create the folders input, output, temp, and log; set up read rights for the input folder and read/write rights for the rest of the folders. You can skip this step if you want to use the default locations (listed below). The exact location of each of the folders is passed to the application by setting the bash/Windows command line environment variables. The names of these properties are shown in the following table.
Folder name | Property name | Default value |
---|---|---|
<MANTA_DIR_USER> | MANTA_DIR_USER | <MANTA_DIR_HOME> |
input | MANTA_DIR_INPUT | MANTA_DIR_USER/input |
output | MANTA_DIR_OUTPUT | MANTA_DIR_USER/output |
log | MANTA_DIR_LOG | MANTA_DIR_USER/log |
temp | MANTA_DIR_TMP | MANTA_DIR_USER/temp |
The default value of MANTA_DIR_USER, <MANTA_DIR_HOME>, means that the Manta user directory is located in the root of the IBM Manta Data Lineage application directory.
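The defaults in the table can be illustrated with a minimal Python sketch that computes where each folder would end up for a given environment. It only mirrors the table and is not how the application itself resolves the variables; the function name resolve_manta_dirs and the path /opt/manta are hypothetical and used for illustration only.

```python
import os
from pathlib import Path

# Variable names and default sub-folders, taken from the table above.
DEFAULT_SUBDIRS = {
    "MANTA_DIR_INPUT": "input",
    "MANTA_DIR_OUTPUT": "output",
    "MANTA_DIR_LOG": "log",
    "MANTA_DIR_TMP": "temp",
}


def resolve_manta_dirs(manta_home, env=None):
    """Return the folder each variable would point to, applying the defaults."""
    env = os.environ if env is None else env
    # MANTA_DIR_USER defaults to the application home directory.
    user_dir = Path(env.get("MANTA_DIR_USER", manta_home))
    dirs = {"MANTA_DIR_USER": user_dir}
    for var, subdir in DEFAULT_SUBDIRS.items():
        # An explicitly set variable wins; otherwise fall back to the default.
        dirs[var] = Path(env[var]) if var in env else user_dir / subdir
    return dirs


if __name__ == "__main__":
    # "/opt/manta" is a hypothetical installation directory.
    for name, path in resolve_manta_dirs("/opt/manta").items():
        print(f"{name}={path}")
```

Setting only MANTA_DIR_USER moves all four folders with it, while setting an individual variable such as MANTA_DIR_LOG overrides just that one folder.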
Installing Third-Party Drivers
Depending on what source systems you want to analyze, you may need to install additional libraries to connect to some of them. The following table lists third-party drivers that can be used to connect Manta Data Lineage products to various source systems.
Name | Version | Link |
---|---|---|
Teradata JDBC Driver | 17.10.00.20 | https://downloads.teradata.com/download/connectivity/jdbc-driver |
MySQL Connector/J | 5.1.38 | |
MariaDB Connector/J | 3.2.0-GA | https://mariadb.com/downloads/connectors/connectors-data-access/java8-connector/ |
AWS JDBC Driver for PostgreSQL | 0.1.0 | |
Sybase / SAP ASE JDBC Driver | 3, 4 | https://wiki.scn.sap.com/wiki/display/SYBCON/jConnect+Driver+Overview |
Hive JDBC | 2.1.0 | https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients |
SAP HANA JDBC Driver | 2.9.16 | |
Download the library (a *.jar file) that you want to use and add it to the <MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/lib-ext/ folder.
Installing Custom Extensions
In rare cases where a customer-provided extension library needs to be added, it should also be placed in the <MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/lib-ext/ folder. If the extension library needs to be initialized sooner (typically logging extensions), it can be placed in the <MANTA_DIR_HOME>/platform/lib-ext/ folder. (You might have to create the folder because it does not exist by default.) Do not place any extensions in the <MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/lib/ or <MANTA_DIR_HOME>/platform/lib/ folders; these might be cleaned during an application update and your extensions would be lost.
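As a minimal sketch of the placement rules above, the following Python helper copies a *.jar file into the appropriate lib-ext folder and creates the folder when it is missing. The helper name install_library and its early_init flag are hypothetical; only the two folder paths come from the text.

```python
import shutil
from pathlib import Path

# Folders named in the text above, relative to <MANTA_DIR_HOME>.
SCENARIO_LIB_EXT = Path("scenarios/manta-dataflow-cli/lib-ext")
PLATFORM_LIB_EXT = Path("platform/lib-ext")


def install_library(jar_path, manta_home, early_init=False):
    """Copy a driver or extension *.jar into the matching lib-ext folder."""
    jar = Path(jar_path)
    if jar.suffix.lower() != ".jar":
        raise ValueError(f"expected a *.jar file, got {jar.name}")
    # Libraries that must be initialized sooner go to platform/lib-ext.
    sub_dir = PLATFORM_LIB_EXT if early_init else SCENARIO_LIB_EXT
    target_dir = Path(manta_home) / sub_dir
    # platform/lib-ext does not exist by default, so create it when missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Never copy into the lib/ folders: they may be cleaned during an update.
    return Path(shutil.copy2(jar, target_dir))
```

The helper deliberately offers no way to target the lib/ folders, which keeps added libraries out of the locations that an application update may clean.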
Resource Configuration
It is necessary to configure the Manta Flow client before its first launch and whenever the environment is changed. Based on the architecture described in Manta Flow Client Architecture, there are three kinds of configurations.
- Specific to each source system (configurable via Admin UI / Connections / Technology / connection name): credentials to the source system, encoding of the scripts, etc.
- Common for all source systems (configurable via Admin UI / Configuration / CLI / Technology / Technology common for all scenarios for a particular source type): common properties for the connection to databases, a directory for CSV file output, a connection to the internal metadata repository, etc.
To create a configuration for a new source system (Oracle or Teradata database, PowerCenter repository, etc.), create a new connection in Admin UI / CLI / Add connection. The following sections of this knowledge base include further descriptions of configurations by source type.
Encodings
When specifying an encoding in Manta Data Lineage property files (the property name ends with .encoding), use a value from the column "Canonical Name for java.nio API" in the list of Java supported encodings (https://docs.oracle.com/en/java/javase/17/intl/supported-encodings.html#GUID-187BA718-195F-4C39-B0D5-F3FDF02C7205).
Placeholder Replacement in Input Scripts
All scanners that accept text-based input files—that is, all SQL-based scanners (input files provided according to Manta Flow Usage)—can utilize placeholder replacement rules to replace any placeholders and/or content that does not adhere to the dialect syntax with pieces of information that make the input valid script for parsing.
- The rules are defined in the file replace.csv (by default).
- Each rule is written on a separate line (unless a value enclosed in double quotes contains a line break) in the following form. (The first line is also a rule, not a header.)
  PLACEHOLDER,REPLACEMENT_VALUE[,SCOPE]
  - PLACEHOLDER: identifies what will be replaced in the input file.
  - REPLACEMENT_VALUE: the new value used in place of PLACEHOLDER.
  - SCOPE: optional column that filters which input files the rule applies to.
- Each value should be quoted by " and any special characters should be escaped by \.
  - This means that the character \ has to be written as \\ for plaintext values and as \\\\ for values using a regular expression.
  - A new line cannot be inserted using \n syntax but has to be written as a proper new line, and the whole segment has to be enclosed in double quotes. Example:
    #TEMP,"TEMP Line 1
    TEMP Line2"
- All rules with the matching SCOPE are applied to each input file sequentially, top to bottom, as defined in replace.csv, and each rule is applied to the result of the previous replacement. (If the SCOPE is not defined, the rule is used for every input.)
- SCOPE is always interpreted as a regular expression that should match the input file path with a root in the input directory (as printed in the logs as Context:).
  - For example: If the path is c:\mantaflow\cli\input\postgresql\MyConnection\MyBD\MySchema\MyScript.sql, then the SCOPE in replace.csv is \MyBD\MySchema\MyScript.sql.
- When enabled (see the property <technology>.script.replace.regex for each scanner’s common configuration or as an override property in individual connections), PLACEHOLDER/REPLACEMENT_VALUE can be interpreted as a regular expression. See https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/regex/Pattern.html for more details.
  - It is possible to use group replacements (see Groups and Capturing from the link above), where the defined group from the PLACEHOLDER is referenced in the REPLACEMENT_VALUE by $NUM, where NUM is its order number from 1. For example: The rule for replacing the #TMP_ prefix with FINAL_ would be "#TMP_([A-Za-z0-9_]*)","FINAL_$1".
- All files with a CSV extension in the specific scanner input directory are ignored (i.e., not parsed) as input scripts.
- Detailed information about the location of the replacement file and enabling regular expression evaluation is included in the Common section on the resource configuration page for the individual scanner. The respective properties are *.script.replace and *.script.replace.regex.
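The rule semantics described in this list can be approximated with a short Python sketch. It is an illustration only, not the scanner's implementation: the scanners evaluate Java regular expressions, whereas this sketch uses Python's re module and translates Java-style $NUM references. The function names, the use of a partial match for SCOPE, and the sample paths are assumptions.

```python
import csv
import io
import re


def load_rules(csv_text):
    """Parse lines of the form PLACEHOLDER,REPLACEMENT_VALUE[,SCOPE]."""
    reader = csv.reader(
        io.StringIO(csv_text, newline=""),
        quotechar='"',
        escapechar="\\",  # special characters are escaped with a backslash
        doublequote=False,
    )
    rules = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        scope = row[2] if len(row) > 2 and row[2] else None
        rules.append((row[0], row[1], scope))
    return rules


def to_python_replacement(replacement):
    """Translate a Java-style $NUM group reference into Python's \\g<NUM>."""
    escaped = replacement.replace("\\", "\\\\")
    return re.sub(r"\$(\d+)", r"\\g<\1>", escaped)


def apply_rules(script, context, rules, use_regex=False):
    """Apply every in-scope rule, top to bottom, to the previous result."""
    for placeholder, replacement, scope in rules:
        # Assumption: SCOPE only needs to match somewhere in the context path.
        if scope is not None and re.search(scope, context) is None:
            continue
        if use_regex:
            script = re.sub(placeholder, to_python_replacement(replacement), script)
        else:
            script = script.replace(placeholder, replacement)
    return script


if __name__ == "__main__":
    # Two hypothetical rules; the first line is a rule, not a header.
    rules = load_rules(
        '"#TMP_([A-Za-z0-9_]*)","FINAL_$1"\n'
        '"FINAL_orders","FINAL_sales","MySchema"\n'
    )
    sql = "SELECT * FROM #TMP_orders"
    print(apply_rules(sql, "MyDB/MySchema/MyScript.sql", rules, use_regex=True))
    print(apply_rules(sql, "MyDB/Other/MyScript.sql", rules, use_regex=True))
```

In the first call both rules are in scope, so the second rule operates on the output of the first; in the second call the scoped rule is skipped because the context path does not match MySchema.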
Transformation Logic
This module adds attribute-level transformation descriptions specific to a target (column, routine return value, etc.) on the dataflow nodes leading to it. This description contains only the logic needed to create a stored value for this specific target, without conditions.
Supported source systems:
- Oracle: configurable using the oracle.expressionDescriptions.enabled.* property
- Microsoft SQL Server: configurable using the mssql.expressionDescriptions.enabled.* property
- Netezza: configurable using the netezza.expressionDescriptions.enabled.* property
- PostgreSQL: configurable using the postgresql.expressionDescriptions.enabled.* property
- Snowflake: configurable using the snowflake.expressionDescriptions.enabled.* property
- Teradata: configurable using the teradata.expressionDescriptions.enabled.* property