Filesystem Resource Configuration
A filesystem data file is represented in a dataflow graph as a File node in an appropriate Directory node hierarchy and, finally, in a Server node.
For example, C:/data/file.txt will be represented as localhost [Server] → C: [Directory] → data [Directory] → file.txt [File].
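The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the node hierarchy, not the scanner's actual implementation; the function names `filesystem_nodes` and `render` are hypothetical, and the default hostname `localhost` is taken from the example.

```python
from typing import List, Tuple


def filesystem_nodes(path: str, hostname: str = "localhost") -> List[Tuple[str, str]]:
    """Split a file path into (name, node type) pairs, outermost node first."""
    # Accept both slash styles and ignore empty segments (leading or doubled slashes).
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    if not parts:
        raise ValueError("path must name a file")
    nodes = [(hostname, "Server")]
    nodes += [(name, "Directory") for name in parts[:-1]]
    nodes.append((parts[-1], "File"))
    return nodes


def render(nodes: List[Tuple[str, str]]) -> str:
    """Format the hierarchy the way it is written in this document."""
    return " → ".join(f"{name} [{kind}]" for name, kind in nodes)


print(render(filesystem_nodes("C:/data/file.txt")))
```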
A cloud storage (Amazon S3) data file is represented in the dataflow graph as a File node placed in either:
- A Directory hierarchy in a Bucket, or
- Directly in a Bucket
Examples:
- A file located at s3://my-bucket/myFile.txt will be represented as my-bucket [Bucket] → myFile.txt [File].
- A file located at http://my-bucket.s3.amazonaws.com/myDir/myFile.txt will be represented as my-bucket [Bucket] → myDir [Directory] → myFile.txt [File].
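As a rough sketch of the two examples above, the following Python snippet derives the Bucket, Directory, and File nodes from an S3 location. It handles only the two URL forms shown in the examples (the `s3://` scheme and the `<bucket>.s3.amazonaws.com` hostname); other endpoint styles are out of scope here, and the function name `s3_nodes` is hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlparse

S3_HOST_SUFFIX = ".s3.amazonaws.com"


def s3_nodes(url: str) -> List[Tuple[str, str]]:
    """Split an S3 location into (name, node type) pairs, outermost node first."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        bucket = parsed.netloc
    elif parsed.scheme in ("http", "https") and parsed.netloc.endswith(S3_HOST_SUFFIX):
        # Virtual-hosted style: the bucket name is the first part of the hostname.
        bucket = parsed.netloc[: -len(S3_HOST_SUFFIX)]
    else:
        raise ValueError(f"unsupported S3 location: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if not bucket or not parts:
        raise ValueError(f"location must name a bucket and a file: {url}")
    nodes = [(bucket, "Bucket")]
    nodes += [(name, "Directory") for name in parts[:-1]]
    nodes.append((parts[-1], "File"))
    return nodes


print(s3_nodes("s3://my-bucket/myFile.txt"))
print(s3_nodes("http://my-bucket.s3.amazonaws.com/myDir/myFile.txt"))
```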
File Path Mapping Configuration
This configuration maps filesystem paths, either to provide a more user-friendly representation or to ensure that the same file is referenced correctly from different technologies, even when each technology refers to the file by a different path (for example, when the hostnames differ because of the network configuration). The configuration is common to all source systems.
The mapping is configured in a standard CSV file with the following columns.
Column | Description | Examples / allowed values |
---|---|---|
Source Technology | The source technology (source system type) this entry applies to. (See the complete list of technologies below.) If empty, the mapping entry may be applied to any technology. | ORACLE |
Source Connection ID | The connection ID of the connection this entry applies to (case sensitive). If empty, the mapping entry may be applied to any connection. | oracle-uat |
Source Hostname | Hostname this entry applies to (case insensitive). If empty, the mapping entry may be applied to paths with any hostname. | localhost |
Source Path Prefix | Source path prefix (regular expression) to be matched. If this is non-empty, the entry may only be applied to file paths that start with the given prefix (disregarding the hostname). The matched path prefix may then be replaced by a different prefix during the mapping. | C:/data |
Target Resource | Graph resource to use for the resulting graph nodes. If empty, the original resource provided by the source system scanner will be used. | FILESYSTEM |
Target Hostname | Hostname to use for the resulting graph representation of the file. If empty, the original hostname provided by the source system scanner will be used. | fileserver.int.example.com |
Target Path Prefix | Path prefix to use for the resulting graph representation of the file. If this is non-empty, any path prefix matched by the source path prefix expression will be removed and the given target path prefix will be prepended to the remaining file path. | data/dwh/stage |
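To make the column layout concrete, the snippet below writes one mapping record using the example values from the table. It is a sketch under assumptions that the text does not confirm: a comma delimiter, no header row, and columns in the order listed above. The helper name `to_csv` is hypothetical.

```python
import csv
import io

# Column order as listed in the table above (assumed to be the order in the file).
COLUMNS = [
    "Source Technology",
    "Source Connection ID",
    "Source Hostname",
    "Source Path Prefix",
    "Target Resource",
    "Target Hostname",
    "Target Path Prefix",
]

row = {
    "Source Technology": "ORACLE",
    "Source Connection ID": "oracle-uat",
    "Source Hostname": "localhost",
    "Source Path Prefix": "C:/data",
    "Target Resource": "FILESYSTEM",
    "Target Hostname": "fileserver.int.example.com",
    "Target Path Prefix": "data/dwh/stage",
}


def to_csv(rows) -> str:
    """Serialize mapping records; columns missing from a record are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv([row]), end="")
```

Leaving a column empty corresponds to an unset condition or an unchanged target attribute.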
The configuration can be changed in Configuration > CLI > Common > File Path Mapping in Manta Admin UI, as seen in the screenshot below.
Mapping Rules
- The first matching record is used.
- Any condition that is left empty in the configuration is always met.
- The record matches if all four source conditions are met:
  - Source Technology
    - Identification of the technology that referenced the file (e.g., ORACLE, DATASTAGE; see below)
  - Source Connection ID
    - Identification of the connection where the file is used (the same as that defined when creating the connection)
    - Case sensitive
  - Source Hostname
    - Hostname of the file as used in the analyzed technology
    - Case insensitive
  - Source Path Prefix
    - Beginning of the file path to be matched and replaced (when mapping file paths)
    - Regular expression
    - Cannot be used only to remove a prefix, but can be used only to add a prefix
    - Case sensitivity is controlled by the filepath.lowercase property
- Non-empty target values from the matched record are used to override the original attributes:
  - Target Resource
  - Target Hostname
  - Target Path Prefix
    - Replaces the section of the path matched by the source path prefix
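The rules above can be read as a first-match lookup. The following Python sketch illustrates that reading; it is not Manta's implementation. The `FileRef` and `MappingEntry` structures, the connection ID `ds-prod`, and the `ignore_case` flag (standing in for whatever the filepath.lowercase property does) are assumptions made for illustration, and Python's `re` module is used even though the regular expression dialect is not stated in the text.

```python
import re
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class FileRef:
    """A file reference as reported by a scanner (hypothetical structure)."""
    technology: str
    connection_id: str
    resource: str
    hostname: str
    path: str


@dataclass(frozen=True)
class MappingEntry:
    """One mapping record; an empty string means the value is not set."""
    source_technology: str = ""
    source_connection_id: str = ""
    source_hostname: str = ""
    source_path_prefix: str = ""  # regular expression
    target_resource: str = ""
    target_hostname: str = ""
    target_path_prefix: str = ""


def matched_prefix_length(entry: MappingEntry, ref: FileRef, ignore_case: bool) -> Optional[int]:
    """Return the length of the matched path prefix, or None if the record does not match."""
    if entry.source_technology and entry.source_technology != ref.technology:
        return None
    if entry.source_connection_id and entry.source_connection_id != ref.connection_id:
        return None  # connection ID is case sensitive
    if entry.source_hostname and entry.source_hostname.lower() != ref.hostname.lower():
        return None  # hostname is case insensitive
    if not entry.source_path_prefix:
        return 0  # an empty condition is always met; nothing is cut from the path
    flags = re.IGNORECASE if ignore_case else 0
    match = re.match(entry.source_path_prefix, ref.path, flags)  # anchored at path start
    return match.end() if match else None


def apply_mapping(ref: FileRef, entries: Iterable[MappingEntry], ignore_case: bool = False) -> FileRef:
    """Apply the first matching record; return the reference unchanged if none matches."""
    for entry in entries:
        cut = matched_prefix_length(entry, ref, ignore_case)
        if cut is None:
            continue
        path = ref.path
        if entry.target_path_prefix:
            # Replace the matched prefix only when a target prefix is given.
            path = entry.target_path_prefix + ref.path[cut:]
        return replace(
            ref,
            resource=entry.target_resource or ref.resource,
            hostname=entry.target_hostname or ref.hostname,
            path=path,
        )
    return ref


entries = [
    MappingEntry(source_hostname="localhost", source_path_prefix="/data",
                 target_hostname="BillingServer", target_path_prefix="/Audit"),
]
ref = FileRef("DATASTAGE", "ds-prod", "Filesystem", "localhost", "/data/files/")
print(apply_mapping(ref, entries))
```

In this sketch, a record with an empty Target Path Prefix leaves the path untouched, which follows the statement that the source path prefix cannot be used only to remove a prefix.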
Source Technology Values
Value to use | Technology |
---|---|
ANNOTATED_SCRIPT | Open Manta Annotated Script Scanner |
AWS_S3 | Amazon S3 (for the use case where Amazon S3 resources need to be remapped as filesystem resources for backward compatibility) |
BIGQUERY | Google BigQuery |
BYTECODE | Bytecode |
COBOL | Cobol |
COGNOS | IBM Cognos Analytics, Cognos Business Intelligence |
CONCEPTUAL_OVERLAYS | Open Manta Conceptual Overlays |
DATABRICKS | Databricks |
DATASTAGE | IBM InfoSphere DataStage |
DB2 | IBM Db2 |
DIRECT_LINKS | Open Manta Direct Links |
ERSTUDIO | Idera ER/Studio Data Modeling Tools |
ERWIN | Erwin Data Modeler |
EXCEL | Microsoft/Azure Excel |
EXTENSIONS | Open Manta Extensions |
FIVETRAN | Fivetran |
HIVE | Apache Hive |
IFPC | Informatica PowerCenter |
INTERPOLATION | Open Manta Conceptual Overlays Lineage Interpolation |
JAVA | Java |
MICROSTRATEGY | MicroStrategy |
MSSQL | Microsoft SQL Server, Parallel Data Warehouse (PDW), Analytics Platform System (APS), Azure SQL Database, Azure SQL, Amazon RDS for SQL Server |
NETEZZA | IBM Netezza (IBM PureData System for Analytics) |
OBIEE | Oracle Business Intelligence Enterprise Edition |
ODI | Oracle Data Integrator |
OPENLINEAGE | OpenLineage |
ORACLE | Oracle Database, Exadata |
POSTGRESQL | PostgreSQL, Greenplum, Amazon Redshift, Yellowbrick, Amazon RDS, or Amazon Aurora for PostgreSQL |
POWERBI | Microsoft/Azure PowerBI |
POWERDESIGNER | SAP PowerDesigner |
QLIKSENSE | Qlik Sense |
SAPBO | SAP Business Objects |
SAS | Statistical Analysis Software |
SNOWFLAKE | Snowflake |
SSAS | Microsoft/Azure SQL Server Analytical Services |
SSIS | Microsoft/Azure SQL Server Integration Services |
SSRS | Microsoft/Azure SQL Server Reporting Services |
STREAMSETS | StreamSets |
TABLEAU | Tableau |
TALEND | Talend ETL |
TERADATA | Teradata Database, BTEQ, Teradata Parallel Transporter (TPT) |
MATILLION | Matillion ETL |
Examples
The first three columns describe the scenario; the remaining four are the filesystem resource mapping values.
Description | Original path | Desired path | Source hostname | Source path prefix | Target hostname | Target path prefix |
---|---|---|---|---|---|---|
Move and rename a directory (data) to a directory of the known Filesystem (BillingServer), keeping child directories the same | /Filesystem/localhost/data/files/ | /Filesystem/BillingServer/Audit/files/ | localhost | /data | BillingServer | /Audit |
Move and rename a file to a directory of the known Filesystem (BillingServer) | /Filesystem/189.89.89.89/az_38x.txt | /Filesystem/BillingServer/Address/Arizona.txt | 189.89.89.89 | az_38x.txt (NO SLASH) | BillingServer | /Address/Arizona.txt |
Hide a drive in the path name | /Filesystem/external_1/c:/users/sales/ | /Filesystem/external_1/users/sales | | c: (NO SLASH) | | |
Rename a file whose name contains regex special characters | /Filesystem/tmp/parameters/#paramset.$src_dir_parm.contract_address.dat | /Filesystem/tmp/parameters/contract_address.dat | | #paramset.\$src_dir_parm.contract_address.dat ('\' to escape '$') | | contract_address.dat |
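Because the source path prefix is a regular expression, a file name that contains characters such as `$` has to be escaped, as in the last example. The short Python check below shows why. It uses Python's `re` module purely for illustration (the regular expression dialect used by the product is not stated here), and the helper name `matches_prefix` is hypothetical.

```python
import re

literal_name = "#paramset.$src_dir_parm.contract_address.dat"

# Used as-is, "$" is an end-of-string anchor, so the pattern cannot match the name.
unescaped = literal_name
# Escaping "$" by hand, as in the example row, makes it a literal character.
manual = r"#paramset.\$src_dir_parm.contract_address.dat"
# re.escape escapes every special character, including the dots.
escaped = re.escape(literal_name)


def matches_prefix(pattern: str, path: str) -> bool:
    """True if the pattern matches at the start of the path."""
    return re.match(pattern, path) is not None


for label, pattern in [("unescaped", unescaped), ("manual", manual), ("escaped", escaped)]:
    print(label, matches_prefix(pattern, literal_name))
```

Only the unescaped pattern fails to match. The manually escaped form still treats each `.` as "any character", which is harmless here but can match more than intended in other file names.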