Amazon RDS for PostgreSQL lineage configuration
To import lineage metadata from Amazon RDS for PostgreSQL, create a connection, data source definition and metadata import job.
Amazon RDS for PostgreSQL is a PostgreSQL relational database that runs on the Amazon Relational Database Service (RDS).
Processed metadata
The following Amazon RDS for PostgreSQL metadata is processed and displayed on lineage:
- Data dictionaries
- Scripts
- Views
- Functions (SQL and PL/pgSQL)
- Custom types
Limitations
The following data is not processed:
- Assets that use dynamically executed code through the EXECUTE IMMEDIATE statement.
- EDB assets for which the source code is obfuscated by using the EDB*Wrap utility. You can provide an unobfuscated source code manually as the PostgreSQL input file. For details, see Obfuscating source code.
Prerequisite configuration
Before you import lineage metadata, ensure that the following prerequisites are met:
- You must have the
CONNECTrights to connect to each extracted database. - You must have the
SELECTrights on the following tables and views inpg_catalog. These rights are included in the Public role.- pg_database
- pg_namespace
- pg_class
- pg_attribute
- pg_type
- pg_proc
- pg_language
- pg_description
Creating a metadata import asset
Data source connection
To connect to the data source from which you want to import lineage metadata, you need to select a data source definition and a connection. You can create them before you start creating the metadata import, or you can create them when you create and configure the metadata import asset.
Data source definition
Select PostgreSQL as the data source type.
Connection
For connection details, see Amazon RDS for PostgreSQL connection.
Connection mode
You can connect to Amazon RDS for PostgreSQL by using one of the following connection modes:
- Direct connection
- Remote connection with a Manta agent. When an agent is configured, select it from the list. For more information, see Configuring agents for lineage metadata import.
Include and exclude lists
You can include or exclude assets up to the schema level. Provide databases and schemas in the format database/schema. Each part is evaluated as a regular expression. Assets which are added later in the data source will also be included or excluded if they match the conditions specified in the lists. Example values:
myDB/: all schemas inmyDBdatabase.myDB2/.*: all schemas inmyDB2database.myDB3/mySchema1:mySchema1schema frommyDB3database.myDB4/mySchema[1-5]: any schema in mymyDB4database with a name that starts withmySchemaand ends with a digit between 1 and 5.
External inputs
If you use external SQL scripts for Amazon RDS for PostgreSQL, you can add them in a .zip file as an external input. You can organize the structure of a .zip file as subfolders that represent databases and schemas. After the scripts are scanned, they are added under respective databases and schemas in the selected catalog or project. The .zip file can have the following structure:
<database_name>
<schema_name>
<script_name.sql>
<database_name>
<script_name.sql>
<script_name.sql>
replace.csv
The replace.csv file contains placeholder replacements for the scripts that are added in the .zip file. For more information about the format, see Placeholder replacements.
Advanced import options
- Extract extended attributes
- You can extract extended attributes like primary key, unique and referential integrity constraints of columns. By default these attributes are not extracted.
- Extraction mode
- You can decide which extraction mode to run for the imported metadata. You have the following options:
- Prefetch: use it for relational databases.
- Parallel bulk: use it for analytical processing engines.
- Single-thread: use it to avoid parallelism and large queries during extraction. When you select this mode, performance might be low.
- Transformation logic extraction
- You can enable building transformation logic descriptions from SQL code in SQL scripts.