Snowflake Scanner Guide
Follow these steps to configure a connection to Snowflake to analyze lineage through SQL code (https://docs.snowflake.com/en/sql-reference-commands, https://docs.snowflake.com/sql-reference-functions, and https://docs.snowflake.com/sql-reference-snowflake-scripting) within Snowflake. The IBM Automatic Data Lineage scanner does not support lineage analysis through Snowflake code written in Java, JavaScript, Python, or Scala.
Step 1: Preparation
To connect to a Snowflake cloud instance, you will need:
- your Snowflake DBA/Administrator to provide user access with privileges as per Snowflake Integration Requirements
- your Snowflake DBA/Administrator and/or network team to open firewall access between the Automatic Data Lineage machine and the Snowflake instance
- to review the other prerequisites as per Snowflake Integration Requirements
- to make a list of the databases/schemas that you want covered by lineage
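The firewall prerequisite above can be checked from the Automatic Data Lineage machine before configuring anything else. The following is a minimal sketch, assuming that Snowflake is reached over HTTPS on port 443; the function name can_reach is illustrative and not part of Automatic Data Lineage. A successful TCP connection only shows that the port is open and does not prove that a proxy or TLS inspection device will let the JDBC driver through.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Call it with your own account host, for example can_reach("<account_name>.<region>.snowflakecomputing.com"), substituting the values for your Snowflake instance.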
Step 2: Configure the Connection
Create a new connection in the Admin UI (http://localhost:8281/manta-admin-gui/app/index.html?#/platform/connections/) to enable automated extraction and lineage analysis of Snowflake by Automatic Data Lineage. You should create one connection for each Snowflake warehouse.
Properties that must be configured:
- Connection information for the Snowflake instance: snowflake.dictionary.id, snowflake.extraction.type, snowflake.url, snowflake.username
  The snowflake.url is in the format jdbc:snowflake://<account_name>.<region>.snowflakecomputing.com/?warehouse=<WAREHOUSE_NAME>&role=<DEFAULT_ROLE_NAME> (for more details, see https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-configure), where:
  - <account_name>.<region> can be found by looking at your Snowflake instance URL
  - <WAREHOUSE_NAME> can be found under Warehouses in your Snowflake console
  - <DEFAULT_ROLE_NAME> can be obtained from your Snowflake administrator
- Depending on the authentication method you want to use, configure either snowflake.password or snowflake.privatekeyfile. The property snowflake.privatekeyfilepwd must be set.
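To make the URL format and the required properties concrete, here is a minimal sketch that assembles the snowflake.url value from its parts and checks a set of properties for gaps. The function names build_snowflake_url and check_required are hypothetical helpers, not part of Automatic Data Lineage, and the check covers only the properties named above.

```python
from urllib.parse import urlencode

# Property names as listed in this guide.
REQUIRED = (
    "snowflake.dictionary.id",
    "snowflake.extraction.type",
    "snowflake.url",
    "snowflake.username",
)


def build_snowflake_url(account_name: str, region: str, warehouse: str, role: str) -> str:
    """Assemble a JDBC URL in the format described above."""
    host = f"{account_name}.{region}.snowflakecomputing.com"
    # urlencode escapes any characters that are not safe in a query string.
    query = urlencode({"warehouse": warehouse, "role": role})
    return f"jdbc:snowflake://{host}/?{query}"


def check_required(props: dict) -> list:
    """Return a list of problems found in a dict of connection properties."""
    problems = [f"missing {key}" for key in REQUIRED if not props.get(key)]
    # One of the two authentication properties has to be present.
    if not (props.get("snowflake.password") or props.get("snowflake.privatekeyfile")):
        problems.append("set either snowflake.password or snowflake.privatekeyfile")
    return problems
```

With made-up values, build_snowflake_url("myaccount", "eu-central-1", "ANALYTICS_WH", "LINEAGE_ROLE") returns jdbc:snowflake://myaccount.eu-central-1.snowflakecomputing.com/?warehouse=ANALYTICS_WH&role=LINEAGE_ROLE.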
Optional properties:
- To control the scope of the extraction, use snowflake.extractedDbsSchemas, snowflake.excludedDbsSchemas
- To enable extraction of table stages, configure the pattern matching the ones to be extracted in snowflake.tableStage. Note that if a stage contains 1000k+ files (usually representing micro-batches organized by timestamps), it may not be a good idea to include it because the extraction will take a long time and generate many filesystem entries that will not be very useful for lineage investigation.
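One way to apply the file-count advice above is to gather the number of files per table stage first and shortlist only the smaller stages. The sketch below assumes the counts were collected separately, for example by counting the rows returned by Snowflake's LIST @%<table_name> command. The helper name and the exact threshold are illustrative, and this guide does not specify the pattern syntax for snowflake.tableStage, so the result is only a list of candidate names.

```python
# Threshold taken from the "1000k+ files" note above; adjust it to your environment.
MAX_STAGE_FILES = 1_000_000


def stages_worth_extracting(file_counts: dict, limit: int = MAX_STAGE_FILES) -> list:
    """Return the table-stage names whose file count is below the limit, sorted by name."""
    return sorted(name for name, count in file_counts.items() if count < limit)
```

For example, given hypothetical counts of 120 files for ORDERS and 2,500,000 for EVENTS, only ORDERS is returned.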
Step 3: Provide External SQL Scripts for Lineage Analysis (Optional)
Automatic Data Lineage automatically extracts and analyzes lineage for the code (e.g., tables, views, functions, procedures, tasks) stored directly in Snowflake. If you are storing scripts outside Snowflake and only execute them against Snowflake, you can provide them to the Automatic Data Lineage scanner as per Snowflake Manual Inputs.