Snowflake Resource Configuration

Before you configure your scanner, make sure you meet the prerequisites. Read our guide on Snowflake integration requirements to double-check.

Source System Properties

This configuration can be setup by creating a new connection on Admin UI / Connections tab or editing an existing connection in Admin UI / Connections / Databases / Snowflake / specific connection. New connection can also be created via Manta Orchestration API.

Granularity of the IBM Automatic Data Lineage connection for Snowflake is one Snowflake server. Use filter on databases and schemas to limit the scope of analysis as needed. Use of multiple connections against a single Snowflake server may lead to within-system lineage not be connected properly.

Property name

Description

Example

snowflake.dictionary.id

Name of a resource representing this Snowflake server known as the dictionary ID, used as an output subdirectory name for extracted DDL files and the database dictionary

snowflake

snowflake.url

JDBC connection string for the database server with the warehouse and role specified; the role has to have sufficient privileges as per Snowflake Integration Requirements

jdbc:snowflake://manta.us-east-1.snowflakecomputing.com/?warehouse=DEMO_WH&role=SELECT_CATALOG_ROLE

snowflake.extraction.type

Extraction type used for extracting metadata from the Snowflake server

INFORMATION_SCHEMA or ACCOUNT_USAGE

snowflake.extraction.method

Set to Agent:default when the desired extraction method is the default Manta Extractor Agent, set to Agent:{remote_agent_name} when a remote Agent is the desired extraction method, set to Git:{git.dictionary.id} when the Git ingest method is the desired extraction method. For more information on setting up a remote extractor Agent please refer to the Manta Flow Agent Configuration for Extraction documentation. For additional details on configuring a Git ingest method, please refer to the Manta Flow Agent Configuration for Extraction:Git Source documentation.

default

Git

agent

snowflake.username

User name for connection to the database server

manta_user

snowflake.password

User password for connection to the database server

Leave empty if you use key pair authentication for the Snowflake connection

manta_password

snowflake.privatekeyfile

File path containing the private key for key pair authentication

Complete this field to use key pair authentication for the Snowflake connection

C:\Users\TestUser\Desktop\authentication\keypairfile

snowflake.privatekeyfilepwd

Password to the file containing the private key for key pair authentication

Only complete if you use key pair authentication for the Snowflake connection

private_key_file_password

snowflake.tablestage

Regular expression of all table stages from which staged files should be extracted

The format of the regular expression is the fully qualified name of the table stage where each segment is encapsulated in double quotes

Leave empty if no staged files should be extracted from any table stages

\"mydb\"\\.\"schema1\"\\.\".*\"|\"mydb\"\\.\"myschema\"\\.\"abc.*\"

snowflake.extractedDbsSchemas

List of databases and schemas to extract, separated by commas, which are to be provided in the format database/schema; each part is evaluated as a regular expression

database1/sch1,database2/sch2,database3

snowflake.excludedDbsSchemas

List of databases and schemas to exclude from extraction, separated by commas

database3/sch3,database3/sch4

snowflake.ddl.encoding

Encoding of automatically extracted DDL scripts. See Encodings for applicable values.

utf8

snowflake.script.encoding

Encoding of manually provided SQL scripts performed on this database server. See Encodings for applicable values.

utf8

Common Scanner Properties

This configuration is common for all Snowflake source systems and for all Snowflake scenarios, and is configure in Admin UI / Configuration / CLI / Snowflake / Snowflake Common. It can be overridden on individual connection level.

Property name

Description

Example

snowflake.dictionary.dir

Directory with data dictionaries extracted from Snowflake

${manta.dir.temp}/snowflake

filepath.lowercase

Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise)

false
true

snowflake.dll.output

Directory for automatically extracted Snowflake DDL scripts (for the extraction phase)

${manta.dir.temp}/snowflake/${snowflake.dictionary.id}/ddl

snowflake.ddl.input

Directory with automatically extracted Snowflake DDL scripts (for the analysis phase)

${snowflake.dll.output}

snowflake.script.input

Directory with manually provided SQL scripts which are performed on a given database server (for the analysis phase)

${manta.dir.input}/snowflake/${snowflake.dictionary.id}/

snowflake.script.replace

Path to the CSV file with the replacements to be applied to the provided SQL scripts; see Placeholder Replacement in Input Scripts for details about the replacement file format

${manta.dir.input}/snowflake/${snowflake.dictionary.id}/replace.csv

snowflake.script.replace.regex

Flag specifying whether replacements for SQL scripts in the provided CSV file specified in snowflake.script.replace should be interpreted as regular expressions (true) or simple text (false)

false
true

snowflake.analyze.parallelCount

Number of parallel threads which will analyze DDL and SQL scripts

4

snowflake.dictionary.mappingFile

Path to automatically generated mappings for Snowflake servers

${manta.dir.temp}/snowflake/snowflakeDictionaryMantaMapping.csv

snowflake.dictionary.mappingManualFile

Path to mappings provided manually for Snowflake servers

${manta.dir.scenario}/conf/snowflakeDictionaryMantaMappingManual.csv

snowflake.searchPath

Schema lookup path used by the Snowflake server; default value: $current,$public

$current,$public, db1.sch1

snowflake.extractor.numThreads

Number of threads performing queries to Snowflake databases

4

snowflake.extractor.persistence.pipelineSize

Maximum number of deserialized objects stored in the persistence buffer waiting to be written into the dictionary; restricts memory footprint

1000

snowflake.analyze.retainUnusedResultSetColumns

Flag specifying whether the data lineage should include sub-query resultset columns that do not have any downstream lineage

By default, set to false

false
true

snowflake.extractor.liststages

Enables or disables extraction (listing of stage contents) for extracted stages;

By default, set to true, meaning stage contents (files) will be listed and will participate in lineage.

Setting to false means stages will not be listed, and individual files under stages will not participate in lineage.

As of R42.4:

The maximum number of files extracted per stage is set to 100. If a stage contains more files, the stage is extracted but the files that this stage contains are not extracted. This limit is applicable to external stages, and selected table stages.

false

true

snowflake.expressionDescriptions.enabled.ddl

Enables or disables building transformation logic descriptions from SQL code in DDL scripts

false
true

snowflake.expressionDescriptions.enabled.script

Enables or disables building transformation logic descriptions from SQL code in SQL scripts

false
true

snowflake.expressionDescriptions.enabled.queryService

Enables or disables building transformation logic descriptions from Snowflake SQL code defined in other technologies

false
true