How to Handle Snowflake Javascript Stored Procedures
Overview
At the beginning Snowflake supported only Javascript for writing stored procedures. IBM Automatic Data Lineage does not support lineage analysis of Javascript stored procedures; only SQL based stored procedures.
However, in cases when Javascript is used simply as a wrapper for calling static SQL statements without using Javascript for any further complex logic, it is possible to extract SQL/remove the javascript wrapping and pass it to Automatic Data Lineage for lineage analysis.
The solution is based on Automatic Data Lineage OOTB replace.csv capability (Manta Flow Scanner (Client) Configuration: Placeholder Replacement in Input Scripts) to preprocess the Javascript stored procedures to reomove the Javascript wrappers.
Note that this is a best effort only and highly dependent on consistency of coding practices for the javascript-stripping to work. It needs to be adjusted to your specific coding practices.
More info
What type of Javascript code is a good fit for this solution? Code similar to the following that executes static SQL (line 28); any dynamically created SQL (line 40) is not interesting from lineage perspective (in this case just some sort of logging):
CREATE OR REPLACE PROCEDURE DBO.SP_LOAD_DBO_MBR_SRC_DIM_TECH()
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS
$$
var sqlStmnt;
var catchsql;
var procedureName="SP_LOAD_DBO_MBR_SRC_DIM_TECH";
try
{
function executeQuery(p_sql_stmt){
const stmt = snowflake.createStatement({sqlText: p_sql_stmt});
const rs = stmt.execute();
var ret;
if (rs.next()) {
ret = rs.getColumnValue(1);
}
return ret;
}
sqlStmnt =
`TRUNCATE TABLE WORK.MBR_SRC_DIM_TECH`;
executeQuery(sqlStmnt);
sqlStmnt =
`INSERT INTO WORK.MBR_SRC_DIM_TECH
(MBR_SRC_DUR_KEY
,MBR_DUR_KEY
SELECT skl.MBR_SRC_DUR_KEY
,UPPER(TRIM(psa.MBR_DUR_KEY)
FROM PSA.CTL_DBO_TECH_MBR psa
JOIN DBO.DW_MBR_SOURCE_KEY_LOOKUP skl ON skl.DW_DATA_SOURCE = psa.DW_DATA_SOURCE)`;
executeQuery(sqlStmnt);
}
catch(err) {
catchsql="CALL SUPPORT.SP_UPDATE_SUPPORT_EXECUTION_LOG_FAILED("+"'"+"CDW"+"'"+","+"'"+"DBO"+"'"+","+"'"+procedureName+"'"+","+"'"+
"Failed"+"'"+","+"'"+err.message.replace(/[^a-zA-Z ]/g, "")+"'"+")"
const stmt1 = snowflake.createStatement({sqlText: catchsql});
const rs = stmt1.execute();
return "Failed"
}
return "SUCCESS";
$$
;
The idea for the solution is to handle the preprocessing using the following patterns
-
replace
LANGUAGE JAVASCRIPT AS $$
withLANGUAGE SQL AS BEGIN
to enable lineage analysis by Automatic Data Lineage -
replace
sqlStmnt = `<SQL Query>`; executeQuery(sqlStmnt);
with<SQL Query>
to unwrap the SQL code for lineage analysis -
Do additional cleanup to prevent unnecessary parsing errors that would just flood the logs and make it harder to investigate the errors.
-
replace everything starting with
catch(err)
withEND
(expecting that there’s just one block like that) -
remove all variable definitions
var <variable name>
andvar <variable name> = assignment
-
remove the
executeQuery
function
-
replace.csv
for the code above can look as (note the blank lines in the replacements to try to match the original line numbers so that navigation in the source code (which displays the original javascript) in Automatic Data Lineage
works well.)
(?i)var\\s*\\w*(\\s*=\\s*[^;]*)?\\s*;,
(?is)language\\s*javascript.*\\s*as\\s*[$][$],"language sql
as
BEGIN
"
(?is)try\\s*[{]\\s*function\\s*executeQuery.*?return\\s*\\w*;\\s*[}],"
"
(?is)sqlStmnt\\s*=\\s*((//[^\\n]*\\s*)*)\\s*`([^`]+?)`\\s*;?\\s*((//[^\\n]*\\s*)*)\\s*executeQuery\\s*[(]\\s*sqlStmnt\\s*[)]\\s*,"$1$3$4
"
(?is)[}]\\s*catch\\s*[(]\\s*err\\s*[)].*[$][$],END
Solution
-
Place all your Javascript stored procedures into
mantaflow/cli/input/snowflake/<snowflake.dictionary.id>/<database>/<schema>/
based on the database and schema where they are defined.
Automatic Data Lineage currently does not extract the javascript stored procedures (as it cannot process them anyway. So this step needs to be performed outside of Automatic Data Lineage) -
Enable evaluation of replacements as regular expressions in Admin UI > Connections > Databases > Snowflake > \
> Evaluate replacements as regular expressions set to True -
Prepare the replacements based on your coding practices using the attached spreadsheet.
Spreadsheet automatically replaces any space with any sequence of whitespace characters (to make the matching less dependent on the code structure) and properly escapes the backslashes
Add more/adjust existing patterns as needed to match your coding practices -
Create
replace.csv
file-
Copy contents of column D&E into
mantaflow/cli/input/snowflake/<snowflake.dictionary.id>/replace.csv
-
replace tab character with comma
,
(MS Excel by default delimits columns by tabs instead of commas as required by Automatic Data Lineagereplace.csv
syntax)
-
-
Run the lineage analysis for Snowflake SQL
-
Review the resulting lineage and potential parsing errors, and iterate from step #4 to finetune the replacements based on your coding practices