How to Handle Snowflake Javascript Stored Procedures

Overview

At the beginning Snowflake supported only Javascript for writing stored procedures. IBM Automatic Data Lineage does not support lineage analysis of Javascript stored procedures; only SQL based stored procedures.

However, in cases when Javascript is used simply as a wrapper for calling static SQL statements without using Javascript for any further complex logic, it is possible to extract SQL/remove the javascript wrapping and pass it to Automatic Data Lineage for lineage analysis.

The solution is based on Automatic Data Lineage OOTB replace.csv capability (Manta Flow Scanner (Client) Configuration: Placeholder Replacement in Input Scripts) to preprocess the Javascript stored procedures to reomove the Javascript wrappers.

Note that this is a best effort only and highly dependent on consistency of coding practices for the javascript-stripping to work. It needs to be adjusted to your specific coding practices.

More info

What type of Javascript code is a good fit for this solution? Code similar to the following that executes static SQL (line 28); any dynamically created SQL (line 40) is not interesting from lineage perspective (in this case just some sort of logging):

CREATE OR REPLACE PROCEDURE DBO.SP_LOAD_DBO_MBR_SRC_DIM_TECH()
  RETURNS VARCHAR 
  LANGUAGE JAVASCRIPT
  AS
  $$
   var sqlStmnt;
  var catchsql;

 var procedureName="SP_LOAD_DBO_MBR_SRC_DIM_TECH";
  try
{
    function executeQuery(p_sql_stmt){
      const stmt = snowflake.createStatement({sqlText: p_sql_stmt});
      const rs = stmt.execute();
      var ret;
      if (rs.next()) {
        ret = rs.getColumnValue(1);
      }
      return ret;
    }

  sqlStmnt = 
    `TRUNCATE TABLE WORK.MBR_SRC_DIM_TECH`;

  executeQuery(sqlStmnt); 

  sqlStmnt = 
        `INSERT INTO WORK.MBR_SRC_DIM_TECH
            (MBR_SRC_DUR_KEY
            ,MBR_DUR_KEY
        SELECT skl.MBR_SRC_DUR_KEY
                ,UPPER(TRIM(psa.MBR_DUR_KEY)
            FROM PSA.CTL_DBO_TECH_MBR psa   
            JOIN DBO.DW_MBR_SOURCE_KEY_LOOKUP skl ON skl.DW_DATA_SOURCE = psa.DW_DATA_SOURCE)`;

  executeQuery(sqlStmnt);

  }
catch(err) {
      catchsql="CALL SUPPORT.SP_UPDATE_SUPPORT_EXECUTION_LOG_FAILED("+"'"+"CDW"+"'"+","+"'"+"DBO"+"'"+","+"'"+procedureName+"'"+","+"'"+
           "Failed"+"'"+","+"'"+err.message.replace(/[^a-zA-Z ]/g, "")+"'"+")"
      const stmt1 = snowflake.createStatement({sqlText: catchsql});
    const rs = stmt1.execute();
         return "Failed"
     } 
  return "SUCCESS";
  $$
  ;

The idea for the solution is to handle the preprocessing using the following patterns

  1. replace LANGUAGE JAVASCRIPT AS $$ with LANGUAGE SQL AS BEGIN to enable lineage analysis by Automatic Data Lineage

  2. replace sqlStmnt = `<SQL Query>`; executeQuery(sqlStmnt); with <SQL Query> to unwrap the SQL code for lineage analysis

  3. Do additional cleanup to prevent unnecessary parsing errors that would just flood the logs and make it harder to investigate the errors.

    1. replace everything starting with catch(err) with END (expecting that there’s just one block like that)

    2. remove all variable definitions var <variable name> and var <variable name> = assignment

    3. remove the executeQuery function

replace.csv for the code above can look as (note the blank lines in the replacements to try to match the original line numbers so that navigation in the source code (which displays the original javascript) in Automatic Data Lineage works well.)

(?i)var\\s*\\w*(\\s*=\\s*[^;]*)?\\s*;,
(?is)language\\s*javascript.*\\s*as\\s*[$][$],"language sql
as
BEGIN
"
(?is)try\\s*[{]\\s*function\\s*executeQuery.*?return\\s*\\w*;\\s*[}],"





"
(?is)sqlStmnt\\s*=\\s*((//[^\\n]*\\s*)*)\\s*`([^`]+?)`\\s*;?\\s*((//[^\\n]*\\s*)*)\\s*executeQuery\\s*[(]\\s*sqlStmnt\\s*[)]\\s*,"$1$3$4



"
(?is)[}]\\s*catch\\s*[(]\\s*err\\s*[)].*[$][$],END

Solution

  1. Place all your Javascript stored procedures into mantaflow/cli/input/snowflake/<snowflake.dictionary.id>/<database>/<schema>/ based on the database and schema where they are defined.
    Automatic Data Lineage currently does not extract the javascript stored procedures (as it cannot process them anyway. So this step needs to be performed outside of Automatic Data Lineage)

  2. Enable evaluation of replacements as regular expressions in Admin UI > Connections > Databases > Snowflake > \ > Evaluate replacements as regular expressions set to True

  3. Prepare the replacements based on your coding practices using the attached spreadsheet.
    Spreadsheet automatically replaces any space with any sequence of whitespace characters (to make the matching less dependent on the code structure) and properly escapes the backslashes
    Add more/adjust existing patterns as needed to match your coding practices

  4. Create replace.csv file

    1. Copy contents of column D&E into mantaflow/cli/input/snowflake/<snowflake.dictionary.id>/replace.csv

    2. replace tab character with comma , (MS Excel by default delimits columns by tabs instead of commas as required by Automatic Data Lineage replace.csv syntax)

  5. Run the lineage analysis for Snowflake SQL

  6. Review the resulting lineage and potential parsing errors, and iterate from step #4 to finetune the replacements based on your coding practices

Javascript-to-sql_replace.csv_generator.xlsx