How to Handle Dynamic SQL Scripts

Overview

Dynamically generated SQL ranges from simple cases, such as a workaround for not being able to create a table in a stored procedure, to complex metadata-driven solutions. While most Automatic Data Lineage scanners don't support dynamic SQL queries out-of-the-box, it is typically possible to get lineage for dynamic queries with a bit of additional effort. In this how-to article, we summarize the approaches that cover most situations where it is necessary to harvest lineage for dynamic SQL.

Not All Dynamic SQL Is Equal

Technically, dynamic SQL is any SQL script executed via the EXECUTE IMMEDIATE command (or a similar command depending on the database vendor). However, when one looks deeper, the level of dynamicity can range from “almost” static to heavily dynamic and metadata driven. Different approaches may be suitable for particular code types or levels of dynamicity. Here are a few examples to illustrate different levels of dynamicity.

  1. Static code executed dynamically — typically only used to work around limitations of the language; for example, an inability to create a table in an Oracle Function

    FUNCTION update_temp
    RETURN VARCHAR2
    IS
    v_sql VARCHAR2(4000);
    BEGIN
       v_sql := 'CREATE TABLE temp_stage2 NOLOGGING ' ||
                'AS ' ||
                'SELECT temp_stage1.*, ' ||
                '       loc.short_name bltitm_location, ' ||
                '       loc.locf_function bltitm_location_function '  ||
                'FROM   temp_stage1, ' ||
                '       edw.locations loc ' ||
                'WHERE  loc.id = temp_stage1.loc_id_stored ';
    
       EXECUTE IMMEDIATE v_sql;
       -- do something
       pkg_ddl.drop_table('TEMP_STAGE2',FALSE);
    END;
    
  2. “Almost” static code executed dynamically

    FUNCTION update_temp(in_status VARCHAR2, in_abctmp_tablespace_name VARCHAR2)
    RETURN VARCHAR2
    IS
    v_sql VARCHAR2(4000);
    BEGIN
       v_sql := 'CREATE TABLE temp_stage2 NOLOGGING ' ||
                'TABLESPACE ' || in_abctmp_tablespace_name || ' ' ||
                'AS ' ||
                'SELECT temp_stage1.*, ' ||
                '       loc.short_name bltitm_location, ' ||
                '       loc.locf_function bltitm_location_function '  ||
                'FROM   temp_stage1, ' ||
                '       edw.locations loc ' ||
                'WHERE  loc.id = temp_stage1.loc_id_stored ' ||
                '       and loc.status = ''' || in_status || '''';
    
       EXECUTE IMMEDIATE v_sql;
       -- do something
       pkg_ddl.drop_table('TEMP_STAGE2',FALSE);
    END;
    
  3. Metadata-driven approach with dynamic SQL resulting in a single query

       SELECT source_schema, target_schema, table_name, condition
         INTO vSrcSchema, vTgtSchema, vTblName, vCondition
         FROM JobConfiguration
        WHERE status = 'enabled' and job_name = 'X';
       v_sql := 'INSERT INTO ' || vTgtSchema || '.' || vTblName ||
                ' SELECT * FROM ' || vSrcSchema || '.' || vTblName ||
                ' WHERE ' || vCondition;
       EXECUTE IMMEDIATE v_sql;
    
  4. Metadata-driven approach with looping — dynamic SQL resulting in multiple queries

       -- read configuration of the jobs from the configuration table
       OPEN c_JobConfiguration FOR
          'SELECT source_schema, target_schema, table_name, condition FROM JobConfiguration WHERE status = ''enabled''';

       -- process all the loaded records one by one
       LOOP
          FETCH c_JobConfiguration INTO r_JobConfiguration;
          EXIT WHEN c_JobConfiguration%NOTFOUND;
          -- (dynamically) generate a SQL statement to create a copy of the table
          v_sql := 'INSERT INTO ' || r_JobConfiguration.vTgtSchema || '.' || r_JobConfiguration.vTblName ||
                   ' SELECT * FROM ' || r_JobConfiguration.vSrcSchema || '.' || r_JobConfiguration.vTblName ||
                   ' WHERE ' || r_JobConfiguration.vCondition;
          -- run the generated SQL
          EXECUTE IMMEDIATE v_sql;
       END LOOP;
    
  5. Capturing the output of dynamically executed SQL for further use

       TYPE T IS TABLE OF Customer%ROWTYPE;
       MyRows T;

       SELECT source_schema, target_schema, condition
         INTO vSrcSchema, vTgtSchema, vCondition
         FROM JobConfiguration
        WHERE status = 'enabled' and job_name = 'X';
       v_sql := 'SELECT * FROM ' || vSrcSchema || '.Customer WHERE ' || vCondition;
       EXECUTE IMMEDIATE v_sql BULK COLLECT INTO MyRows;

       FOR indx IN 1 .. MyRows.COUNT
       LOOP
          -- do something with MyRows in a row-by-row manner
       END LOOP;
    

Each of these cases may benefit from a different approach to extracting lineage from the dynamically generated statements. Here are a few approaches that can be used depending on the situation and what fits. The main idea is to convert the dynamic SQL into static queries that can be analyzed by Automatic Data Lineage out-of-the-box or to generate lineage information along with the dynamic query.

Approach #1: Use Automatic Data Lineage Automated Dynamic SQL Processing Feature

The Automatic Data Lineage scanner for Oracle supports the handling of some dynamic SQL patterns. The scanner constructs the SQL statement based on the information available in the script itself. Upon encountering an EXECUTE IMMEDIATE statement, it attempts to analyze its lineage. As it only uses the information available in the script, the lineage may not be 100% accurate, or it may not be possible to process the statement at all. Scripts that can be processed look like the one below.

FUNCTION update_temp(in_abctmp_tablespace_name VARCHAR2)
RETURN VARCHAR2
IS
v_start_time  DATE;
v_rtntxt VARCHAR2(4000);
v_sql VARCHAR2(4000);
BEGIN
   v_start_time := SYSDATE;

   v_sql := 'CREATE TABLE temp_stage2 NOLOGGING ' ||
            'TABLESPACE ' || in_abctmp_tablespace_name || ' ' ||
            'AS ' ||
            'SELECT temp_stage1.*, ' ||
            '       loc.short_name bltitm_location, ' ||
            '       loc.locf_function bltitm_location_function '  ||
            'FROM   temp_stage1, ' ||
            '       edw.locations loc ' ||
            'WHERE  loc.id = temp_stage1.loc_id_stored ';

   EXECUTE IMMEDIATE v_sql;
   pkg_ddl.drop_table('TEMP_STAGE1',FALSE);
END;

The script above uses dynamic SQL because the tablespace where the table is stored needs to be specified on demand. PL/SQL does not allow a CREATE TABLE statement like this in static code; the compilation would fail. Executing the statement dynamically overcomes this limitation. This type of dynamic code, where an almost complete SQL statement is present in the script, can be successfully analyzed by the dynamic SQL processing feature of the Automatic Data Lineage Oracle scanner.

As soon as the dynamically generated code gets more complex, especially when it reads its configuration from the database, static processing alone is no longer enough. Let's use the following piece of code as an example.

FUNCTION process_job(in_job_name VARCHAR2)
RETURN VARCHAR2
IS
   vSrcSchema JobConfiguration.source_schema%TYPE;
   vTgtSchema JobConfiguration.target_schema%TYPE;
   vTblName   JobConfiguration.table_name%TYPE;
   vCondition JobConfiguration.condition%TYPE;
   v_sql      VARCHAR2(4000);
BEGIN
   -- read configuration of the job from the configuration table
   SELECT source_schema, target_schema, table_name, condition
     INTO vSrcSchema, vTgtSchema, vTblName, vCondition
     FROM JobConfiguration
    WHERE status = 'enabled' and job_name = in_job_name;
   -- (dynamically) generate a SQL statement to create a copy of the table
   v_sql := 'INSERT INTO ' || vTgtSchema || '.' || vTblName ||
            ' SELECT * FROM ' || vSrcSchema || '.' || vTblName ||
            ' WHERE ' || vCondition;
   -- run the generated SQL
   EXECUTE IMMEDIATE v_sql;
   ...
END;

The script simply creates a copy of a selected table from a source schema in a target schema. The script itself, however, does not contain any information about the source and target schemas or the actual table name. All this information is read dynamically from the database (by the SELECT ... INTO statement) when the job runs. As such, Automatic Data Lineage cannot process it on its own and needs additional user input to handle it.

Approach #2: Generate the Lineage Information Directly from the Metadata

Sometimes, it is easier to generate lineage information describing the sources and targets of the dynamic SQL code than it is to capture and analyze the generated code itself. The Custom Metadata Import Module can be used to import lineage in this form. This is especially useful for metadata-driven solutions, where the metadata already describes the sources and targets and is used to generate the SQL. Why not generate the lineage information directly from the same metadata as well? This way, handling dynamic SQL can be avoided entirely.
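
For the metadata-driven example used throughout this article, the mappings can be derived with a query along the following lines. This is only a sketch: the output columns and their naming are assumptions and need to be adapted to the format expected by the Custom Metadata Import Module.

   -- Sketch only: derive source-to-target table mappings straight from the job metadata.
   -- The column aliases are assumptions; adapt the output to the format expected
   -- by the Custom Metadata Import Module.
   SELECT source_schema || '.' || table_name AS source_table,
          target_schema || '.' || table_name AS target_table
     FROM JobConfiguration
    WHERE status = 'enabled';

Because the same JobConfiguration rows drive the generated INSERT statements, lineage produced this way stays in sync with what the jobs actually do.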

Approach #3: Print the Generated SQL Statement

This is the most common and, generally, also the easiest way to get lineage for dynamic SQL code. Following the simple case in the examples above, the variable v_sql holds the generated SQL. The code can be modified to print the actual generated SQL statement to a specific file or store it in a table in the database. These generated SQL statements can be collected and passed to Automatic Data Lineage for processing.

It is usually very easy to adjust the implementation so that it prints the generated code (stored in the variable/table) to a file before executing it. This file can later be provided as input to Automatic Data Lineage (see Manta Flow Usage: Preparing Scanner Inputs).
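
For instance, in the process_job function shown earlier, the generated statement can be written out just before it is executed. The following is a minimal sketch; the generated_sql_log table is a hypothetical logging table introduced here for illustration only.

   -- Hypothetical logging table, e.g.:
   --   CREATE TABLE generated_sql_log (logged_at DATE, job_name VARCHAR2(128), sql_text CLOB);

   -- print the generated statement so it can be collected from the server output
   DBMS_OUTPUT.PUT_LINE(v_sql);

   -- and/or persist it for later export to a file
   INSERT INTO generated_sql_log (logged_at, job_name, sql_text)
   VALUES (SYSDATE, in_job_name, v_sql);

   -- run the generated SQL as before
   EXECUTE IMMEDIATE v_sql;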

Approach #4: Query Log

In some cases, it is not possible or desirable to modify the actual code, even by adding a logging statement, or the queries are generated and executed externally. In this case, you can leverage the database query log, acquire the queries that were actually executed from the log, and provide them as inputs to Automatic Data Lineage, the same way as in the previous scenario (see Manta Flow Usage: Preparing Scanner Inputs).

Note: Use this approach when the query log contains the resolved static SQL statements that were actually executed on the database. If the log only captures the dynamic SQL code itself, this approach might not be effective.
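
On Oracle, for example, recently executed statements can be pulled from the shared SQL area as a starting point. The query below is only a sketch; the schema name filter is an assumption, and because V$SQL holds only statements still cached in memory, a proper query log or auditing setup is a more reliable source.

   -- Sketch: collect executed statements from the shared SQL area (Oracle).
   -- 'EDW' is an assumed schema name; replace it with the schema that runs the dynamic SQL.
   SELECT sql_fulltext
     FROM v$sql
    WHERE parsing_schema_name = 'EDW'
      AND command_type IN (1, 2, 3);   -- 1 = CREATE TABLE, 2 = INSERT, 3 = SELECT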

Approach #5: Conversion of Dynamic SQL to Static Queries

There are cases where none of the aforementioned approaches provide complete results. This is especially true when the data returned by the dynamic query is not stored in a database table but is referenced later in the script, as in the following fragment, where BULK COLLECT captures the result of EXECUTE IMMEDIATE.

   TYPE T IS TABLE OF Customer%ROWTYPE;
   MyRows T;

   SELECT source_schema, target_schema, condition
     INTO vSrcSchema, vTgtSchema, vCondition
     FROM JobConfiguration
    WHERE status = 'enabled' and job_name = 'X';
   v_sql := 'SELECT * FROM ' || vSrcSchema || '.Customer WHERE ' || vCondition;
   EXECUTE IMMEDIATE v_sql BULK COLLECT INTO MyRows;

   FOR indx IN 1 .. MyRows.COUNT
   LOOP
      -- do something with MyRows in a row-by-row manner
   END LOOP;

In such cases, you will need to convert the dynamic query into the static query that was actually executed. Here are the main steps you will take to perform such a conversion.

  1. Run the Automatic Data Lineage extraction script.

  2. In each script that uses dynamic SQL downloaded by Automatic Data Lineage into the mantaflow/cli/temp/ directory, replace the dynamic SQL with the static version of the generated statement, as illustrated after this list. You can take the static code from approach #3 or #4.

  3. Run the Automatic Data Lineage analysis script.
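
For illustration, if the logged or printed statement for the BULK COLLECT fragment above was a simple query against an edw_src schema (the schema name and the condition below are made-up values), the replacement in the extracted script could look like this:

   -- Original dynamic code, replaced for lineage analysis:
   --    v_sql := 'SELECT * FROM ' || vSrcSchema || '.Customer WHERE ' || vCondition;
   --    EXECUTE IMMEDIATE v_sql BULK COLLECT INTO MyRows;

   -- Static version of the statement that was actually executed
   -- (edw_src and the condition are assumed example values):
   SELECT * BULK COLLECT INTO MyRows FROM edw_src.Customer WHERE status = 'ACTIVE';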

Automating the Conversion of Dynamic SQL to Static Queries

Step #2 can be automated to eliminate manual labor, as the dynamic SQL replacement needs to be re-applied every time the code is scanned from the database. Here is a suggestion for how to automate it.

Preparation Phase

  1. Create a procedure like LogStaticStatement(statement, statement_identification) to log the generated queries in a file or database table (a minimal sketch of such a procedure follows this list).

  2. Tag the dynamic code in all the scripts and log the generated statement as shown below.

     FUNCTION process_job(in_job_name VARCHAR2)
     RETURN VARCHAR2
     IS
        vSrcSchema JobConfiguration.source_schema%TYPE;
        vTgtSchema JobConfiguration.target_schema%TYPE;
        vTblName   JobConfiguration.table_name%TYPE;
        vCondition JobConfiguration.condition%TYPE;
        v_sql      VARCHAR2(4000);
     BEGIN
        -- read configuration of the job from the configuration table
        SELECT source_schema, target_schema, table_name, condition
          INTO vSrcSchema, vTgtSchema, vTblName, vCondition
          FROM JobConfiguration
         WHERE status = 'enabled' and job_name = in_job_name;
        -- (dynamically) generate a SQL statement to create a copy of the table
        -- ## Begin Dynamic Statement ##: MyJobConfiguration1
        v_sql := 'INSERT INTO ' || vTgtSchema || '.' || vTblName ||
                 ' SELECT * FROM ' || vSrcSchema || '.' || vTblName ||
                 ' WHERE ' || vCondition;
        -- run the generated SQL
        EXECUTE IMMEDIATE v_sql;
        LogStaticStatement(v_sql, 'MyJobConfiguration1');
        -- ## End Dynamic Statement ##
        ...
     END;
    
  3. Create a pre-processing script (can be a simple Linux shell script) to replace the tagged dynamic code with the static version from the log. That is, replace

        -- ## Begin Dynamic Statement ##: MyJobConfiguration1
        v_sql := 'INSERT INTO ' || vTgtSchema || '.' || vTblName ||
                 ' SELECT * FROM ' || vSrcSchema || '.' || vTblName ||
                 ' WHERE ' || vCondition;
        -- run the generated SQL
        EXECUTE IMMEDIATE v_sql;
        LogStaticStatement(v_sql, 'MyJobConfiguration1');
        -- ## End Dynamic Statement ##
    

    with the actual logged statement that can be retrieved using the MyJobConfiguration1 tag.
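
A minimal sketch of the LogStaticStatement procedure follows. The dynamic_sql_log table is a hypothetical logging table, and the autonomous transaction keeps the logging independent of the caller's transaction; adjust both to your environment.

   -- Hypothetical logging table, e.g.:
   --   CREATE TABLE dynamic_sql_log (logged_at DATE, stmt_id VARCHAR2(128), stmt_text CLOB);

   CREATE OR REPLACE PROCEDURE LogStaticStatement(statement VARCHAR2, statement_identification VARCHAR2)
   IS
      PRAGMA AUTONOMOUS_TRANSACTION;  -- commit the log entry without affecting the caller's transaction
   BEGIN
      INSERT INTO dynamic_sql_log (logged_at, stmt_id, stmt_text)
      VALUES (SYSDATE, statement_identification, statement);
      COMMIT;
   END LogStaticStatement;

The pre-processing script from step 3 can then read the logged statement for a given tag from dynamic_sql_log (or from the file) and substitute it between the corresponding Begin/End markers.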

Lineage Scanning Phase

The approach here is the same as in approach #5, with step #2 being automated.

  1. Run the Automatic Data Lineage extraction script.

  2. Run the pre-processing for each script that uses dynamic SQL downloaded by Automatic Data Lineage into the mantaflow/cli/temp/ directory.

  3. Run the Automatic Data Lineage analysis script.

Please note that the simple approach described above may require further refinement, especially when dynamic queries are used in loops.

Other Considerations

There may be more complex situations, such as loops, requiring additional logic. Here is an example that extends the code above to process all the jobs, instead of just one, handling the individual configuration records in a loop.

FUNCTION process_all_jobs
RETURN VARCHAR2
IS
TYPE t_JobConfiguration_refcur    IS REF CURSOR;
TYPE t_JobConfiguration_refcurrec IS RECORD
(
   vSrcSchema JobConfiguration.source_schema%TYPE,
   vTgtSchema JobConfiguration.target_schema%TYPE,
   vTblName   JobConfiguration.table_name%TYPE,
   vCondition JobConfiguration.condition%TYPE
);
c_JobConfiguration t_JobConfiguration_refcur;
r_JobConfiguration t_JobConfiguration_refcurrec;
v_sql VARCHAR2(4000);

BEGIN
   -- read configuration of the jobs from the configuration table
   OPEN c_JobConfiguration FOR
      'SELECT source_schema, target_schema, table_name, condition FROM JobConfiguration WHERE status = ''enabled''';

   -- process all the loaded records one by one
   LOOP
      FETCH c_JobConfiguration INTO r_JobConfiguration;
      EXIT WHEN c_JobConfiguration%NOTFOUND;
      -- (dynamically) generate a SQL statement to create a copy of the table
      v_sql := 'INSERT INTO ' || r_JobConfiguration.vTgtSchema || '.' || r_JobConfiguration.vTblName ||
               ' SELECT * FROM ' || r_JobConfiguration.vSrcSchema || '.' || r_JobConfiguration.vTblName ||
               ' WHERE ' || r_JobConfiguration.vCondition;
      -- run the generated SQL
      EXECUTE IMMEDIATE v_sql;
   END LOOP;
   CLOSE c_JobConfiguration;
END;

This will generate several INSERT statements, one for each enabled job, so lineage should be displayed for all the tables they affect.