How to Pre-Process Oracle and SQL*Plus Scripts

This is a guide to using IBM Automatic Data Lineage to scan environments leveraging the SQL*Plus command line tool (executing SQL and PL/SQL code against an Oracle database).

Automatic Data Lineage supports scanning SQL and PL/SQL statements/scripts — but not SQL*Plus or shell or batch scripts with embedded PL/SQL code. Consequently, pre-processing logic is needed to calibrate the scripts for Automatic Data Lineage native scanners. One of the key tenants of Automatic Data Lineage is automation, as we believe it is the most efficient way of managing physical metadata. You can use the guidelines below, as well as a set of pre-processing scripts (i.e., Linux shell scripting), to design an automated approach tailored to your specific environment.

How to Identify SQL*Plus Scripts

SQLPlus scripts can be identified based on the use of specific commands that are not part of PL/SQL but only available in SQL\Plus. You can use the SQL*Plus User’s Guide and Reference as a reference. The most commonly used command is DEFINE, which defines a variable that can be used later on using the syntax &variable_name..

For example:

-- the following two DEFINE commands are SQL*Plus specific syntax
define schema_mid                     = medid_dm
define tblspace_stg_mid               = medid_stg_data

-- use of the variables &schema_mid. and &tblspace_stg_mid. are SQL*Plus specific syntax
create table &schema_mid..intr_4m_member_stg
tablespace &tblspace_stg_mid nologging compress parallel &parallel_high
as 
select  /*+ parallel(&parallel_high)*/  
  mbr.member_id                   
, mbr.unique_member_id        
, mbr.subscriber_id   
from &schema_mid..intr_4m_membership mbr';

Approach to Analyzing Data Lineage

To analyze the lineage of such scripts with Automatic Data Lineage, we recommend doing the following.

Approach to Analyzing Process Lineage

Process lineage is not a concept generally captured by Automatic Data Lineage OOTB scanners, but it can be nicely modeled using metadata import via Open Manta Extensions. The high-level idea is to capture the calls and inclusions of one script to another to capture the dependency. We recommend creating a script that:

  1. Scans for all SQL*Plus script calls (“@“) and converts them into Automatic Data Lineage custom metadata.

  2. Includes any additional required custom metadata attributes in scripts in the form of comments.

  3. Utilizes pre-processing logic that will then scrape out these comments and insert them into the required Automatic Data Lineage custom metadata import (CSV) files that Automatic Data Lineage can readily ingest.

Example

The preprocessing using replace.csv (Manta Flow Scanner (Client) Configuration: Placeholder Replacement in Input Scripts) file for the sample script above might look as follows with processing the patterns as regular expressions set to enabled:

&schema_mid[.],medid_dm
&tblspace_stg_mid[.],medid_stg_data
(?i)^DEFINE,-- DEFINE