Preprocessing Teradata shell script wrappers of BTEQ & TPT scripts
Preprocessing shell script wrappers of BTEQ scripts
Issue
IBM Automatic Data Lineage requires pure BTEQ scripts as inputs. If BTEQ is wrapped in a Linux shell script or a Windows batch file, which is fairly common, e.g. as follows (the comments explaining the structure are usually not present):
# some shell script syntax that we want to get rid of.
# Any shell variables need to be resolved if they are used in pure BTEQ code
FIELD=$1
ENDDATE=`date +'%Y%m%d%H%M'`
echo "Started execution: "> /tmp/SP_WF_ANTERIOR.txt
date >> /tmp/SP_WF_ANTERIOR.txt
# start of embedded BTEQ code that we want to capture; there can be anything after <<, not just EOF
bteq <<EOF
.LOGON $FIELD
INSERT INTO T_$FIELD(a, b, c) SELECT d, e, f from T_Source where load_date > '$ENDDATE';
.LOGOFF
.quit;
EOF
# end of BTEQ code continued by some shell script syntax that we again want to get rid of
echo "Fin: ">> /tmp/SP_WF_ANTERIOR.txt
date >> /tmp/SP_WF_ANTERIOR.txt
then:
- Only the BTEQ part needs to be extracted.
- Any shell variables (patterns like $Xxxx or ${Xxxxx}) have to be replaced with some default value so that the script passes parsing. In this case at least $FIELD (the login username, also used as part of a table name in the SQL query) and $ENDDATE (a date used in the SQL query).
- The resulting pure BTEQ syntax needs to be passed to the Teradata scanner in the cli/input/teradata/<dict.id>/bteq folder, as illustrated below.
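Applied to the sample above, the extracted and resolved input should look like this (using FIELD and ENDDATE as the default values, as in the replace.csv samples below):

.LOGON FIELD
INSERT INTO T_FIELD(a, b, c) SELECT d, e, f from T_Source where load_date > 'ENDDATE';
.LOGOFF
.quit;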
There are multiple possible ways to implement this; two solutions are proposed below.
Solution using replace.csv
Automatic Data Lineage provides a built-in find & replace capability (see Manta Flow Scanner (Client) Configuration) for manually provided inputs. The whole BTEQ extraction process is achievable using the bteqReplace.csv capability by specifying:

"(?s).*bteq <<([^ ]+?)\\v(.*?)\\v\\1\\v.*","$2"
"[$]FIELD","FIELD"
"[$]ENDDATE","ENDDATE"
or (probably easier to read):
"(?sm).*bteq <<(.+?)$(.*?)^\\1$.*","$2"
"[$]FIELD","FIELD"
"[$]ENDDATE","ENDDATE"
and setting teradata.bteq.replace.regex=true in the Teradata connection or globally for the Teradata scanner. There may be more placeholders used in other scripts beyond the attached sample set.
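To sanity-check the first regex against a real wrapper before running the scanner, a quick one-liner can print what capture group $2 would extract. This is a sketch assuming perl is available; wrapper.sh is a placeholder file name, and the doubled \\v in the CSV is assumed to unescape to a single \v in the effective regex:

perl -0777 -ne 'print "$2\n" if /(?s).*bteq <<([^ ]+?)\v(.*?)\v\1\v.*/' wrapper.sh

Note that this checks only the extraction; the variable placeholders are handled by the subsequent CSV rows.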
Limitations
Note that the replace.csv approach above expects a fairly fixed way of wrapping BTEQs in shell scripts. It also requires providing values for all environment variables/placeholders; if those are created dynamically, this may take quite some effort. If more wrapping patterns exist, or they are used inconsistently, this must be reflected in bteqReplace.csv as well.
- The input shell scripts then need to go to cli/input/teradata/<dict.id>/bteq/[default database].
- Place the sample content of bteqReplace.csv above in cli/input/teradata/<dict.id>/bteqReplace.csv (see the layout sketch below).
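The resulting input layout might then look like this (the script names are illustrative):

cli/input/teradata/<dict.id>/
    bteqReplace.csv
    bteq/
        [default database]/
            wrapper_script_1.sh
            wrapper_script_2.sh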
Solution using dummy execution of the BTEQ scripts
To avoid having to provide values for all the environment variables manually, it is possible to let the Linux shell do the work by simulating the shell script/BTEQ execution without actually calling the bteq utility: it is replaced with the /usr/bin/cat command, so the heredoc that would have been fed to bteq is simply written out instead. If the input variables can be replaced with dummy values (based on the assumption that they are used for audit purposes only), the procedure is as follows:
- Replace the shebang #!/bin/bash with #!/bin/bash -i to enable alias expansion.
- Add 2 lines after the shebang: alias bteq="/usr/bin/cat" and export LOG_DIR=.
- Fix any environment variables that have not been defined. For example (these fixes come from one specific script):
  - Replace export LOGFILE="${LOG_DIR}/MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sh.$$.$(date +%Y%m%d).log" with export LOGFILE="${LOG_DIR}/MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sql"
  - Based on the required date format, replace export P_EXTRACTDATE=$1 with export P_EXTRACTDATE='2001/09/01'
  - Remove the line with source ./Activity_Count_Post_Processing_1.sh "$LOGFILE"
  - Remove the lines calling CALL ${P_MERCURY_T_DB}.CREATEAUDITEVENT_ext; these are not interesting from a lineage perspective.
- Run MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sh, which will create MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sql with pure BTEQ syntax.
- Copy MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sql to cli/input/teradata/<dict.id>/bteq.
All of these steps can be fully automated by executing a preprocessing script, e.g. some_preprocess.sh, prior to the Automatic Data Lineage scan; a possible sketch of such a script follows.
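A minimal sketch of such a script, assuming GNU sed and the specific wrapper discussed above (the file names and sed patterns are illustrative and must be adapted per script):

#!/bin/bash
# some_preprocess.sh (illustrative sketch): automates the manual edits above.
set -euo pipefail

SRC="MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sh"
TMP="${SRC%.sh}.preprocessed.sh"

# Rewrite the wrapper: interactive shebang, bteq aliased to cat, fixed
# variable values, and audit-only lines removed.
sed -e '1s|^#!/bin/bash$|#!/bin/bash -i|' \
    -e '1a alias bteq="/usr/bin/cat"' \
    -e '1a export LOG_DIR=.' \
    -e 's|^export LOGFILE=.*|export LOGFILE="${LOG_DIR}/MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sql"|' \
    -e "s|^export P_EXTRACTDATE=.*|export P_EXTRACTDATE='2001/09/01'|" \
    -e '/Activity_Count_Post_Processing_1\.sh/d' \
    -e '/CREATEAUDITEVENT_ext/d' \
    "$SRC" > "$TMP"
chmod +x "$TMP"

# Dummy-execute: with bteq aliased to /usr/bin/cat, the heredoc body ends up
# in $LOGFILE (the .sql file) instead of being run against Teradata.
./"$TMP"

# <dict.id> is a placeholder for the actual Teradata connection id.
cp MERCURY_DEALER_MASTER_RECORD_TYPE_LOAD.sql "cli/input/teradata/<dict.id>/bteq/"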
Preprocessing Job Variables in TPT
Issue
In TPT scripts, job variables (https://docs.teradata.com/r/nYrvpj~lD1VtAtciDfToww/rlawMc2CvccT_8hUgXpuLg) can be used, e.g. @TGT_WRK_DB in
STEP droping_table
(
APPLY
('DROP TABLE ' || @TGT_WRK_DB || '. R_SUB_SOC_WRK;')
TO OPERATOR (DDL_OPERATOR_1);
);
If the job variables are not defined in the TPT script itself but are supplied externally at runtime, they need to be preprocessed, e.g. using the replace.csv functionality of the Teradata TPT scanner, as sketched below.
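For the example above, a minimal replace.csv entry might substitute the job variable with a quoted literal so that the concatenated DDL string still parses (the value 'TGT_WRK_DB' is an illustrative assumption; any valid database name works):

"@TGT_WRK_DB","'TGT_WRK_DB'"

This turns the APPLY statement into ('DROP TABLE ' || 'TGT_WRK_DB' || '. R_SUB_SOC_WRK;'), which the scanner can resolve to DROP TABLE TGT_WRK_DB. R_SUB_SOC_WRK;.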