SAS Processing %if-%then-%else macros
Manta Data Lineage processes both the %then
and %else
branches of %if
macros. This improves the lineage in general, but there are some edge cases where it may cause parts of lineage to be missed. One of these
edge cases is when a %if
macro is used to construct SQL. Here we describe such edge cases and discuss how to avoid them.
Issue Description
Macros are handled by a preprocessor prior to SAS parsing. The preprocessor basically finds and replaces (which is a very simplified explanation). If a %if
macro is used in the SQL code to construct the SQL statement, then processing
both the %then
and %else
branches may result in invalid SQL. This is because the macro is written in a way that only accommodates the use of one branch during runtime; no consideration is given to what will happen if
both are used. In many cases, the result is processable, but in others it is not. From a lineage perspective, however, it may be necessary to capture both to get a full picture of the data flows.
The following piece of SAS code are invalid:
select x from t where
%IF (&weekday. eq 2) %THEN
%DO;
%put&weekday.;
(t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
%END;
%ELSE %DO;
t1.AWN_DATE = DATE -1
%END;
order by CHANGE_DATE;
It is invalid because the output of the macro preprocessor used in SAS scanner will be as follows (with no logical operator connecting the two conditions).
select x from t where
(t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
t1.AWN_DATE = DATE -1
order by CHANGE_DATE;
In other cases, however, the resulting SQL may be perfectly valid. It really depends on how the code is done.
proc sql;
select x from t where x = 1
%IF (&weekday. eq 2) %THEN
%DO;
%put&weekday.;
and (t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
%END;
%ELSE %DO;
and t1.AWN_DATE = DATE -1
%END;
order by CHANGE_DATE;
Because the and
operator is part of the branch that will connect the clauses, it will work just fine.
select x from t where x=1
and (t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
and t1.AWN_DATE = DATE -1
order by CHANGE_DATE;
Solution
To handle such edge cases where the generated SQL is invalid:
-
Review the lineage and/or logs to identify these cases.
-
Use the
replace.csv
method (Placeholder Replacement in Input Scripts) to remove or adjust one of the branches to make it valid for the lineage analysis. Note that it is not possible to simply remove the%then
branch.Replace.csv
allows for the use of multi-line patterns and limits the pattern to a specific input to make it narrowly targeted.
Examples
Here are some examples of how to use replace.csv
so that the %else
branch will produce valid SAS code for the first example above.
replace.csv
— Add an and
operator to the statement so that it is properly connected.
"t1.AWN_DATE = DATE -1","and t1.AWN_DATE = DATE -1"
replace.csv
— Make the condition blank.
"t1.AWN_DATE = DATE -1",""