GitHubContribute in GitHub: Edit online

SAS Processing %if-%then-%else macros

Manta Data Lineage processes both the %then and %else branches of %if macros. This improves the lineage in general, but there are some edge cases where it may cause parts of lineage to be missed. One of these edge cases is when a %if macro is used to construct SQL. Here we describe such edge cases and discuss how to avoid them.

Issue Description

Macros are handled by a preprocessor prior to SAS parsing. The preprocessor basically finds and replaces (which is a very simplified explanation). If a %if macro is used in the SQL code to construct the SQL statement, then processing both the %then and %else branches may result in invalid SQL. This is because the macro is written in a way that only accommodates the use of one branch during runtime; no consideration is given to what will happen if both are used. In many cases, the result is processable, but in others it is not. From a lineage perspective, however, it may be necessary to capture both to get a full picture of the data flows.

The following piece of SAS code are invalid:

select x from t where
%IF (&weekday. eq 2) %THEN
%DO;
%put&weekday.;
  (t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
%END;
%ELSE %DO;
  t1.AWN_DATE = DATE -1
%END;
order by CHANGE_DATE;

It is invalid because the output of the macro preprocessor used in SAS scanner will be as follows (with no logical operator connecting the two conditions).

select x from t where
  (t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
  t1.AWN_DATE = DATE -1
order by CHANGE_DATE;

In other cases, however, the resulting SQL may be perfectly valid. It really depends on how the code is done.

proc sql;
select x from t where x = 1
%IF (&weekday. eq 2) %THEN
%DO;
%put&weekday.;
  and (t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
%END;
%ELSE %DO;
  and t1.AWN_DATE = DATE -1
%END;
order by CHANGE_DATE;

Because the and operator is part of the branch that will connect the clauses, it will work just fine.

select x from t where x=1
  and (t1.AWN_DATE BETWEEN DATE -3 AND DATE -1 )
  and t1.AWN_DATE = DATE -1
order by CHANGE_DATE;

Solution

To handle such edge cases where the generated SQL is invalid:

  1. Review the lineage and/or logs to identify these cases.

  2. Use the replace.csv method (Placeholder Replacement in Input Scripts) to remove or adjust one of the branches to make it valid for the lineage analysis. Note that it is not possible to simply remove the %then branch. Replace.csv allows for the use of multi-line patterns and limits the pattern to a specific input to make it narrowly targeted.

Examples

Here are some examples of how to use replace.csv so that the %else branch will produce valid SAS code for the first example above.

replace.csv — Add an and operator to the statement so that it is properly connected.

"t1.AWN_DATE = DATE -1","and t1.AWN_DATE = DATE -1"

replace.csv — Make the condition blank.

"t1.AWN_DATE = DATE -1",""