GitHubContribute in GitHub: Edit online

SAS Manual Inputs

To analyze SAS files:

SAS Metadata Collection for Predefined Library Tables

Problem Statement

Permanent SAS datasets are often used in SAS code by defining the libname statement in the code. However, those permanent datasets may have been created in a different SAS code. For example:

create_member.sas

libname stg1 "/opt/sas/data/stage1";
data stg1.member;
member_id= 12345;
member_name= "John Smith";
member_address= "1234, Manta Lane, Tampa, FL, 33634";
member_phone= "999-999-9999";
run;

use_member.sas

libname stg1 "/opt/sas/data/stage1";
data member1;
set stg1.member;
run;

proc sql noprint;
create table member2 as
select member_id, member_name 
from member1;
quit;

In cases where these datasets are used to build subsequent datasets (like in the example for use_member.sas above), Manta does not have the metadata for the source table, so the source table is deduced based on the query used to build the target table. This sometimes leads to incorrect or missing lineage.

No alt text provided

And you will see the log message below in the sasDataflowScenario log file.

2023-01-29 09:43:37.854 [pool-2-thread-2] 2 INFO  eu.profinit.manta.connector.common.resolver.ReferenceResolver.DEDUCTION [Context: \library_table_use.sas] Resolving reference <4,5> AST_QUALIFIED_NAME using deduction to DB_TABLE: ResUnknownObject [currentGuess=Table [name=MEMBER, properties=[DB_TABLE, PROP_RESULTSET], data.id=null, id=2124004797, definition=4,10] (data type: ResUnknownDataType [currentGuess=AbstractDataType [name=MEMBER, properties=[DB_TABLE_TYPE, PROP_RESULTSET], data.id=null, id=1886977149, definition=4,10], properties=[DB_TABLE_TYPE, PROP_UNKNOWN, PROP_RESULTSET, PROP_PROXY], position=4,10]), properties=[DB_TABLE, PROP_UNKNOWN, PROP_RESULTSET, PROP_PROXY], position=4,10]

Notice the part ... using deduction to DB_TABLE: ResUnknownObject [currentGuess=Table [name=MEMBER ....

Solution

To circumvent this issue, provide the table metadata to Manta. As of R40, use SAS scanner to provide SAS scripts to build a “database/dataset” dictionary that will be used while processing actual SAS code with data transformations.

Expected Result:

No alt text provided

Obtaining SAS Dictionary Scripts

Use the SAS script generate_metadata_table.sas below to generate a SAS code with DDL statements for all the tables from the SAS libraries, and include this SAS file along with your other SAS files for analysis using the replace.csv file.

generate_metadata_table.zip

Follow these steps.

  1. Create a libnames.txt file with all the library definitions that the metadata needs to be extracted from. Here is an example libnames.txt file.

    libname dds '/opt/sas/data/dds';
    libname dds1 '/opt/sas/data/dds1';
    libname stg1 '/opt/sas/data/stg1';
    libname stg2 '/opt/sas/data/stg2';
    
  2. Download the SAS code attached above to generate DDL for all the tables from all SAS permanent libraries.

  3. Place libnames.txt in the /tmp directory. If you would like to use a different location, modify line 21 in generate_metadata_table.sas.

    %let libloc= /tmp; /*** path where libnames.txt file will be saved *****/
    

    It will produce all the output lib_meta_"library".sas files in the /tmp location. You can modify the section of the code below if you want to create the files at a different location on line 22.

    %let outfile= /tmp; /*** Path where the output sas code files will be generated ****/
    
  4. The total number of files generated after the SAS code execution should be the same as the number of entries in the libnames.txt file. Here is an example for lib_meta_"library".sas (e.g., lib_meta_stg1.sas).

    libname DDS '/tmp/dds';
    
    /* SAS data step DDL for DDS.A2_ADDRESS_WITH_LABLES; */
    data DDS.A2_ADDRESS_WITH_LABLES; 
      attrib
        ADDRESS_END_DATE                 length=$8. format=22.3 label="Last date of Address validity" 
        ADDRESS_RK                       length=8  format=12.  label="ADDRESS_RK" 
        ADDRESS_START_DATE               length=$8. format=22.3 label="First day of address validity" 
        ADDRESS_STATE_CODE               length=3  format=$3.  label="ADDRESS_STATE_CODE" 
        ADDRESS_STATE_NAME               length=50  format=$50.  label="ADDRESS_STATE_NAME" 
        ADDRESS_STREET                   length=100  format=$100.  label="ADDRESS_STREET" 
        ADDRESS_TOWN                     length=50  format=$50.  label="ADDRESS_TOWN" 
        ADDRESS_TYPE_CODE                length=3  format=$3.  label="Adress Type Code (HOM=Home, WRK=Work)" 
        ADDRESS_TYPE_TEXT                length=50  format=$50.  label="Address Type (Home, Work)" 
        ADDRESS_ZIP                      length=10  format=$10.  label="ZIP code" 
        ARRIDX_POPULATION_REGISTRY_IF23  length=8  format=13.  label="ARRIDX_POPULATION_REGISTRY_IF23" 
        IDS_IF23_ADDRESS                 length=8  format=22.  label="IDS_IF23_ADDRESS" 
        POPULATION_REGISTRY_RK           length=8  format=14.  label="POPULATION_REGISTRY_RK" 
        PROCESSED_DTTM                   length=8  format=20.  label="PROCESSED_DTTM" 
        SOURCE_SYSTEM_CD                 length=3  format=$3.  label="SOURCE_SYSTEM_CD" 
        SYS_CREATEDATE                   length=8  format=9.  label="SYS_CREATEDATE" 
        VALID_FROM_DTTM                  length=8  format=20.  label="VALID_FROM_DTTM" 
        VALID_TO_DTTM                    length=8  format=20.  label="VALID_TO_DTTM" 
      ;
      stop;
    run;
    
    /* SAS data step DDL for DDS.TEST1; */
    data DDS.TEST1; 
      attrib
        col1                             length=8  format=8. 
        col2                             length=3  format=3. 
        col3                             length=4  format=4. 
      ;
      stop;
    run;
    
  5. Copy these files to the Manta server <MANTA_DIR_HOME>/input/sas/${sas.system.id}/${sas.extractor.input} folder.

  6. In the Admin UI for the SAS connection, include the extractor.

    No alt text provided

  7. Execute the workflow.