Microsoft SQL Server lineage configuration

To import lineage metadata from Microsoft SQL Server, create a connection, data source definition and metadata import job.

This information applies to IBM Manta Data Lineage service.

Microsoft SQL Server is a relational database management system.

Supported Microsoft SQL Server versions

Microsoft SQL Server 2022.

Processed metadata

The following Microsoft SQL Server metadata is processed and displayed in lineage:

Data dictionaries
SQL scripts
T-SQL scripts
Stored procedures
Functions
Triggers

Limitations

The following data is not processed:

Lineage extraction for objects referenced through dynamically executed SQL code by using EXECUTE IMMEDIATE.
Lineage extraction for data movements implemented through INSERT INTO or UPDATE statements that operate on an OPENQUERY.
Lineage extraction for transformations implemented by using OPENXML.
Lineage extraction for external data access performed through OPENROWSET.
Lineage through ODBC scalar functions as described in the Microsoft documentation. Use standard SQL functions instead.
Objects renamed by using the sp_rename procedure. Due to limitations of the sp_rename procedure, the updated object name is not reflected in lineage.
Lineage extraction for encrypted objects, such as Stored procedures or Views as described in the Microsoft documentation.
Authentication using both standalone managed service accounts and group managed service accounts.

Prerequisite configuration

Before you import lineage metadata, ensure that the following prerequisites are met:

You have the following rights:
- CONNECT SQL to connect to the server. For Azure Synapse Analytics, only CONNECT SQL is required.
- VIEW ANY DATABASE to extract the list of existing databases.
- ALTER ANY LINKED SERVER to extract linked server definitions because a read-only permission isn't available for this metadata type.
- In each extracted database:
- VIEW DEFINITION or VIEW ANY DEFINITION on the server level to extract object definitions.
- MSSQL = SQL Server (2008 and newer)
  - SELECT on sys.sql_expression_dependencies to extract object dependencies.
  - SELECT on the following views in INFORMATION_SCHEMA:
    - COLUMNS
    - TABLES
    - ROUTINES
    - VIEWS
  - SELECT on the following tables in sys:
    - synonyms
    - triggers
    - server_triggers
    - server_sql_modules
    - extended_properties
    - servers
    - all_objects
    - tables
    - index_columns
    - indexes
    - foreign_key_columns
    - sequences
    - table_types

Creating a metadata import asset

Data source connection

To connect to the data source from which you want to import lineage metadata, you need to select a data source definition and a connection. You can create them before you start creating the metadata import, or you can create them when you create and configure the metadata import asset.

Data source definition

Select Microsoft SQL Server as the data source type.

Connection

For connection details, see Microsoft SQL Server connection.

Include and exclude lists

You can include or exclude assets up to the schema level. Provide databases and schemas in the format database/schema. Each part is evaluated as a regular expression. Assets which are added later in the data source will also be included or excluded if they match the conditions specified in the lists. Example values:

myDB/: all schemas in myDB database.
myDB2/.*: all schemas in myDB2 database.
myDB3/mySchema1: mySchema1 schema from myDB3 database.
myDB4/mySchema[1-5]: any schema in my myDB4 database with a name that starts with mySchema and ends with a digit between 1 and 5.

External inputs

If you use external SQL and T-SQL scripts for Microsoft SQL Server, you can add them in a .zip file as an external input. You can organize the structure of a .zip file as subfolders that represent databases and schemas. After the scripts are scanned, they are added under respective databases and schemas in the selected catalog or project. The .zip file can have the following structure:

    <database_name>
        <schema_name>
           <script_name.sql>
    <database_name>
        <script_name.sql>
    <script_name.sql>
    replace.csv
    linkedServerConnectionsConfiguration.prm

The replace.csv file contains placeholder replacements for the scripts that are added in the .zip file. For more information about the format, see Placeholder replacements.

The linkedServerConnectionsConfiguration.prm file contains linked server connection definitions. The following structure defines a single connection:

[{Shortcut_Name}] Type={connection_type}
Connection_String={connection_string}
Server_Name={server_name}
Database_Name={database_name}
Schema_Name={schema_name}
User_Name={user_name}

Advanced import options

Extract extended attributes: You can extract extended attributes like primary key, unique and referential integrity constraints of columns. By default these attributes are not extracted.
Extraction mode: You can decide which extraction mode to run for the imported metadata. You have the following options:

Prefetch: use it for relational databases.
Parallel bulk: use it for analytical processing engines.
Single-thread: use it to avoid parallelism and large queries during extraction. When you select this mode, performance might be low.

Performance profile

For selected data sources you can choose a performance profile. Depending on your current needs, the lineage metadata import might be faster or more complete. You can choose between the following profiles:

Fast: Low time and memory consumption are the priorities in this profile. If your input is large, lineage might not be complete.
Balanced: Both performance and lineage completness are important. It is a compromise bewteen the lineage completness and time and memory that is spent on lineage import.
Complete: The completness for lineage is the priority in this profile. If your input is large, the lineage import might take a significant amount of resources and time.
Custom profile: You can create your own performance profile by providing values for the following properties:
- Dataflow Analysis Timeout Limit: Specifies the maximum estimated time (in seconds) after which the dataflow analysis of a single input is stopped. The time is checked when each node is added, or in some cases when edges are created. Therefore, in some cases, the timeout might slightly exceed the specified limit. If you set the value to 0, the analysis is not stopped. Example value: 60.
- Dataflow Analysis Edge Limit: Specifies the maximum number of edges that are allowed for a single input during the dataflow analysis. If this limit is exceeded, all filter edges are removed and no more filter edges are added. If the limit is still exceeded even after that, the analysis is stopped and the input fails. To disable the limit, set the value to 0. Example value: 2500.

Transformation logic extraction

You can enable building transformation logic descriptions from SQL code in SQL scripts.