SSIS Scanner Guide
IBM Automatic Data Lineage is a powerful data lineage platform that simplifies data management by supporting automated lineage harvesting of SQL Server Integration Services (SSIS) projects/packages. This guide provides steps to set up SSIS lineage harvesting.
Step 1: Configure the Connection
Create a new connection in Admin UI http://localhost:8281/manta-admin-gui/app/index.html?#/platform/connections/
to enable automated extraction and lineage analysis of SSIS projects/packages by Automatic Data Lineage. You will need
one SSIS connection in Automatic Data Lineage for each SSIS server that you want to connect to. The connection requirements and privileges can be found in
SSIS Integration Requirements.
Mandatory Properties to Configure
-
Connection Name: To ensure proper identification across all technologies, the connection name should be unique. This is a user-defined parameter for identifying the SSIS connection only within Admin UI. This value will be displayed in Admin UI under the list of all SSIS connections.
-
SSIS Instance Name (
ssis.extractor.server
) -
To enable automated extraction of the packages and/or projects from the SSIS Catalog, configure the following.
-
Connection String (
ssis.extractor.url
), deployment method (ssis.extractor.enableProjects
,ssis.extractor.enablePackages
) -
Authentication Type: See MS SQL Server Authentication Types for more details on individual options.
-
Optional Properties
-
To control the scope of the extraction, use
ssis.extractor.include
,ssis.extractor.exclude
,ssis.dataflow.include
, andssis.dataflow.exclude
to restrict the extraction on the folder or even the dataflow level. -
Select the SSIS environment that should be used to pull variable values for lineage by specifying it in
ssis.extractor.environment.name
.
See the SSIS Resource Configuration page for the full list and a detailed explanation of the properties that can be configured for the scanner.
Step 1.1: Disabling SSIS Automated Extraction for Manually Provided Packages/Projects
It is important to note that both projects and packages can be stored in various locations, including the file system or version control system, as well as the SSIS catalog in the SQL Server database. In instances where projects/packages are not stored on the SQL Server database, Automatic Data Lineage requires manual placement of the .ispac/.dtsx files on the Automatic Data Lineage server in the appropriate directory for ingestion and analysis. In such cases, it is necessary to disable the SSIS extraction and place the package/project files in the proper location.
-
Create a new SSIS connection and only configure the SSIS Connection Name and SSIS Instance Name. (Manta Admin UI may require that the Connection String and Username fields also be be configured. In this situation, just fill in placeholder values for the required fields as they will not be leveraged during scan time.)
-
Disable the
ssisExtractorScenario
by creating a custom workflow that skips the execution of the SSISExtractor Scenario
. If using Manta Orchestration API scripts to call specific scenarios, do not invoke thessisExtractorSecnario
. -
Create the directory
manta/cli/temp/ssis/<SSIS_instance_name>/
and place all the.dtsx
files there (if in package deployment mode) and/or.ispac
files (if in project deployment mode).
It is expected that the content provided will be the same as the files produced by automated extraction. This means that any additional files are ignored, especially configurations for package deployment models (
.dtsConfig
files) that are not supported. (If necessary, it should be enough to use the parameter value override described later to set the correct values.)
Optional Additional Steps
Step 2: Overriding SSIS Variables and Parameter Values (Optional)
During the runtime of SSIS jobs, variables and parameters are read from the database. However, Automatic Data Lineage lineage analysis may encounter limitations as it doesn't have access to these runtime values and uses the default values defined in the jobs instead. This can lead to incomplete or unexpected lineage results. To address this, define manual override values for the variables and parameters to be used specifically during lineage analysis. This override becomes mandatory when there is no default value available for Automatic Data Lineage to utilize.
User-defined overrides are provided as a semicolon (;
) delimited CSV file in the path specified in the ssis.variableOverrides
property. (See SSIS Resource Configuration.) If
you need to use a semicolon or a newline in a variable value, enclose the value in quotation marks ("
); escape any quotation marks using a backslash (\
). The first line of the file is considered a header and is ignored.
The individual columns in the file are:
Column |
Description |
Examples / allowed values |
---|---|---|
Scope mask |
Regular expression to determine the scope of the variable to apply this value to The expression is matched against the full hierarchical path to the variable's parent object (starting with the folder and separated by “
For the path
|
folder1/Project1.ispac/Package1 Package2/SqlTask1 |
Qualified name |
Qualified name of the variable or parameter |
User::var1 |
Mode |
DEFAULT: Initialize the variable to this value on package/project load OVERRIDE: Use the configured value whenever the variable's value is read, regardless of any assignments to this value, property expressions, etc. |
DEFAULT |
Data type |
Data type to convert the value to Use STRING for dates If NULL is selected, the value column is ignored |
STRING |
Value |
The value to use |
Hello |
Example of a variable override file:
"Scope mask";"Qualified Name";"Mode (DEFAULT|OVERRIDE)";"Data Type";"Value"
DWH_ETL_Folder/SSIS_Project;$Project::Database;OVERRIDE;STRING;"DB_123"
DWH_Prod_Folder/SSIS_Project/DTSX_Package;$Package::StagingDatabase;OVERRIDE;STRING;"MANTA_STAGING"
Step 3: Overriding Component Data Lineage (Optional)
Within Automatic Data Lineage, you have the capability to specify or override the lineage between inputs and outputs of components. This functionality proves useful in scenarios where the component's lineage analysis is not supported or may yield incomplete results. For instance, unsupported components may exhibit default behavior where inputs are connected to outputs based on matching names or through a generic connection mechanism, potentially leading to inaccurate lineage. Additionally, programmable components with unanalyzed code can benefit from overriding the default lineage to achieve more precise results.
Be aware that this feature is not designed to address unsupported sources and targets. It primarily focuses on overriding lineage within a component, specifically between its input and output ports. Therefore, if you encounter issues connecting your SSIS package to the underlying database, this feature is not applicable as it does not handle the broader connectivity aspect.
If a manual lineage definition is present for the given component, Automatic Data Lineage will not analyze the component at all and will instead use the lineage as defined manually. You cannot define anything but the component's input/output column for the manual definition source/target (i.e., you cannot use it to connect an input column to a database column, etc.). If there are multiple data flows to be defined through a single component, put each source-target mapping on a separate line with the same component path.
The configuration file must be put on the path specified in the ssis.componentOverrides
property. (See
SSIS Resource Configuration.) The format is CSV with values separated by semicolons (;
). If you need to use a semicolon or a newline in any field, enclose the value in quotation marks
("
); escape any quotation marks using a backslash (
\
). The first line of the file is considered to be a header and is ignored. The individual columns in the file are:
Column |
Description |
Examples / allowed values |
---|---|---|
Full path to component |
The full hierarchical path to the component (starting with any folder(s), project file name, or package name and separated by " For package deployment, use the package file name for the project path segment |
folder1/Project1.ispac/Package1/Data Flow Task/Script Component folder1/Package1.dtsx/Package1/Data Flow Task/Script Component |
Source column |
Name of the input column or its
If the component has multiple inputs, qualify any ambiguous column names with an input name |
LastName Package1\Data Flow Task\Script Component.Inputs[Input 0].Columns[LastName] Input 0.LastName |
Target column |
Name of the output column or its
If the component has multiple outputs, qualify any ambiguous column names with an output name |
LastName Package1\Data Flow Task\Script Component.Outputs[Output 0].Columns[LastName] Output 0.LastName |
Edge type |
Type of edge to use (direct or filter) If not provided, direct edge is used |
DIRECT |
Description (optional) |
A description of the transformation Note that the description is added to the target column; if multiple manual definitions use the same target column and provide a description, all descriptions will be merged and used for the target column |
Currency validation and conversion |
Example of a component override file:
"Full path to component";"Source column";"Target column";"Edge Type";"Description"
folder1/Project1.ispac/Package1/Data Flow Task/Script Component;Package1\Data Flow Task\Script Component.Inputs[Input 0].Columns[LastName];Package1\Data Flow Task\Script Component.Outputs[Output 0].Columns[LastName];DIRECT;
folder1/Package1.dtsx/Package1/Data Flow Task/Script Component;Input 0.LastName;Output 0.LastName;DIRECT;"Edge for last name"
Step 4: Parameter Values Configured on the Server (Optional)
For a project deployment model, the parameter values can be configured on the SSIS server after the project is deployed using a literal value or variable value sourced from the configured SSIS environment folder|name (https://learn.microsoft.com/en-us/answers/questions/256900/ssis-use-environment-configuration-in-visual-studi ). These values are extracted next to the project file in a file with the same name as the project but with the .properties
suffix. For example, extraction of the project MyProject
will produce a MyProject.ispac
file with a project definition and a MyProject.properties
file with parameter values. These values will be used during
data flow generation to override the default values of the project or package parameters.
There are some limitations on the extraction of these values.
-
Parameters marked as sensitive cannot be extracted. Their values are encrypted and inaccessible.
-
Parameter values can only be overridden for a single instance of a package execution. This value is not stored on the server.
It is possible to manually edit the properties file with parameter values after the extraction and to set the value for the parameters that cannot be extracted. Caution: This change will be lost after the next automatic extraction overrides the
file. The format of the file is
parameterName=parameterValue
, where the parameterName
for the project parameters only consists of the parameter name. For the package parameter, the parameterName
is composed of the package name (including
the .dtsx
suffix) and the parameter name separated by a dot.
Example:
projectParam=value1
MyPackage.dtsx.param1=value2
This is a configured project parameter called projectParam
with a value of value1
and a package parameter of param1
in the package
MyPackage.dtsx
with a value of value2
.