Setting Up Database / Data Integration / BI Scanners

Here are the basics on how to configure and fine-tune a database, BI/reporting, or data integration scanner.

Configuration

  1. Create a new connection for the scanner under Admin UI > Connections > + Add Connection.

    No alt text provided

  2. Check the scanner prerequisites using the link at the top of the page.

    No alt text provided

    You will be redirected to the IBM Automatic Data Lineage documentation page.
    Prerequisites typically include the hostname or connection URL, the username and password, and the privileges required for the user so that Automatic Data Lineage can access the metadata. Check with your IT/infrastructure or network team to make sure there is connectivity (an open firewall) between Automatic Data Lineage and the selected system.

  3. Configure the connection details.

    1. Refer to the documentation or tooltips in Admin UI for pattern and sample values that should be provided, especially for hostnames and URLs.

    2. In most of the scanners, you have the option to limit the number of extracted databases/schemas/reports or folders (depending on what applies to the scanner). This is good because it limits the scope in accordance with your licensed script capacity and lineage needs.

    3. Note that if you want to scan multiple sites, you will need to create separate connections for each. For example, if you have three MS SQL servers and two Tableau sites, you will need to create a separate connection for each.

  4. Click Validate to have Automatic Data Lineage check if the connectivity is available and if there are any issues. The most common issues are:

    1. Missing connectivity (firewall or proxy restrictions)
      Typical keywords to look for in the error message are Connection reset by peer or Connection timed out, Cannot create PoolableConnectionFactory.

    2. Incorrect username/password
      Typical keywords to look for in the error message are Login failed or Incorrect username or password, 401 Unauthorized.

    3. Missing privileges
      Typical keywords to look for in the error message are Object does not exist, Cannot find ... , Cannot access ... , Insufficient privileges for ..., 403 Forbidden.

    4. Incorrect hostname/URL
      Typical keywords to look for in the error message are No such host is known, Could not find host ..., Cannot find ... , Cannot access ..., Cannot create PoolableConnectionFactory.

  5. If you are expecting the lineage to connect to other systems, those systems need to be configured as connections as well. This is especially true for Reporting and Data Integration tool scanners connecting to source/target databases. If those are not configured in Automatic Data Lineage as connections yet, create connections for them too as described in the previous steps.

Running the Scan

You are ready to run the scan.

  1. If the scanner (or multiple scanners that have been configured) does not extract metadata from the source system (e.g., Talend, DataStage, Erwin, ER/Studio, or PowerDesigner), or if you want to provide additional input for scanners that support it, prepare the inputs in the input folder or as a zip archive for Process Manager. See Manta Flow Usage: Preparing Scanner Inputs for more details.

  2. If all is successful, go to Admin UI > Process Manage to run the lineage analysis for extraction and analysis.

  3. If any errors occur, review the logs under Admin UI > Log Viewer. In Process Manager, there are links that direct you to the Log Viewer for the particular workflow.

    1. Remediate the issues and then repeat the previous step.

Reviewing and Fine-Tuning the Lineage

Review the resulting lineage in Manta Flow Viewer (by default hosted at http://localhost:8080/manta-dataflow-viewer). There are a few things to watch for that may indicate the need for additional configuration or fine-tuning.

  1. DEFAULT_SERVER, DEFAULT_DATABASE, __dummy__ appears somewhere in the repository tree for database scanners.
    This indicates that:

    1. One of the scanners could not automatically detect the right Automatic Data Lineage connection to the database that should be used. A manual dictionary mapping override is needed to resolve this. See the instructions in Manta Admin UI > Configuration > Scanners > Technology > Manual. See Alias (Manual) Connection Mappings Explained for more details on how manual mappings work.

    2. The database referenced by one of the jobs has not been scanned by Automatic Data Lineage at all. Create a connection for that system.

  2. You can see the objects in italics in the repository tree. These are said to be deduced by Automatic Data Lineage: that is, Automatic Data Lineage could not match them to the existing objects in the repository, so it created placeholders for them. Usually, this has the same root case as item #1 and the resolution is the same as well.

  3. The lineage does not properly connect from the Reporting/DI tool to the source/target database, or the server or database name are not what you would expect. This is usually the same case as above.

  4. The lineage for ETL tools seems to be incomplete or the jobs are not visible in the repository browser at all. Especially with ETL tools, it is common that additional pieces of configuration such as parameter files, overrides for run-time-only parameter values, configuration of ODBC connection strings, and similar need to be provided. See the details in the documentation for the specific scanner and Alias (Manual) Connection Mappings Explained. The best way to identify issues is to review the logs for the respective scanners. Look for error messages and review the suggested solution.