Basic Log Analysis

Problem

This article identifies the most common errors in the text logs produced by IBM Automatic Data Lineage scanners and the steps to resolve them. It aggregates error statistics from the logs so that the errors are easier to focus on, and it suggests how to resolve each issue.

Note that this approach works only on versions earlier than Automatic Data Lineage 32. In Automatic Data Lineage 32, the format of the text-based logs changed, and a simple grep approach no longer works. Use the log viewer available in Manta Admin UI instead.

Solution

All logs from lineage analyses are available in the mantaflow/cli/log/ directory. Use the following Linux shell script to filter the logs and get a basic understanding of the common errors they contain. Follow the related articles to get more detailed information on error messages for particular technologies. On Windows, you can run the script after installing a Linux subsystem or downloading Windows versions of the command-line utilities it uses (grep, sed, sort, and uniq).

  1. Save this script in mantaflow/cli/log/.

    filter_logs.sh

    #! /bin/bash
    # version: 2020-02-11
    
    # Common Errors
    grep -l " ERROR " *.log | grep -vE "Dataflow|Extractor.*properties" > _general_ERROR.txt
    grep -l "OutOfMemoryError" *.log > _general_OutOfMemory.txt
    grep "Caused by: " *.log | sort | uniq -c | sort -nr > _general_exceptions.txt
    grep "Connect to localhost.*refused" *.log | sort | uniq -c | sort -nr > _general_MantaServerDown.txt
    grep "Cannot create a new revision without committing the previous one." *RevisionScenario.log > _general_OpenRevision.txt
    grep "Library .*exists in more versions" *.log | sed "s/.*MantaVersionHelper //g" | sort | uniq > _general_incorrectPatchSet.txt
    
    # ifpc
    grep "Connection with shortcut" ifpcDataflowMasterScenario_*.properties.log | sed "s/.*ConnectionException: //g" | sort | uniq -c | sort -nr > _ifpc_unknownConnectionShortcuts.txt
    grep "Connection variable .* is not blank" ifpcDataflowMasterScenario_*.properties.log | sed "s/.*ExternalAnalyzerImpl//g" | sed "s/in transformation [^ ]* //g" | sort | uniq -c | sort -nr > _ifpc_unresolvedConnectionVariable.txt
    grep "Connection variable .* is not blank" ifpcDataflowMasterScenario_*.properties.log | sed "s/.*ExternalAnalyzerImpl Connection variable //g" | sed "s/is not blank.*//g" | sort | uniq -c | sort -nr > _ifpc_unresolvedConnectionVariable_Grouped.txt
    grep "was replaced to an empty string in" ifpcDataflowMasterScenario_*.properties.log | sed "s/.*IfpcDataflowException: //g" | sort | uniq -c | sort -nr > _ifpc_emptyParameter.txt
    grep "Unable to find parameter file" ifpcDataflowMasterScenario_*.properties.log | sed "s/.*ParameterException: //g" | sed "s/ when searching for parameter .*//g" | sort | uniq -c | sort -nr > _ifpc_missingParameterFile.txt
    grep "Creating deduced database" ifpcDataflowMasterScenario_*.properties.log | sed "s/.*TeradataResolver.DEDUCTION//g" | sed "s/ for name.*//g" | sort | uniq -c | sort -nr > _ifpc_missingTeradataDatabase.txt
    grep -E "There is no mapping for the Connection|Cannot find any matching system for Connection" ifpcDataflowMasterScenario_*.properties.log | sed "s/.*ReaderProvider -* *//g" | sed "s/.*DataflowQueryServiceImpl -* *//g" | sed "s/userName.*//g" |sort | uniq -c | sort -nr > _ifpc_noConnectionMapping.txt
    grep "does not exist in the connection setting file" ifpcDataflowMasterScenario_*.properties.log | sort | uniq -c | sort -nr  > _ifpc_connectionNotFound.txt
    
    # ssas
    grep -E "There is no mapping for the Connection|Cannot find any matching system for Connection" ssasDataflowMasterScenario_*.properties.log | sed "s/.*ReaderProvider -* *//g" | sed "s/.*DataflowQueryServiceImpl -* *//g" | sort | uniq -c | sort -nr > _ssas_noConnectionMapping.txt
    
    # talend
    grep "talend.*Unknown component" log.txt | sed "s/.* Unknown component with typename //g" | sort  | uniq -c | sort -nr > _talend_UnknownComponents.log
    
    # ssis
    grep -E "There is no mapping for the Connection|Cannot find any matching system for Connection" ssisDataflowMasterScenario_*.properties.log | sed "s/.*ReaderProvider -* *//g" | sed "s/.*DataflowQueryServiceImpl -* *//g" | sort | uniq -c | sort -nr > _ssis_noConnectionMapping.txt
    grep "Object not found in dictionary" ssisDataflowMasterScenario_*.properties.log | sed "s/.*DatabaseConnectorImpl -* *Object not found in dictionary: \(.*\)\.[^.]*$/\1/g" | sort | uniq -c | sort -nr > _ssis_ObjectNotFoundInDictionary_Grouped.txt
    grep "Object not found in dictionary" ssisDataflowMasterScenario_*.properties.log | sed "s/.*DatabaseConnectorImpl -* *Object not found in dictionary: \([^ ]*\) *$/\1/g" | sort | uniq -c | sort -nr > _ssis_ObjectNotFoundInDictionary.txt
    grep "Could not find .* connection manager" ssisDataflowMasterScenario_*.properties.log | sed "s/.*Analyzer //g" | sed "s/ referenced in.*//g" | sort | uniq -c | sort -nr > _ssis_CouldNotFindConnectionManager_grouped.txt
    grep "Could not find .* connection manager" ssisDataflowMasterScenario_*.properties.log | sed "s/.*Analyzer //g" | sort | uniq -c | sort -nr > _ssis_CouldNotFindConnectionManager.txt
    grep "No connection string set for" ssisDataflowMasterScenario_*.properties.log | sed "s/.*Impl //g" | sed "s/. Lineage may be inaccurate or missing for.*//g" | sort | uniq -c | sort -nr > _ssis_NoConnectionStringSet_Grouped.txt
    grep "No connection string set for" ssisDataflowMasterScenario_*.properties.log | sed "s/.*Impl //g" | sort | uniq -c | sort -nr > _ssis_NoConnectionStringSet.txt
    
    # netezza
    grep " using deduction to DATABASE" netezzaNzplsqlDataflowMasterScenario_*.properties.log | sed "s/.*to DATABASE//g" | sort | uniq -c | sort -nr > _netezza_DeducedDatabase.txt
    
    # teradata
    grep "Creating deduced database object with normalized name" teradataBteqDataflowMasterScenario_*.properties.log | sed "s/.*Creating deduced database object with normalized name//g" | sed "s/ for name.*//g" | sort | uniq -c | sort -nr > _teradataBteq_DeducedDatabase.txt
    grep "Creating deduced database object with normalized name" teradataDdlDataflowMasterScenario_*.properties.log | sed "s/.*Creating deduced database object with normalized name//g" | sed "s/ for name.*//g" | sort | uniq -c | sort -nr > _teradataDdl_DeducedDatabase.txt
    
    # mssql
    grep "Cannot open database " mssqlExtractorMasterScenario_*.properties.log | sed "s/.*Cannot open database //g" | sed "s/ requested by the login.*//g" | sort | uniq > _mssql_missingPriviledgeToDatabase.txt
    grep "The SELECT permission was denied on the object" mssqlExtractorMasterScenario_*.properties.log | sort | uniq > _mssql_missingPriviledgeSelect.txt
    grep "Could not find body of .* check that the database user has VIEW DEFINITION privilege" mssqlExtractorMasterScenario_*.properties.log > _mssql_missingPriviledgeViewDefinition.txt
    
    # oracle
    grep "have sufficient rights to the object or schema." oracleExtractorMasterScenario_*.properties.log > _oracle_insufficientPrivileges.txt
    grep -E "There is no mapping for the Connection|Cannot find any matching system for Connection" oracleDdlDataflowMasterScenario_*.properties.log | sed "s/.*ReaderProvider -* *//g" | sed "s/.*DataflowQueryServiceImpl -* *//g" | sort | uniq -c | sort -nr > _oracle_noConnectionMapping.txt
    
  2. Run the script and investigate the generated files using the table below. After fixing an issue, rerun the lineage analysis in Automatic Data Lineage and then rerun this log filter.
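Because the script writes one report per check, many of the generated files are typically empty. A minimal sketch for listing only the reports that contain findings follows. It is not part of the product: the helper name list_nonempty_reports is hypothetical, and it relies only on the script's convention that every report name starts with an underscore.

```shell
#!/bin/bash
# Hypothetical helper (not part of Automatic Data Lineage): print the reports
# (files whose names start with "_") in a directory that are not empty,
# together with their line counts, largest first.
list_nonempty_reports() {
    local dir="${1:-.}"
    # -size +0 skips reports in which grep found nothing.
    # wc prints a "total" line when given several files; drop it.
    find "$dir" -maxdepth 1 -type f -name '_*' -size +0 -exec wc -l {} + \
        | grep -v ' total$' \
        | sort -nr
}

# Usage, assuming the script was saved as filter_logs.sh in mantaflow/cli/log/:
#   cd mantaflow/cli/log && bash filter_logs.sh && list_nonempty_reports .
```

Sorting by line count is only a rough way to prioritize: the reports built with uniq -c already carry per-message counts in their first column.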

Each entry lists the output file, the issue it indicates, and the resolution.

General errors

_general_ERROR.txt
  Issue: Unexpected errors during lineage analysis, often caused by a missed configuration step.
  Resolution:
  1. Investigate the reported log file.
  2. Go to IBM Support and attach the complete Manta CLI logs.

_general_OutOfMemory.txt
  Issue: The process ran out of memory while running the lineage analysis.
  Resolution: Add more memory to Manta CLI as described in Java Heap Space Error on Client and run the lineage analysis again.

_general_exceptions.txt
  Issue: Unexpected errors during lineage analysis.
  Resolution: Go to IBM Support and attach the complete Manta CLI logs.

_general_MantaServerDown.txt
  Issue: The connection from Manta CLI to Manta Server failed.
  Resolution: Verify that Manta Server is up and running on the specified host and port. If you have changed the Manta Server host and port, verify that the change is reflected in the Manta CLI Common Settings in Manta Configurator.

_general_OpenRevision.txt
  Issue: There is an open revision, so a new revision cannot be created until the existing one is committed or rolled back.
  Resolution: Follow the instructions in "Server is not ready to accept new data" / "Error during fetch committed number. There is an uncommitted revision." Especially in a test environment, you can commit the open revision instead of rolling it back. Committing is typically much faster, but be aware that the committed revision is incomplete.

_general_incorrectPatchSet.txt
  Issue: A library in Manta CLI is installed in multiple versions, which is likely the result of an incorrectly applied patch.
  Resolution: Remove the older versions of the reported files from mantaflow/cli/scenarios/manta-dataflow-cli/lib.

Informatica PowerCenter (see Explanation of Informatica PowerCenter Connector Log Errors for more details)

_ifpc_unknownConnectionShortcuts.txt
  Issue: An Informatica PowerCenter workflow uses a connection shortcut that is not found among the connections extracted from the Informatica PowerCenter repository.
  Resolution:
  1. Verify that the connection exists in the source PowerCenter repository.
  2. Verify that the user has read privileges for that connection.
  3. Fix the workflow and/or parameter file by providing a valid connection name.

_ifpc_emptyParameter.txt
  Issue: An Informatica PowerCenter workflow uses a parameter that does not appear to have a value defined in the parameter file. This may cause errors when parsing and resolving SQL queries in which the parameter is used, for example, as the database or schema name.
  Resolution: Fix the parameter file by providing an actual sample value so that the lineage resolves correctly.

_ifpc_missingParameterFile.txt
  Issue: An Informatica PowerCenter workflow uses a parameter file that could not be found in the referenced path.
  Resolution: Provide the Informatica parameter files in the specified path, or specify a different base path for the parameter files in the Informatica PowerCenter connection via the Parameter Files Directory property.

_ifpc_missingTeradataDatabase.txt
  Issue: An Informatica PowerCenter workflow references a Teradata database that is not being scanned by Automatic Data Lineage.
  Resolution: Review the reported databases and add them to the Teradata connection.

_ifpc_noConnectionMapping.txt
  Issue: An Informatica PowerCenter workflow uses a connection that could not be automatically mapped to one of the database connections configured in Automatic Data Lineage.
  Resolution: Identify the particular database technology and configure the manual mapping for that database technology in the Manta Configurator app or configure a new connection.

_ifpc_connectionNotFound.txt
_ifpc_unresolvedConnectionVariable_Grouped.txt
_ifpc_unresolvedConnectionVariable.txt
  Issue: An Informatica PowerCenter workflow uses a connection shortcut that is not found among the connections extracted from the Informatica PowerCenter repository. Automatic Data Lineage creates a placeholder marked as "DEDUCED". This often results in duplicate objects and unconnected lineage.
  Resolution:
  1. Verify that the connection exists in the source PowerCenter repository.
  2. Verify that the user has read privileges on that connection.
  3. Fix the workflow and/or parameter file by providing a valid connection name.

SSAS

_ssas_noConnectionMapping.txt
  Issue: An SSAS dataflow task uses a connection that could not be automatically mapped to one of the database connections configured in Automatic Data Lineage. Automatic Data Lineage creates a placeholder marked as "DEDUCED". This often results in duplicate objects and unconnected lineage.
  Resolution: Identify the particular database technology and configure the manual mapping for that database technology in the Manta Configurator application or configure a new connection.

SSIS (see Explanation of Microsoft SSIS Connector Log Errors for more details)

_ssis_noConnectionMapping.txt
  Issue: An SSIS dataflow task uses a connection that could not be automatically mapped to one of the database connections configured in Automatic Data Lineage. Automatic Data Lineage creates a placeholder marked as "DEDUCED". This often results in duplicate objects and unconnected lineage.
  Resolution: Identify the particular database technology and configure the manual mapping for that database technology in the Manta Configurator application or configure a new connection.

_ssis_ObjectNotFoundInDictionary_Grouped.txt
_ssis_ObjectNotFoundInDictionary.txt
  Issue: An SSIS dataflow task references a database object that could not be found in the connection. Automatic Data Lineage creates a placeholder marked as "DEDUCED". This often results in duplicate objects and unconnected lineage.
  Resolution: Verify that the connection used or configured contains the particular objects (databases, schemas, tables). If not, remap to a different connection and/or add the desired database or schema to the list of scanned resources.

_ssis_CouldNotFindConnectionManager_grouped.txt
_ssis_CouldNotFindConnectionManager.txt
  Issue: The connection manager definition referenced in the dataflow task cannot be found.
  Resolution: Make sure that:
  1. With package deployment (DTSX files), all package configurations (DTSCONFIG files) are available to Automatic Data Lineage.
  2. With project deployment (ISPAC files), a complete ISPAC file is provided instead of individual DTSX files.

_ssis_NoConnectionStringSet_Grouped.txt
_ssis_NoConnectionStringSet.txt
  Issue: The connection string in the SSIS package is configured via a parameter that is either empty or defined at runtime.
  Resolution: Override the parameter value as described in SSIS Resource Configuration.

Netezza

_netezza_DeducedDatabase.txt
  Issue: A Netezza script references a database that is not being scanned by Automatic Data Lineage or a placeholder that is not being mapped correctly. Automatic Data Lineage creates a placeholder marked as "DEDUCED". This often results in duplicate objects and unconnected lineage.
  Resolution: Review the reported databases and add them to the Netezza connection and/or map the Netezza placeholder as described in Netezza Resource Configuration.

Teradata (see Explanation of Teradata Connector Log Errors for more details)

_teradataBteq_DeducedDatabase.txt
  Issue: A BTEQ script references a database that is not being scanned by Automatic Data Lineage or a placeholder that is not being mapped correctly. Automatic Data Lineage creates a placeholder marked as "DEDUCED". This often results in duplicate objects and unconnected lineage.
  Resolution: Review the reported databases and add them to the Teradata connection and/or map the BTEQ placeholder in Teradata Resource Configuration.

_teradataDdl_DeducedDatabase.txt
  Issue: A Teradata macro, procedure, view, or function references a database that is not being scanned by Automatic Data Lineage or a placeholder that is not being mapped correctly. Automatic Data Lineage creates a placeholder marked as "DEDUCED". This often results in duplicate objects and unconnected lineage.
  Resolution: Review the reported databases and add them to the Teradata connection.

MS SQL

_mssql_missingPriviledgeToDatabase.txt
  Issue: Cannot log in to the database during extraction.
  Resolution: Verify the privileges required by Automatic Data Lineage as per MS SQL Server Integration Requirements.

_mssql_missingPriviledgeSelect.txt
  Issue: Missing privileges during extraction, most often on sql_expression_dependencies.
  Resolution: Verify the privileges required by Automatic Data Lineage as per MS SQL Server Integration Requirements.

_mssql_missingPriviledgeViewDefinition.txt
  Issue: Missing VIEW (ANY) DEFINITION privilege during extraction.
  Resolution: Verify the privileges required by Automatic Data Lineage as per MS SQL Server Integration Requirements.

Oracle (see Explanation of Oracle Connector Log Errors for more details)

_oracle_insufficientPrivileges.txt
  Issue: Missing privileges during extraction.
  Resolution: Verify the privileges required by Automatic Data Lineage as per Oracle Integration Requirements.

_oracle_noConnectionMapping.txt
  Issue: An Oracle script uses a connection (typically a database link) that could not be automatically mapped to one of the database connections configured in Automatic Data Lineage.
  Resolution: Identify the particular Oracle database connection and configure the manual mapping for that connection in the Manta Configurator application or configure a new connection.
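
For the _general_incorrectPatchSet.txt case, it can help to cross-check the library directory itself before removing anything. The following is a minimal sketch, not a product utility: the helper name list_duplicate_libraries is hypothetical, and it assumes that library files are named <name>-<version>.jar with a version that starts with a digit and contains no hyphen. If the actual file names follow a different scheme, adjust the sed expression accordingly.

```shell
#!/bin/bash
# Hypothetical helper (not part of Automatic Data Lineage): print the base
# names of libraries that are present in more than one version.
# Assumption: files are named <name>-<version>.jar, where <version> starts
# with a digit and contains no hyphen.
list_duplicate_libraries() {
    local dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -name '*.jar' -exec basename {} \; \
        | sed 's/-[0-9][^-]*\.jar$//' \
        | sort \
        | uniq -d   # keep only names that occur more than once
}

# Usage, with the library path mentioned for _general_incorrectPatchSet.txt:
#   list_duplicate_libraries mantaflow/cli/scenarios/manta-dataflow-cli/lib
```

The sketch only lists candidates and deletes nothing; which version to keep should still be decided from the report and the applied patch.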