
IBM Manta Data Lineage

News


Abstract

IBM Manta Data Lineage is sold in Resource Units (RU).

This article describes Resource Units, their composition (Data Source Definitions and Tables), and how to count them.

Content

Definitions

Resource Unit
A Resource Unit is an independent measure of a resource managed by, processed by, or related to the use of IBM Manta Data Lineage. The resources measured for this purpose are Data Source Definitions and Tables, where 1 RU = 1 Data Source Definition or 1 RU = 5,000 Tables.
Data Source Definition
•    Data Source Definition (DSD): A DSD uniquely identifies a specific system, database, or application within the customer's environment that the customer wishes to scan. 
•    One (1) Resource Unit (RU) is consumed for each Data Source Definition (DSD). If multiple scanners are used with one DSD, each DSD and scanner combination counts as one (1) RU. When multiple DSDs of the same technology type are created, only one DSD of that technology type is counted toward the total RU consumption. For clarity, a technology type refers to a specific kind of system, database, or application technology. For example, the technology PostgreSQL includes technology types such as PostgreSQL, Amazon RDS for PostgreSQL, and IBM Cloud Databases for PostgreSQL. Similarly, the technology IBM Db2 includes technology types such as IBM Db2 and IBM Db2 for z/OS.

Tables
•    A table refers to an individual data table within the customer's source system that IBM Manta Data Lineage scans. 
•    Views, synonyms, and tables that IBM Manta Data Lineage deduces are not included in this total. The number of tables is rounded up to the next multiple of 5,000 when the Resource Unit calculation is performed.
•    One (1) Resource Unit (RU) is consumed for every five thousand (5,000) tables processed by the system; a worked example follows this list.
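
For illustration only, the arithmetic behind these rules can be sketched in a few lines of Python. This is a minimal, hypothetical example (the variable names and input values are assumptions, and it ignores the multiple-scanner rule described above); it is not a product feature:

import math

# Hypothetical inputs: the technology types of the configured DSDs and the
# total number of scanned tables (views, synonyms, and deduced tables excluded).
dsd_technology_types = ["PostgreSQL", "Amazon RDS for PostgreSQL", "IBM Db2", "Snowflake"]
table_count = 12300

# Only one DSD per technology type counts toward RU consumption.
dsd_rus = len(set(dsd_technology_types))   # 4 distinct technology types -> 4 RUs

# The table count is rounded up to the next multiple of 5,000.
table_rus = math.ceil(table_count / 5000)  # 12,300 tables -> 3 RUs

print("Estimated Resource Units:", dsd_rus + table_rus)  # 7 RUs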

 
Basic Information

A Licensed Configuration specifies usage parameters for a single IBM Manta Instance.

The definitions below are intended for estimating the number of RUs for each particular technology. IBM Manta made a substantial effort to make these definitions as accurate as possible; however, it is not feasible to make them conform exactly to the Product implementation.

The queries provided below give a total count; customers can choose what should be scanned, e.g., exclude system databases, schemas, folders, projects, or packages that do not contain any interesting information so that they do not count against the licensed RU limit.

Currently Supported Technologies
The Program will scan lineage for the following technologies:
Integration tools

Source | How to count Data Source Definition (DSD) | How to count Tables
DataStage for CP4D / Standalone | one DSD | N/A
SSIS | one DSD | N/A
IFPC | one DSD | N/A
Kafka | one DSD | N/A
Matillion | one DSD | N/A
Oracle Data Integrator (ODI) | one DSD | N/A
Power Designer | one DSD | N/A
StreamSets | one DSD | N/A
Talend | one DSD | N/A

Business Apps

Source | How to count Data Source Definition (DSD) | How to count Tables
Power BI - Azure | one DSD | N/A
MicroStrategy | one DSD | N/A
Tableau Cloud | one DSD | N/A
Tableau | one DSD | N/A
Cognos | one DSD | N/A
Excel | one DSD | N/A
OBIEE | one DSD | N/A
Power BI - On prem | one DSD | N/A
Qlik Sense | one DSD | N/A
SAP BO | one DSD | N/A
SAS | one DSD | N/A
SSAS | one DSD | N/A
SSRS | one DSD | N/A

Others

Source | How to count Data Source Definition (DSD) | How to count Tables
OpenLineage | one DSD | Specific to the technology whose lineage is provided via the OpenLineage scanner
ER/Studio | one DSD | N/A
Erwin | one DSD | N/A
Power Designer | one DSD | N/A
COBOL | one DSD | N/A
Amazon S3 | one DSD | N/A

Data Stores

Source | How to count Data Source Definition (DSD) | How to count Tables

Snowflake

one DSD

For every database:

WITH assets AS (
  SELECT table_catalog AS database_name, table_schema AS schema_name, table_name AS asset_name, 'relational table' AS asset_type
  FROM <database_name>.information_schema.TABLES
  UNION
  SELECT table_catalog AS database_name, table_schema AS schema_name, table_name AS asset_name, 'external table' AS asset_type
  FROM <database_name>.information_schema.external_tables
)
SELECT schema_name, asset_type, count(*) AS asset_count
FROM assets
GROUP BY schema_name, asset_type;

Azure SQL Server Db 

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Azure SQL Server Db Managed Services

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Amazon RDS for MSSQL
 

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Microsoft SQL Database

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Amazon RDS for PostgreSQL

one DSD

For every database:
select pn.nspname as schema_name,
       case when pc.relkind = 'r' then 'relational table'
            when pc.relkind = 't' then 'TOAST table'
            else 'unknown' end as asset_type,
       count(*) as asset_count
from pg_class pc, pg_namespace pn
where pc.relnamespace = pn.oid
  and pc.relkind = 'r'
  and pn.nspname NOT LIKE 'pg_temp_%'
  and pn.nspname NOT LIKE 'pg_toast_temp_%'
  and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit')
group by schema_name, asset_type;

PostgreSQL

one DSD

For every database:

select pn.nspname as schema_name,
       case when pc.relkind = 'r' then 'relational table'
            when pc.relkind = 't' then 'TOAST table'
            else 'unknown' end as asset_type,
       count(*) as asset_count
from pg_class pc, pg_namespace pn
where pc.relnamespace = pn.oid
  and pc.relkind = 'r'
  and pn.nspname NOT LIKE 'pg_temp_%'
  and pn.nspname NOT LIKE 'pg_toast_temp_%'
  and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit')
group by schema_name, asset_type;

Redshift

one DSD

For every database:

with assets as (
  select pn.nspname as schema_name,
         pc.relname as asset_name,
         case when pc.relkind = 'r' then 'relational table'
              when pc.relkind = 't' then 'TOAST table'
              else 'unknown' end as asset_type
  from pg_class_info pc, pg_namespace pn
  where pc.relnamespace = pn.oid
    and pc.relkind = 'r'
    and pn.nspname NOT LIKE 'pg_temp_%'
    and pn.nspname NOT LIKE 'pg_toast_temp_%'
    and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit')
  union
  select et.schemaname as schema_name,
         et.tablename as asset_name,
         'external table' as asset_type
  from svv_external_tables et
  where et.redshift_database_name = current_database()
)
select schema_name, asset_type, count(*) as asset_count
from assets
group by schema_name, asset_type;

Greenplum

one DSD

For every database:

select pn.nspname as schema_name, case when pc.relkind = 'r' then 'relational table' when pc.relkind = 't' then 'TOAST table' else 'unknown' end as asset_type, count(*) as asset_count from pg_class pc, pg_namespace pn where pc.relnamespace = pn.oid and pc.relkind = 'r' and pn.nspname NOT LIKE 'pg_temp_%' and pn.nspname NOT LIKE 'pg_toast_temp_%' and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit') group by schema_name, asset_type;

DB2

one DSD

For every database:

WITH assets AS (
  SELECT tab.tabschema AS asset_schema,
         tab.tabname AS asset_name,
         CASE WHEN TYPE IN ('T', 'U') THEN 'relational table'
              WHEN TYPE = 'L' THEN 'detached table'
              WHEN TYPE = 'G' THEN 'temporary table'
         END AS asset_type
  FROM syscat.tables tab
  WHERE tab.TYPE IN ('G', 'L', 'T', 'U') AND tab.OWNERTYPE <> 'S'
)
SELECT asset_schema, asset_type, count(*) AS asset_count
FROM assets
GROUP BY asset_schema, asset_type;

DB2 LUW

one DSD

For every database:

WITH assets AS (
  SELECT tab.tabschema AS asset_schema,
         tab.tabname AS asset_name,
         CASE WHEN TYPE IN ('T', 'U') THEN 'relational table'
              WHEN TYPE = 'L' THEN 'detached table'
              WHEN TYPE = 'G' THEN 'temporary table'
         END AS asset_type
  FROM syscat.tables tab
  WHERE tab.TYPE IN ('G', 'L', 'T', 'U') AND tab.OWNERTYPE <> 'S'
)
SELECT asset_schema, asset_type, count(*) AS asset_count
FROM assets
GROUP BY asset_schema, asset_type;

DB2 z/OS

one DSD

For every database:

WITH assets AS (
  SELECT tab.creator AS asset_schema,
         CASE WHEN tab.TYPE = 'T' THEN 'relational table'
              WHEN tab.TYPE = 'G' THEN 'temporary table'
         END AS asset_type,
         tab.name AS asset_name
  FROM SYSIBM.systables tab
  WHERE TYPE IN ('G', 'T')
)
SELECT asset_schema, asset_type, count(*) AS asset_count
FROM assets
GROUP BY asset_schema, asset_type;

Oracle

one DSD

For every schema that will be scanned for lineage:

select count(*)
from (select T.OWNER as OBJ_SCHEMA, T.TABLE_NAME
      from ALL_TABLES T
      where (T.OWNER, T.TABLE_NAME) not in (select OWNER, MVIEW_NAME from ALL_MVIEWS
                                            union all
                                            select LOG_OWNER, LOG_TABLE from ALL_MVIEW_LOGS))
where TABLE_NAME not like 'BIN$%' AND OBJ_SCHEMA = '<schema_name>';

Amazon RDS for Oracle

one DSD

For every schema that will be scanned for lineage:

select count(*)
from (select T.OWNER as OBJ_SCHEMA, T.TABLE_NAME
      from ALL_TABLES T
      where (T.OWNER, T.TABLE_NAME) not in (select OWNER, MVIEW_NAME from ALL_MVIEWS
                                            union all
                                            select LOG_OWNER, LOG_TABLE from ALL_MVIEW_LOGS))
where TABLE_NAME not like 'BIN$%' AND OBJ_SCHEMA = '<schema_name>';

BigQuery

one DSD

For each extracted project:

SELECT table_catalog as database_name, table_schema as schema_name, table_type as asset_type, count(*) as asset_count
FROM <project_name>.INFORMATION_SCHEMA.TABLES
WHERE table_type in ('BASE TABLE', 'EXTERNAL', 'MATERIALIZED VIEW')
GROUP BY database_name, schema_name, asset_type;

Aurora PostgreSQL

one DSD

Same as PostgreSQL or MySQL, depending on the chosen flavor

Azure Synapse Analytics

one DSD

For each scanned catalog:

SELECT i.TABLE_CATALOG, i.TABLE_SCHEMA, count(*) as TABLE_COUNT
FROM INFORMATION_SCHEMA.TABLES i
WHERE i.TABLE_CATALOG = '<catalog_name>'
GROUP BY i.TABLE_CATALOG, i.TABLE_SCHEMA;

Or, for each scanned schema in a catalog:

SELECT i.TABLE_CATALOG, i.TABLE_SCHEMA, count(*) as TABLE_COUNT
FROM INFORMATION_SCHEMA.TABLES i
WHERE i.TABLE_CATALOG = '<catalog_name>' AND i.TABLE_SCHEMA = '<schema_name>'
GROUP BY i.TABLE_CATALOG, i.TABLE_SCHEMA;

Hive

one DSD

For every schema that will be scanned for lineage:

Number of rows returned by `SHOW TABLES IN <schema_name>` minus the number of rows returned by `SHOW VIEWS IN <schema_name>` (a scripted version is sketched at the end of this listing).

MariaDB

one DSD

For each extracted schema:

SELECT t.TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM INFORMATION_SCHEMA.TABLES t
WHERE t.TABLE_SCHEMA = '<schema_name>' AND t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA;

MySQL

one DSD

For each extracted schema:

SELECT t.TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM INFORMATION_SCHEMA.TABLES t
WHERE t.TABLE_SCHEMA = '<schema_name>' AND t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA;

Netezza

one DSD

For each extracted database:

SELECT database_name, schema_name, asset_type, count(*) AS asset_count
FROM (SELECT vt.DATABASE AS database_name, vt.SCHEMA AS schema_name, vt.OBJTYPE AS asset_type, vt.TABLENAME AS asset_name
      FROM _V_TABLE vt
      UNION
      SELECT vv.DATABASE AS database_name, vv.SCHEMA AS schema_name, vv.OBJTYPE AS asset_type, vv.VIEWNAME AS asset_name
      FROM _V_VIEW vv) a
WHERE UPPER(a.asset_type) IN ('TABLE', 'MATERIALIZED VIEW') AND a.database_name = '<database_name>'
GROUP BY a.DATABASE_NAME, a.SCHEMA_NAME, a.ASSET_TYPE;

SAP ASE

one DSD

For each extracted database:

SELECT DB_NAME() as TABLE_CATALOG, su.name as TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM sysobjects so INNER JOIN sysusers su ON su.uid = so.uid
WHERE so.type = 'U'
GROUP BY TABLE_CATALOG, TABLE_SCHEMA;

SAP Hana

one DSD

For each extracted schema:

SELECT TABLES.SCHEMA_NAME, COUNT(*) AS ASSET_COUNT
FROM "SYS"."TABLES" AS TABLES
WHERE TABLES.TABLE_TYPE <> 'VIRTUAL' AND TABLES.IS_USER_DEFINED_TYPE = 'FALSE' AND TABLES.SCHEMA_NAME = '<schema_name>'
GROUP BY TABLES.SCHEMA_NAME;

SingleStore

one DSD

For each extracted schema:

SELECT t.TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM INFORMATION_SCHEMA.TABLES t
WHERE t.TABLE_SCHEMA = '<schema_name>' AND t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA;

Teradata

one DSD

For each extracted database:

SELECT d.DatabaseName, count(*) as AssetCount
FROM DBC.TVM o INNER JOIN DBC.Dbase d ON o.DatabaseId = d.DatabaseId
WHERE d.DatabaseName = '<database_name>' and o.TableKind in ('T', 'O')
GROUP BY d.DatabaseName;

Yellowbrick

one DSD

Same as PostgreSQL

Databricks

one DSD

Several options are listed below:

Counting tables for Databricks - all catalogs and schemas available

%python
# Import the necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

# Create a Spark session
spark = SparkSession.builder.getOrCreate()

def get_hive_metastore_tables_count():
    """Count tables in every schema of the hive_metastore catalog."""
    table_count = 0
    try:
        schemas_df = spark.sql("SHOW SCHEMAS IN hive_metastore").select("databaseName")

        for row in schemas_df.collect():
            schema_name = row['databaseName']
            # SHOW TABLES returns one row per table in the schema
            tables_df = spark.sql(f"SHOW TABLES IN hive_metastore.{schema_name}")
            table_count += tables_df.count()
    except AnalysisException as e:
        print("hive_metastore not present")

    print("Number of hive_metastore tables: " + str(table_count))
    return table_count

def get_unity_catalog_tables_count():
    """Count Unity Catalog tables (views excluded) via the system information schema."""
    table_count = 0
    sql_query = """
    SELECT DISTINCT table_name, table_schema, table_catalog
    FROM system.information_schema.tables
    WHERE table_schema != 'information_schema'
    AND table_type != 'VIEW'
    ;"""
    try:
        tables_df = spark.sql(sql_query)
        table_count += tables_df.count()
    except AnalysisException as e:
        print("Error while counting UC tables: " + str(e))
    print("Number of UC tables: " + str(table_count))
    return table_count

total_count = 0
total_count += get_hive_metastore_tables_count()
total_count += get_unity_catalog_tables_count()
print("----------------------------")
print("Total number of tables: " + str(total_count))

Counting tables for Databricks – specific catalogs and schemas

%python
# Import the necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

# Create a Spark session
spark = SparkSession.builder.getOrCreate()

def get_hive_metastore_tables_count():
    """Count tables in every schema of the hive_metastore catalog."""
    table_count = 0
    try:
        schemas_df = spark.sql("SHOW SCHEMAS IN hive_metastore").select("databaseName")

        for row in schemas_df.collect():
            schema_name = row['databaseName']
            tables_df = spark.sql(f"SHOW TABLES IN hive_metastore.{schema_name}")
            table_count += tables_df.count()
    except AnalysisException as e:
        print("hive_metastore not present")

    print("Number of hive_metastore tables: " + str(table_count))
    return table_count

def get_unity_catalog_tables_count():
    """Count Unity Catalog tables in a specific catalog and schema (views excluded)."""
    table_count = 0
    # Replace CATALOG_NAME and SCHEMA_NAME with the catalog and schema to be counted
    sql_query = """
    SELECT DISTINCT table_name, table_schema, table_catalog
    FROM system.information_schema.tables
    WHERE table_schema != 'information_schema'
    AND table_type != 'VIEW'
    AND table_catalog = 'CATALOG_NAME'
    AND table_schema = 'SCHEMA_NAME'
    ;"""
    try:
        tables_df = spark.sql(sql_query)
        table_count += tables_df.count()
    except AnalysisException as e:
        print("Error while counting UC tables: " + str(e))
    print("Number of UC tables: " + str(table_count))
    return table_count

total_count = 0
total_count += get_hive_metastore_tables_count()
total_count += get_unity_catalog_tables_count()
print("----------------------------")
print("Total number of tables: " + str(total_count))

Counting tables for Databricks – Unity Catalog only

Run the following query in a Databricks notebook:

%sql
SELECT count(*)
FROM (
  SELECT DISTINCT table_name, table_schema, table_catalog
  FROM system.information_schema.tables
  WHERE table_schema != 'information_schema'
    AND table_type != 'VIEW'
)
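
For the Hive entry above, the table count is described as the difference between the results of two SHOW commands. As an illustration only, here is a minimal PySpark sketch of that subtraction, assuming a Spark session with Hive support and a Spark/Hive version that supports SHOW VIEWS; the schema name is a placeholder:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

schema_name = "<schema_name>"  # placeholder for the schema that will be scanned

# SHOW TABLES also lists views, so subtract the view count to keep tables only.
tables = spark.sql(f"SHOW TABLES IN {schema_name}").count()
views = spark.sql(f"SHOW VIEWS IN {schema_name}").count()

print(f"Table count for {schema_name}: {tables - views}")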

[{"Type":"MASTER","Line of Business":{"code":"LOB76","label":"Data Platform"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSTBNP","label":"IBM Manta Data Lineage"},"ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
05 May 2025

UID

ibm17177044