
IBM Manta Data Lineage

News


Abstract

IBM Manta Data Lineage is sold in Resource Units (RU).

This article describes Resource Units, their composition (Data Source Definitions and Tables), and how to count them.

Content

Definitions

Resource Unit
A Resource Unit is an independent measure of a resource managed by, processed by, or related to the use of IBM Manta Data Lineage. The resources measured for this purpose are Data Source Definitions and Tables, where 1 RU = 1 Data Source Definition or 1 RU = 5,000 Tables.
Data Source Definition
•    Data Source Definition (DSD): A DSD uniquely identifies a specific system, database, or application within the customer's environment that the customer wishes to scan. 
•    One (1) Resource Unit (RU) is consumed for each Data Source Definition (DSD). If multiple scanners are used with one DSD, each DSD and scanner combination counts as one (1) RU. When multiple DSDs of the same technology type are created, only one DSD of that technology type is counted toward the total RU consumption. For clarity, a technology type refers to a specific kind of system, database, or application technology. For example, the technology PostgreSQL includes technology types such as PostgreSQL, Amazon RDS for PostgreSQL, and IBM Cloud Databases for PostgreSQL. Similarly, the technology IBM Db2 includes technology types such as IBM Db2 and IBM Db2 for z/OS.

Tables
•    A table refers to an individual data table within the customer's source system that IBM Manta Data Lineage scans. 
•    Views, synonyms, and tables that IBM Manta Data Lineage deduces are not included in this total. The number of tables is rounded up to the next multiple of 5,000 when the Resource Unit calculation is performed.
•    One (1) Resource Unit (RU) is consumed for every five thousand (5,000) tables processed by the system; a worked example follows this list.
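
For illustration only, the arithmetic behind these rules can be sketched in a few lines of Python. This is a minimal, hypothetical example (the variable names and input values are assumptions, and it ignores the multiple-scanner rule described above); it is not a product feature:

import math

# Hypothetical inputs: the technology types of the configured DSDs and the
# total number of scanned tables (views, synonyms, and deduced tables excluded).
dsd_technology_types = ["PostgreSQL", "Amazon RDS for PostgreSQL", "IBM Db2", "Snowflake"]
table_count = 12300

# Only one DSD per technology type counts toward RU consumption.
dsd_rus = len(set(dsd_technology_types))   # 4 distinct technology types -> 4 RUs

# The table count is rounded up to the next multiple of 5,000.
table_rus = math.ceil(table_count / 5000)  # 12,300 tables -> 3 RUs

print("Estimated Resource Units:", dsd_rus + table_rus)  # 7 RUs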

 
Basic Information

A Licensed Configuration specifies usage parameters for a single IBM Manta Instance.

The definitions below are intended for estimating the number of RUs for each particular technology. IBM Manta made a substantial effort to make these definitions as accurate as possible; however, it is not feasible to make them conform exactly to the Product implementation.

The queries provided below give a total count; customers can choose what should be scanned, e.g., exclude system databases, schemas, folders, projects, or packages that do not contain any interesting information so that they do not count against the licensed RU limit.

Currently Supported Technologies
The Program will scan lineage for the following technologies:
Integration tools

Source | How to count Data Source Definition (DSD) | How to count Tables
DataStage for CP4D / Standalone | one DSD | N/A
SSIS | one DSD | N/A
IFPC | one DSD | N/A
Kafka | one DSD | N/A
Matillion | one DSD | N/A
Oracle Data Integrator (ODI) | one DSD | N/A
Power Designer | one DSD | N/A
StreamSets | one DSD | N/A
Talend | one DSD | N/A

Business Apps

Source | How to count Data Source Definition (DSD) | How to count Tables
Power BI - Azure | one DSD | N/A
MicroStrategy | one DSD | N/A
Tableau Cloud | one DSD | N/A
Tableau | one DSD | N/A
Cognos | one DSD | N/A
Excel | one DSD | N/A
OBIEE | one DSD | N/A
Power BI - On prem | one DSD | N/A
Qlik Sense | one DSD | N/A
SAP BO | one DSD | N/A
SAS | one DSD | N/A
SSAS | one DSD | N/A
SSRS | one DSD | N/A

Others

Source | How to count Data Source Definition (DSD) | How to count Tables
OpenLineage | one DSD | Specific to the technology whose lineage is provided via the OpenLineage scanner
ER/Studio | one DSD | N/A
Erwin | one DSD | N/A
Power Designer | one DSD | N/A
COBOL | one DSD | N/A
Amazon S3 | one DSD | N/A

Data Stores

Source | How to count Data Source Definition (DSD) | How to count Tables

Snowflake

one DSD

For every database:

WITH assets AS (
  SELECT table_catalog AS database_name, table_schema AS schema_name, table_name AS asset_name, 'relational table' AS asset_type
  FROM <database_name>.information_schema.TABLES
  UNION
  SELECT table_catalog AS database_name, table_schema AS schema_name, table_name AS asset_name, 'external table' AS asset_type
  FROM <database_name>.information_schema.external_tables
)
SELECT schema_name, asset_type, count(*) AS asset_count
FROM assets
GROUP BY schema_name, asset_type;

Azure SQL Server Db 

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Azure SQL Server Db Managed Services

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Amazon RDS for MSSQL
 

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Microsoft SQL Database

one DSD

For every database:
select count(*) from <database_name>.sys.objects where type in ('U', 'ET');
Or for every schema that will be scanned for lineage:
select count(*) from <database_name>.sys.objects where schema_id = (select schema_id from <database_name>.sys.schemas where name = '<schema_name>') and type in ('U', 'ET');

Amazon RDS for PostgreSQL

one DSD

For every database:
select pn.nspname as schema_name,
       case when pc.relkind = 'r' then 'relational table'
            when pc.relkind = 't' then 'TOAST table'
            else 'unknown' end as asset_type,
       count(*) as asset_count
from pg_class pc, pg_namespace pn
where pc.relnamespace = pn.oid
  and pc.relkind = 'r'
  and pn.nspname NOT LIKE 'pg_temp_%'
  and pn.nspname NOT LIKE 'pg_toast_temp_%'
  and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit')
group by schema_name, asset_type;

PostgreSQL

one DSD

For every database:

select pn.nspname as schema_name,
       case when pc.relkind = 'r' then 'relational table'
            when pc.relkind = 't' then 'TOAST table'
            else 'unknown' end as asset_type,
       count(*) as asset_count
from pg_class pc, pg_namespace pn
where pc.relnamespace = pn.oid
  and pc.relkind = 'r'
  and pn.nspname NOT LIKE 'pg_temp_%'
  and pn.nspname NOT LIKE 'pg_toast_temp_%'
  and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit')
group by schema_name, asset_type;

Redshift

one DSD

For every database:

with assets as (
  select pn.nspname as schema_name,
         pc.relname as asset_name,
         case when pc.relkind = 'r' then 'relational table'
              when pc.relkind = 't' then 'TOAST table'
              else 'unknown' end as asset_type
  from pg_class_info pc, pg_namespace pn
  where pc.relnamespace = pn.oid
    and pc.relkind = 'r'
    and pn.nspname NOT LIKE 'pg_temp_%'
    and pn.nspname NOT LIKE 'pg_toast_temp_%'
    and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit')
  union
  select et.schemaname as schema_name,
         et.tablename as asset_name,
         'external table' as asset_type
  from svv_external_tables et
  where et.redshift_database_name = current_database()
)
select schema_name, asset_type, count(*) as asset_count
from assets
group by schema_name, asset_type;

Greenplum

one DSD

For every database:

select pn.nspname as schema_name, case when pc.relkind = 'r' then 'relational table' when pc.relkind = 't' then 'TOAST table' else 'unknown' end as asset_type, count(*) as asset_count from pg_class pc, pg_namespace pn where pc.relnamespace = pn.oid and pc.relkind = 'r' and pn.nspname NOT LIKE 'pg_temp_%' and pn.nspname NOT LIKE 'pg_toast_temp_%' and pn.nspname NOT IN ('pg_toast', 'pg_internal', 'pg_catalog', 'information_schema', 'pg_bitmapindex', 'pg_aoseg', 'gp_toolkit') group by schema_name, asset_type;

DB2

one DSD

For every database:

WITH assets AS (
  SELECT tab.tabschema AS asset_schema,
         tab.tabname AS asset_name,
         CASE WHEN TYPE IN ('T', 'U') THEN 'relational table'
              WHEN TYPE = 'L' THEN 'detached table'
              WHEN TYPE = 'G' THEN 'temporary table'
         END AS asset_type
  FROM syscat.tables tab
  WHERE tab.TYPE IN ('G', 'L', 'T', 'U') AND tab.OWNERTYPE <> 'S'
)
SELECT asset_schema, asset_type, count(*) AS asset_count
FROM assets
GROUP BY asset_schema, asset_type;

DB2 LUW

one DSD

For every database:

WITH assets AS (
  SELECT tab.tabschema AS asset_schema,
         tab.tabname AS asset_name,
         CASE WHEN TYPE IN ('T', 'U') THEN 'relational table'
              WHEN TYPE = 'L' THEN 'detached table'
              WHEN TYPE = 'G' THEN 'temporary table'
         END AS asset_type
  FROM syscat.tables tab
  WHERE tab.TYPE IN ('G', 'L', 'T', 'U') AND tab.OWNERTYPE <> 'S'
)
SELECT asset_schema, asset_type, count(*) AS asset_count
FROM assets
GROUP BY asset_schema, asset_type;

DB2 z/OS

one DSD

For every database:

WITH assets AS (
  SELECT tab.creator AS asset_schema,
         CASE WHEN tab.TYPE = 'T' THEN 'relational table'
              WHEN tab.TYPE = 'G' THEN 'temporary table'
         END AS asset_type,
         tab.name AS asset_name
  FROM SYSIBM.systables tab
  WHERE TYPE IN ('G', 'T')
)
SELECT asset_schema, asset_type, count(*) AS asset_count
FROM assets
GROUP BY asset_schema, asset_type;

Oracle

one DSD

For every schema that will be scanned for lineage:

select count(*)
from (select T.OWNER as OBJ_SCHEMA, T.TABLE_NAME
      from ALL_TABLES T
      where (T.OWNER, T.TABLE_NAME) not in (select OWNER, MVIEW_NAME from ALL_MVIEWS
                                            union all
                                            select LOG_OWNER, LOG_TABLE from ALL_MVIEW_LOGS))
where TABLE_NAME not like 'BIN$%' AND OBJ_SCHEMA = '<schema_name>';

Amazon RDS for Oracle

one DSD

For every schema that will be scanned for lineage:

select count(*)
from (select T.OWNER as OBJ_SCHEMA, T.TABLE_NAME
      from ALL_TABLES T
      where (T.OWNER, T.TABLE_NAME) not in (select OWNER, MVIEW_NAME from ALL_MVIEWS
                                            union all
                                            select LOG_OWNER, LOG_TABLE from ALL_MVIEW_LOGS))
where TABLE_NAME not like 'BIN$%' AND OBJ_SCHEMA = '<schema_name>';

BigQuery

one DSD

For each extracted project:

SELECT table_catalog as database_name, table_schema as schema_name, table_type as asset_type, count(*) as asset_count
FROM <project_name>.INFORMATION_SCHEMA.TABLES
WHERE table_type in ('BASE TABLE', 'EXTERNAL', 'MATERIALIZED VIEW')
GROUP BY database_name, schema_name, asset_type;

Aurora PostgreSQL

one DSD

Same as PostgreSQL or MySQL, depending on the chosen flavor

Azure Synapse Analytics

one DSD

For each scanned catalog:

SELECT i.TABLE_CATALOG, i.TABLE_SCHEMA, count(*) as TABLE_COUNT
FROM INFORMATION_SCHEMA.TABLES i
WHERE i.TABLE_CATALOG = '<catalog_name>'
GROUP BY i.TABLE_CATALOG, i.TABLE_SCHEMA;

Or, for each scanned schema in a catalog:

SELECT i.TABLE_CATALOG, i.TABLE_SCHEMA, count(*) as TABLE_COUNT
FROM INFORMATION_SCHEMA.TABLES i
WHERE i.TABLE_CATALOG = '<catalog_name>' AND i.TABLE_SCHEMA = '<schema_name>'
GROUP BY i.TABLE_CATALOG, i.TABLE_SCHEMA;

Hive

one DSD

For every schema that will be scanned for lineage:

Number of rows returned by `SHOW TABLES IN <schema_name>` minus the number of rows returned by `SHOW VIEWS IN <schema_name>` (a scripted version is sketched at the end of this listing).

MariaDB

one DSD

For each extracted schema:

SELECT t.TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM INFORMATION_SCHEMA.TABLES t
WHERE t.TABLE_SCHEMA = '<schema_name>' AND t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA;

MySQL

one DSD

For each extracted schema:

SELECT t.TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM INFORMATION_SCHEMA.TABLES t
WHERE t.TABLE_SCHEMA = '<schema_name>' AND t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA;

Netezza

one DSD

For each extracted database:

SELECT database_name, schema_name, asset_type, count(*) AS asset_count
FROM (SELECT vt.DATABASE AS database_name, vt.SCHEMA AS schema_name, vt.OBJTYPE AS asset_type, vt.TABLENAME AS asset_name
      FROM _V_TABLE vt
      UNION
      SELECT vv.DATABASE AS database_name, vv.SCHEMA AS schema_name, vv.OBJTYPE AS asset_type, vv.VIEWNAME AS asset_name
      FROM _V_VIEW vv) a
WHERE UPPER(a.asset_type) IN ('TABLE', 'MATERIALIZED VIEW') AND a.database_name = '<database_name>'
GROUP BY a.DATABASE_NAME, a.SCHEMA_NAME, a.ASSET_TYPE;

SAP ASE

one DSD

For each extracted database:

SELECT DB_NAME() as TABLE_CATALOG, su.name as TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM sysobjects so INNER JOIN sysusers su ON su.uid = so.uid
WHERE so.type = 'U'
GROUP BY TABLE_CATALOG, TABLE_SCHEMA;

SAP Hana

one DSD

For each extracted schema:

SELECT TABLES.SCHEMA_NAME, COUNT(*) AS ASSET_COUNT
FROM "SYS"."TABLES" AS TABLES
WHERE TABLES.TABLE_TYPE <> 'VIRTUAL' AND TABLES.IS_USER_DEFINED_TYPE = 'FALSE' AND TABLES.SCHEMA_NAME = '<schema_name>'
GROUP BY TABLES.SCHEMA_NAME;

SingleStore

one DSD

For each extracted schema:

SELECT t.TABLE_SCHEMA, count(*) as ASSET_COUNT
FROM INFORMATION_SCHEMA.TABLES t
WHERE t.TABLE_SCHEMA = '<schema_name>' AND t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA;

Teradata

one DSD

For each extracted database:

SELECT d.DatabaseName, count(*) as AssetCount
FROM DBC.TVM o INNER JOIN DBC.Dbase d ON o.DatabaseId = d.DatabaseId
WHERE d.DatabaseName = '<database_name>' and o.TableKind in ('T', 'O')
GROUP BY d.DatabaseName;

Yellowbrick

one DSD

Same as PostgreSQL

Databricks

one DSD

Several options are listed below:

Counting tables for Databricks - all catalogs and schemas available

%python
# Import the necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

# Create a Spark session
spark = SparkSession.builder.getOrCreate()

def get_hive_metastore_tables_count():
    """Count tables in every schema of the hive_metastore catalog."""
    table_count = 0
    try:
        schemas_df = spark.sql("SHOW SCHEMAS IN hive_metastore").select("databaseName")

        for row in schemas_df.collect():
            schema_name = row['databaseName']
            # SHOW TABLES returns one row per table in the schema
            tables_df = spark.sql(f"SHOW TABLES IN hive_metastore.{schema_name}")
            table_count += tables_df.count()
    except AnalysisException as e:
        print("hive_metastore not present")

    print("Number of hive_metastore tables: " + str(table_count))
    return table_count

def get_unity_catalog_tables_count():
    """Count Unity Catalog tables (views excluded) via the system information schema."""
    table_count = 0
    sql_query = """
    SELECT DISTINCT table_name, table_schema, table_catalog
    FROM system.information_schema.tables
    WHERE table_schema != 'information_schema'
    AND table_type != 'VIEW'
    ;"""
    try:
        tables_df = spark.sql(sql_query)
        table_count += tables_df.count()
    except AnalysisException as e:
        print("Error while counting UC tables: " + str(e))
    print("Number of UC tables: " + str(table_count))
    return table_count

total_count = 0
total_count += get_hive_metastore_tables_count()
total_count += get_unity_catalog_tables_count()
print("----------------------------")
print("Total number of tables: " + str(total_count))

Counting tables for Databricks – specific catalogs and schemas

%python
# Import the necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

# Create a Spark session
spark = SparkSession.builder.getOrCreate()

def get_hive_metastore_tables_count():
    """Count tables in every schema of the hive_metastore catalog."""
    table_count = 0
    try:
        schemas_df = spark.sql("SHOW SCHEMAS IN hive_metastore").select("databaseName")

        for row in schemas_df.collect():
            schema_name = row['databaseName']
            tables_df = spark.sql(f"SHOW TABLES IN hive_metastore.{schema_name}")
            table_count += tables_df.count()
    except AnalysisException as e:
        print("hive_metastore not present")

    print("Number of hive_metastore tables: " + str(table_count))
    return table_count

def get_unity_catalog_tables_count():
    """Count Unity Catalog tables in a specific catalog and schema (views excluded)."""
    table_count = 0
    # Replace CATALOG_NAME and SCHEMA_NAME with the catalog and schema to be counted
    sql_query = """
    SELECT DISTINCT table_name, table_schema, table_catalog
    FROM system.information_schema.tables
    WHERE table_schema != 'information_schema'
    AND table_type != 'VIEW'
    AND table_catalog = 'CATALOG_NAME'
    AND table_schema = 'SCHEMA_NAME'
    ;"""
    try:
        tables_df = spark.sql(sql_query)
        table_count += tables_df.count()
    except AnalysisException as e:
        print("Error while counting UC tables: " + str(e))
    print("Number of UC tables: " + str(table_count))
    return table_count

total_count = 0
total_count += get_hive_metastore_tables_count()
total_count += get_unity_catalog_tables_count()
print("----------------------------")
print("Total number of tables: " + str(total_count))

Counting tables for Databricks – Unity Catalog only

Run the following query in a Databricks notebook:

%sql
SELECT count(*)
FROM (
  SELECT DISTINCT table_name, table_schema, table_catalog
  FROM system.information_schema.tables
  WHERE table_schema != 'information_schema'
    AND table_type != 'VIEW'
)
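
For the Hive entry above, the table count is described as the difference between the results of two SHOW commands. As an illustration only, here is a minimal PySpark sketch of that subtraction, assuming a Spark session with Hive support and a Spark/Hive version that supports SHOW VIEWS; the schema name is a placeholder:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

schema_name = "<schema_name>"  # placeholder for the schema that will be scanned

# SHOW TABLES also lists views, so subtract the view count to keep tables only.
tables = spark.sql(f"SHOW TABLES IN {schema_name}").count()
views = spark.sql(f"SHOW VIEWS IN {schema_name}").count()

print(f"Table count for {schema_name}: {tables - views}")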

[{"Type":"MASTER","Line of Business":{"code":"LOB76","label":"Data Platform"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSTBNP","label":"IBM Manta Data Lineage"},"ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
05 May 2025

UID

ibm17177044