GET_CACHE_TABLE_INFO table function

The SYSHADOOP.GET_CACHE_TABLE_INFO table function provides a method for retrieving information about tables that are cataloged in the Hive metastore and that might be cached in the scheduler cache.

This information can be useful for the analysis of data fragmentation and metadata load times in scenarios where performance might be impacted by suboptimal data layout. It can also provide insight into the scheduler cache and identify tables whose metadata is currently loaded in memory.

Syntax

Read syntax diagramSkip visual syntax diagram GET_CACHE_TABLE_INFO ( schema-nameregex-patternnull,table-nameregex-patternnull )

Description

schema-name | regex-pattern
Specifies a valid schema name or a regular expression pattern to match against table schemas in the scheduler cache.
table-name | regex-pattern
Specifies a valid table name or a regular expression pattern to match against table names in the scheduler cache.
The function returns one row per database table. Each row contains the following columns:
Table 1. Information returned by the GET_CACHE_TABLE_INFO table function
Column name Data type Description
TABSCHEMA VARCHAR(128) Schema name of the table as it appears in the Db2® catalog
TABNAME VARCHAR(128) Table name as it appears in the Db2 catalog
LOCATION VARCHAR(1000) The location of table data as defined in the Hive metastore. In the case of partitioned tables, data for individual partitions might reside elsewhere, and this location indicates the default directory under which new partitions would be created.
FORMAT VARCHAR(32) File format of the table data
NUM_COLUMNS BIGINT Number of columns (excluding partitioning columns)
PARTITION_COLUMNS BIGINT Number of partitioning columns
NUM_PARTITIONS BIGINT Number of distinct partitions defined for the table
NUM_FILES BIGINT Number of files into which data in the table is split. In the case of partitioned tables, this is an aggregate count for all partitions.
TOTAL_FILE_SIZE BIGINT Aggregate size, in bytes, of all data files that make up the table
METADATA_LOAD_TIME BIGINT Length of time, in milliseconds, taken to load metadata from the file system

Usage

If the arguments identify a fully qualified single table (neither argument is a regular expression or the null value), and the metadata for that table is not available in the cache, the table is loaded from the file system to determine the file size and load time statistics.

If either argument is a regular expression, results are returned only for matching tables that are currently cached. Null input arguments or empty strings are treated as regular expressions that match any schema or table name.

If a table does not exist, or no tables that match the input criteria are found, an empty result set is returned.

This table function does not return any information for regular Db2 tables that are not cataloged in the Hive metastore.

Examples

The following example shows you how to get information about all tables in a specific schema that are currently in the cache:
SELECT
    VARCHAR(TABSCHEMA,12) AS TABSCHEMA,
    VARCHAR(TABNAME,24) AS TABNAME,
    VARCHAR(FORMAT,10) AS FORMAT,
    TOTAL_FILE_SIZE,
    NUM_FILES,
    NUM_COLUMNS,
    PARTITION_COLUMNS,
    NUM_PARTITIONS,
    METADATA_LOAD_TIME
  FROM TABLE(SYSHADOOP.GET_CACHE_TABLE_INFO('my_schema', null));

TABSCHEMA    TABNAME                  FORMAT     TOTAL_FILE_SIZE      NUM_FILES            NUM_COLUMNS          PARTITION_COLUMNS    NUM_PARTITIONS       METADATA_LOAD_TIME  
------------ ------------------------ ---------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
MY_SCHEMA    SIMPLE_EXAMPLE           TEXT                        252                    3                    3                    -                    -                   59
MY_SCHEMA    PARQUET_EXAMPLE          PARQUET                     934                    1                    3                    -                    -                   27
MY_SCHEMA    PARTITION_EXAMPLE        ORC                        3394                    8                    4                    1                    3                  101

  3 record(s) selected.
The following example shows you how to get information about a specific table regardless of whether or not it is currently cached:
SELECT
    VARCHAR(TABSCHEMA,12) AS TABSCHEMA,
    VARCHAR(TABNAME,24) AS TABNAME,
    VARCHAR(LOCATION,100) AS LOCATION,
    TOTAL_FILE_SIZE,
    NUM_FILES,
    METADATA_LOAD_TIME
  FROM TABLE(SYSHADOOP.GET_CACHE_TABLE_INFO('other_schema','smallfile_example'));

TABSCHEMA    TABNAME                  LOCATION                                                                                             TOTAL_FILE_SIZE      NUM_FILES            METADATA_LOAD_TIME  
------------ ------------------------ ---------------------------------------------------------------------------------------------------- -------------------- -------------------- --------------------
OTHER_SCHEMA SMALLFILE_EXAMPLE        hdfs://eikon1.fyre.ibm.com:8020/warehouse/tablespace/external/hive/other_schema.db/smallfile_example                33966                   72                  124

  1 record(s) selected.