Database statistics

For the system to create the best execution plan for a query, it evaluates what it knows about the database tables that it accesses. Without up-to-date statistics, the system uses internal, default values that are independent of the actual table and which result in suboptimal queries with long run times.

The following table describes the statistics information.
Table 1. Database information
Element Description
Per table Number of records
Per column
  • Minimum value
  • Maximum value
  • Null counts (per column)
  • Number of distinct values (also called dispersion)
The system uses statistics for many purposes:
  • Based on the number of records, the number of distinct values for a column, and assuming uniform distribution between the min and max values, the optimizer estimates the number of relevant rows and determines which is the smaller of two join tables.
  • Based on the min and max values, the optimizer determines what type of math needs to be done (for example, 64-bit or 128-bit).
  • If there are nulls in the database tables, then during code generation the system must generate more code to test and check fields for null values. Extra code means extra CPU cycles during execution time.

The GENERATE STATISTICS command collects this information. If you have the GenStats privilege, you can run this command on a database, table, or individual columns. By default, the admin user can run the command on any database (to process all the tables in the database) or any individual table.

The admin user can assign other users this privilege. For example, to give user1 privilege to run GENERATE STATISTICS on one or all tables in the DEV database, the admin user must grant user1 LIST privilege on tables in the system database, and GENSTATS from the dev database, as in these sample SQL commands:

For more information about the GenStats privilege, see Table 1.

The following table describes the nzsql command syntax for these cases.

Table 2. Generate statistics syntax
Description Syntax
A database (all tables) GENERATE STATISTICS;
A specific table (all columns) GENERATE STATISTICS ON table_name;
Individual columns in a table GENERATE STATISTICS ON my_table(name, address, zip);

The GENERATE STATISTICS command reads every row in every table to determine dispersion values (no sampling). It provides the most accurate and best quality statistics.