IBM PureData System for Analytics, Version 7.1

CREATE TABLE AS

Use the CREATE TABLE AS command to create a table based on the results of a SELECT statement. The name of this command is sometimes abbreviated to CTAS.

The CREATE TABLE AS command has the following characteristics:

Common tasks

Use the CREATE TABLE AS command to create a table and fill it with data from a select command:

Table columns have the names and data types that are associated with the output columns of the SELECT command, unless you override the column names by giving an explicit list of new column names.
The CREATE TABLE AS command creates a table and evaluates the query once to fill the new table initially. The new table does not track subsequent changes to the source tables of the query. (In contrast, whenever you query a view, it re-evaluates the underlying SELECT commands.)

Suppress auto-statistics on small CTAS

During CTAS operations, IBM® Netezza® typically runs GENERATE STATISTICS following the CTAS operation to collect statistics on the created table. However, for shorter table queries, the GENERATE STATISTICS process can sometimes take more time to run than the CTAS operation itself.

When a CTAS operation is submitted, table creation and insert operations take place. During the insert operation, Netezza computes the Min values, Max values, and zone maps for all of the columns. If the insert operation yields a number of rows that is less than the configured threshold (ctas_auto_stats_min_rows), Netezza skips the generate statistics operation.

There are two postgresql.conf file settings that control this feature:

enable_small_ctas_autostats enables or disables the feature to suppress auto-statistics on small tables. The setting is enabled by default.
ctas_autostats_min_rows specifies the threshold number for a small table. Netezza does not calculate statistics for any tables that are under this threshold. The default value is 10000.

Handle distribution keys

If you do not define explicit distribution keys, a CTAS table inherits its distribution from the parent table. In general, the distribution of the target table is defined by the final node in the plan. If the final node has a valid distribution, the system assigns that distribution to the CTAS target. Only if the final plan node has no distribution (like a node at the host) does the system default to the first column of the table

The default distribution key is the first column (hash distribution) where there is no discernible distribution key or the source stream into the CTAS table has round-robin distribution.

Table t_one inherits its distribution keys from f_one. (It does not default to first column.)

CREATE TABLE t_one AS SELECT … FROM tbl …;

Table t_two inherits its distribution keys from the join table of (tbl_one+tbl_two), which would be their join keys.

CREATE TABLE t_two AS SELECT … FROM tbl_one,tbl_two … WHERE
tbl_one.b1 = tbl_two.b2 …

Table t_three inherits its distribution keys from the grouping node, which would be (b1,b2,b3).

CREATE TABLE t_three AS SELECT … FROM tbl_one, tbl_two, tbl_three…
WHERE … GROUP BY b1,b2,b3;

System default for table distributions

The postgresql.conf setting enable_random_table_distribute controls the default distribution behavior when tables are created. The default value 0 (disabled) specifies the following behavior, which is the default behavior for tables that are created in previous releases:

For a CREATE TABLE operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, use the first column as the default distribution key.
For a CREATE TABLE AS (CTAS) operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, inherit the distribution keys from the plan. If the planner cannot determine a distribution from the plan, use the first column as the default key.

If you specify enable_random_table_distribute=1 (enabled), the system behavior changes to the following:

For a CREATE TABLE operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, use random as the distribution method.
For a CREATE TABLE AS (CTAS) operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, inherit the distribution keys from the plan. If the planner cannot determine a distribution from the plan, use random as the distribution method.

In some cases, the planner might not be able to determine a distribution from the plan. For example:

If a final join happens on the host, then the distribution of the result of that join is non-deterministic.
If the distribution column is missing from the <select-list>, then distribution is non-deterministic.
If the distribution of the final result node is random, then distribution is considered non-deterministic.
If the final join is a full-outer-join, then also distribution is non-deterministic.

This change obsoletes the NZ_DISABLE_SKEW_DEFENSE environment variable, which controlled this behavior in earlier releases. The upgrade to release 4.6 checks for the presence of the variable, and if it is set, uses its value to set enable_random_table_distribute to its corresponding value.

To change the postgresql.conf variable, do the following:

Use a standard editor to open the configuration file, nz/data/postgresql.conf.
Find the line that contains the enable_random_table_distribute = 0.
Change the variable from 0 to 1 and save the change.
Restart the Netezza system for the changes to take effect.

Syntax

Syntax for using the CREATE TABLE AS command:

CREATE [ TEMPORARY | TEMP ] TABLE <table> [ (<col>[,<col>…] ) ]
AS <select_clause> [ DISTRIBUTE ON ( <dist_col>[,<dist_col>…] ) ]

Inputs

The CREATE TABLE AS command takes the following inputs:

Table 1. CREATE TABLE AS inputs
Input	Description
TEMPORARY or TEMP	Create a temporary table.
<table>	The name of the table to be created. A temporary table can have the same name as a permanent table that already exists, but other table names must be unique within the scope of a database.
<col>	The name of a column in the new table. If you do not provide a column name, a name is taken from the output column name of the select clause.
<select_clause>	A SELECT command as described in SELECT (to retrieve rows).
<dist_col>	The name of a distribution column. For more information about distribution columns, see "Handle distribution keys" in Privileges.

Outputs

See the CREATE TABLE and SELECT commands for a listing of possible output messages.

Privileges

You must be the admin user, the owner of the database or schema, or you must have the Create Table privilege. You must also have Select privilege for thetable that you are using in the SELECT statement.

Usage

The following provides sample usage.

Consider the following table, named cows:

    cnumber | cname  |   cbreed    | ckind
   ---------+--------+-------------+-------
          3 | Cindy  | Ayrshire    | milk
          8 | Muffin | Guernsey    | milk
          2 | Martha | Brown Swiss | milk
          7 | Joe    | Angus       | beef
          5 | Gretel | Highland    | beef
          1 | Betsy  | Holstein    | milk
          6 | Bob    | Angus       | beef
          4 | Mindy  | Hereford    | beef
          9 | Milda  | Jersey      | milk

Create a table from two columns of the table cows:

MYDB.SCH1(USER)=> CREATE TABLE cows2 AS SELECT cname, cbreed FROM cows;

The result is the new table, named cows2:

MYDB.SCH1(USER)=> SELECT * FROM cows2;

    cname  |   cbreed
   --------+-------------
    Cindy  | Ayrshire
    Muffin | Guernsey
    Mindy  | Hereford
    Milda  | Jersey
    Betsy  | Holstein
    Gretel | Highland
    Martha | Brown Swiss
    Joe    | Angus
    Bob    | Angus