Use the CREATE TABLE AS command to create a table based
on the results of a SELECT statement. The name of this command is
sometimes abbreviated to CTAS.
The CREATE TABLE AS command has the following characteristics:
- Common tasks
- Use the CREATE TABLE AS command to create a table and fill it
with data from a select command:
- Table columns have the names and data types that are associated
with the output columns of the SELECT command, unless you override
the column names by giving an explicit list of new column names.
- The CREATE TABLE AS command creates a table and evaluates the
query once to fill the new table initially. The new table does not
track subsequent changes to the source tables of the query. (In contrast,
whenever you query a view, it re-evaluates the underlying SELECT commands.)
- Suppress auto-statistics on small CTAS
- During CTAS operations, IBM® Netezza® typically
runs GENERATE STATISTICS following the CTAS operation to collect statistics
on the created table. However, for shorter table queries, the GENERATE
STATISTICS process can sometimes take more time to run than the CTAS
operation itself.
When a CTAS operation is submitted, table creation
and insert operations take place. During the insert operation, Netezza computes
the Min values, Max values, and zone maps for all of the columns.
If the insert operation yields a number of rows that is less than
the configured threshold (ctas_auto_stats_min_rows), Netezza skips
the generate statistics operation.
There are two
postgresql.conf file
settings that control this feature:
- enable_small_ctas_autostats enables or disables the feature to
suppress auto-statistics on small tables. The setting is enabled by
default.
- ctas_autostats_min_rows specifies the threshold number for a small
table. Netezza does
not calculate statistics for any tables that are under this threshold.
The default value is 10000.
- Handle distribution keys
- If you do not define explicit distribution keys, a CTAS table
inherits its distribution from the parent table. In general, the distribution
of the target table is defined by the final node in the plan. If the
final node has a valid distribution, the system assigns that distribution
to the CTAS target. Only if the final plan node has no distribution
(like a node at the host) does the system default to the first column
of the table
The default distribution key is the first column (hash
distribution) where there is no discernible distribution key or the
source stream into the CTAS table has round-robin distribution.
Table
t_one inherits its distribution keys from f_one. (It does not default
to first column.)
CREATE TABLE t_one AS SELECT … FROM tbl …;
Table
t_two inherits its distribution keys from the join table of (tbl_one+tbl_two),
which would be their join keys.
CREATE TABLE t_two AS SELECT … FROM tbl_one,tbl_two … WHERE
tbl_one.b1 = tbl_two.b2 …
Table t_three inherits
its distribution keys from the grouping node, which would be (b1,b2,b3).
CREATE TABLE t_three AS SELECT … FROM tbl_one, tbl_two, tbl_three…
WHERE … GROUP BY b1,b2,b3;
- System default for table distributions
- The postgresql.conf setting enable_random_table_distribute
controls the default distribution behavior when tables are created.
The default value 0 (disabled) specifies the following behavior, which
is the default behavior for tables that are created in previous releases:
- For a CREATE TABLE operation:
- If DISTRIBUTE ON is specified, use the specified distribution
mechanism.
- If DISTRIBUTE ON is not specified, use the first column as the
default distribution key.
- For a CREATE TABLE AS (CTAS) operation:
- If DISTRIBUTE ON is specified, use the specified distribution
mechanism.
- If DISTRIBUTE ON is not specified, inherit the distribution keys
from the plan. If the planner cannot determine a distribution from
the plan, use the first column as the default key.
If you specify enable_random_table_distribute=1 (enabled),
the system behavior changes to the following:
- For a CREATE TABLE operation:
- If DISTRIBUTE ON is specified, use the specified distribution
mechanism.
- If DISTRIBUTE ON is not specified, use random as the distribution
method.
- For a CREATE TABLE AS (CTAS) operation:
- If DISTRIBUTE ON is specified, use the specified distribution
mechanism.
- If DISTRIBUTE ON is not specified, inherit the distribution keys
from the plan. If the planner cannot determine a distribution from
the plan, use random as the distribution method.
In some cases, the planner might not be able to determine
a distribution from the plan. For example:
- If a final join happens on the host, then the distribution of
the result of that join is non-deterministic.
- If the distribution column is missing from the <select-list>,
then distribution is non-deterministic.
- If the distribution of the final result node is random, then distribution
is considered non-deterministic.
- If the final join is a full-outer-join, then also distribution
is non-deterministic.
This change obsoletes the NZ_DISABLE_SKEW_DEFENSE environment
variable, which controlled this behavior in earlier releases. The
upgrade to release 4.6 checks for the presence of the variable, and
if it is set, uses its value to set enable_random_table_distribute
to its corresponding value.
To change the
postgresql.conf variable,
do the following:
- Use a standard editor to open the configuration file, nz/data/postgresql.conf.
- Find the line that contains the enable_random_table_distribute
= 0.
- Change the variable from 0 to 1 and save the change.
- Restart the Netezza system
for the changes to take effect.
Syntax
Syntax for using the CREATE TABLE
AS command:
CREATE [ TEMPORARY | TEMP ] TABLE <table> [ (<col>[,<col>…] ) ]
AS <select_clause> [ DISTRIBUTE ON ( <dist_col>[,<dist_col>…] ) ]
Inputs
The CREATE TABLE AS
command takes the following inputs:
Table 1. CREATE
TABLE AS inputs| Input |
Description |
| TEMPORARY or TEMP |
Create a temporary table. |
| <table> |
The name of the table to be created. A temporary
table can have the same name as a permanent table that already exists,
but other table names must be unique within the scope of a database. |
| <col> |
The name of a column in the new table. If you do not provide
a column name, a name is taken from the output column name of the
select clause. |
| <select_clause> |
A SELECT command as described in SELECT (to retrieve rows). |
| <dist_col> |
The name of a distribution column. For more
information about distribution columns, see "Handle distribution keys"
in Privileges. |
Outputs
See the CREATE TABLE
and SELECT commands for a listing of possible output messages.
Privileges
You must be the
admin user, the owner of the database or schema, or you must have
the Create Table privilege. You must also have Select privilege for
thetable that you are using in the SELECT statement.
Usage
The following provides
sample usage.