CREATE TABLE AS
Use the CREATE TABLE AS command to create a table based on the results of a SELECT statement. The name of this command is sometimes abbreviated to CTAS.
- Common tasks
- Use the CREATE TABLE AS command to create a table and fill it
with data from a select command:
- Table columns have the names and data types that are associated with the output columns of the SELECT command, unless you override the column names by giving an explicit list of new column names.
- The CREATE TABLE AS command creates a table and evaluates the query once to fill the new table initially. The new table does not track subsequent changes to the source tables of the query. (In contrast, whenever you query a view, it re-evaluates the underlying SELECT commands.)
- Suppress auto-statistics on small CTAS
- During CTAS operations, Netezza Performance Server
typically runs GENERATE STATISTICS following the CTAS operation to collect statistics on the created
table. However, for shorter table queries, the GENERATE STATISTICS process can sometimes take more
time to run than the CTAS operation itself.
When a CTAS operation is submitted, table creation and insert operations take place. During the insert operation, Netezza Performance Server computes the Min values, Max values, and zone maps for all of the columns. If the insert operation yields a number of rows that is less than the configured threshold (ctas_auto_stats_min_rows), Netezza Performance Server skips the generate statistics operation.
There are two postgresql.conf file settings that control this feature:- enable_small_ctas_autostats enables or disables the feature to suppress auto-statistics on small tables. The setting is enabled by default.
- ctas_autostats_min_rows specifies the threshold number for a small table. Netezza Performance Server does not calculate statistics for any tables that are under this threshold. The default value is 10000.
- Handle distribution keys
- If you do not define explicit distribution keys, a CTAS table
inherits its distribution from the parent table. In general, the distribution
of the target table is defined by the final node in the plan. If the
final node has a valid distribution, the system assigns that distribution
to the CTAS target. Only if the final plan node has no distribution
(like a node at the host) does the system default to the first column
of the table
The default distribution key is the first column (hash distribution) where there is no discernible distribution key or the source stream into the CTAS table has round-robin distribution.
Table t_one inherits its distribution keys from f_one. (It does not default to first column.)CREATE TABLE t_one AS SELECT … FROM tbl …;
Table t_two inherits its distribution keys from the join table of (tbl_one+tbl_two), which would be their join keys.CREATE TABLE t_two AS SELECT … FROM tbl_one,tbl_two … WHERE tbl_one.b1 = tbl_two.b2 …
Table t_three inherits its distribution keys from the grouping node, which would be (b1,b2,b3).CREATE TABLE t_three AS SELECT … FROM tbl_one, tbl_two, tbl_three… WHERE … GROUP BY b1,b2,b3;
- System default for table distributions
- The postgresql.conf setting enable_random_table_distribute
controls the default distribution behavior when tables are created.
The default value 0 (disabled) specifies the following behavior, which
is the default behavior for tables that are created in previous releases:
- For a CREATE TABLE operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, use the first column as the default distribution key.
- For a CREATE TABLE AS (CTAS) operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, inherit the distribution keys from the plan. If the planner cannot determine a distribution from the plan, use the first column as the default key.
If you specify enable_random_table_distribute=1 (enabled), the system behavior changes to the following:- For a CREATE TABLE operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, use random as the distribution method.
- For a CREATE TABLE AS (CTAS) operation:
- If DISTRIBUTE ON is specified, use the specified distribution mechanism.
- If DISTRIBUTE ON is not specified, inherit the distribution keys from the plan. If the planner cannot determine a distribution from the plan, use random as the distribution method.
In some cases, the planner might not be able to determine a distribution from the plan. For example:- If a final join happens on the host, then the distribution of the result of that join is non-deterministic.
- If the distribution column is missing from the <select-list>, then distribution is non-deterministic.
- If the distribution of the final result node is random, then distribution is considered non-deterministic.
- If the final join is a full-outer-join, then also distribution is non-deterministic.
This change obsoletes the NZ_DISABLE_SKEW_DEFENSE environment variable, which controlled this behavior in earlier releases. The upgrade to release 4.6 checks for the presence of the variable, and if it is set, uses its value to set enable_random_table_distribute to its corresponding value.
To change the postgresql.conf variable, do the following:- Use a standard editor to open the configuration file, nz/data/postgresql.conf.
- Find the line that contains
the enable_random_table_distribute = 0
. - Change the variable from 0 to 1 and save the change.
- Restart the Netezza Performance Server system for the changes to take effect.
- For a CREATE TABLE operation:
Syntax
CREATE [ TEMPORARY | TEMP ] TABLE <table> [ (<col>[,<col>…] ) ]
AS <select_clause> [ DISTRIBUTE ON ( <dist_col>[,<dist_col>…] ) ]
Inputs
Input | Description |
---|---|
TEMPORARY or TEMP | Create a temporary table. |
<table> | The name of the table to be created. A temporary table can have the same name as a permanent table that already exists, but other table names must be unique within the scope of a database. |
<col> | The name of a column in the new table. If you do not provide a column name, a name is taken from the output column name of the select clause. |
<select_clause> | A SELECT command as described in SELECT (to retrieve rows). |
<dist_col> | The name of a distribution column. For more information about distribution columns, see "Handle distribution keys" in Privileges. |
Outputs
See the CREATE TABLE and SELECT commands for a listing of possible output messages.
Privileges
You must be the admin user, the owner of the database or schema, or you must have the Create Table privilege. You must also have Select privilege for thetable that you are using in the SELECT statement.
Usage
The following provides sample usage.
- Consider the following table, named
cows
:cnumber | cname | cbreed | ckind ---------+--------+-------------+------- 3 | Cindy | Ayrshire | milk 8 | Muffin | Guernsey | milk 2 | Martha | Brown Swiss | milk 7 | Joe | Angus | beef 5 | Gretel | Highland | beef 1 | Betsy | Holstein | milk 6 | Bob | Angus | beef 4 | Mindy | Hereford | beef 9 | Milda | Jersey | milk
Create a table from two columns of the tablecows
:MYDB.SCH1(USER)=> CREATE TABLE cows2 AS SELECT cname, cbreed FROM cows;
The result is the new table, namedcows2
:MYDB.SCH1(USER)=> SELECT * FROM cows2; cname | cbreed --------+------------- Cindy | Ayrshire Muffin | Guernsey Mindy | Hereford Milda | Jersey Betsy | Holstein Gretel | Highland Martha | Brown Swiss Joe | Angus Bob | Angus