CREATE EXTERNAL TABLE statement
While tables typically reside in a database, an external table resides in a text-based, delimited file, or in a fixed-length-format file outside of a database.
- Store data outside the database while retaining the ability to query that data. To unload data
from the database into an external file, specify the external table as the target table in one of
the following SQL statements:
- INSERT SQL
- SELECT INTO SQL
- CREATE EXTERNAL TABLE AS SELECT SQL
- Load data from an external file into a table in the database. You can perform operations such as casts, joins, and dropping columns to manipulate data during loading. To load data into the database from an external table, use a FROM clause in a SELECT SQL statement as you would for any other table.
- Transfer data to another application.
An advantage of using external tables for Extraction-Transformation-Loading (ETL) processes is that they can be carried out using plain SQL. Because an SQL-based ETL process can be initiated from any SQL client, it eliminates the need for special ETL tools.
- Named
- The external table has a name and catalog entry similar to a normal table.
- Transient
- The external table has a system-generated name of the form
SYSTET<number>
and does not have a catalog entry. For example, the system might create a transient external table to hold the result of a query. The lifetime of such a table is the duration of the query.
Invocation
This statement can be embedded in an application program or issued using dynamic SQL statements. It is an executable statement that can be dynamically prepared only if DYNAMICRULES run behavior is in effect for the package (SQLSTATE 42509).
Authorization
- CREATETAB authority
- One of the following privileges or authorities:
- USE privilege on the table space
- SYSADM authority
- SYSCTRL authority
- One of the following privileges or authorities:
- IMPLICIT_SCHEMA authority on the database (if the implicit or explicit schema name of the table does not exist)
- CREATEIN privilege on the schema (if the schema name of the table refers to an existing schema)
- SCHEMAADM authority on the schema (if the schema name of the table refers to an existing schema)
- The authorization ID must be the same as the owner of the root table of the table hierarchy.
- The privileges that are held by the authorization ID must include SCHEMAADM authority on the schema that contains the root table of the table hierarchy.
- The privileges that are held by the authorization ID must include DBADM authority.
Syntax
The syntax of this statement depends on the nature of the external table that is to be created:- Use the following syntax to create, in the catalog, a table definition for a new external table. Specifying a table name is mandatory, so the resulting external table is a named table. A DATAOBJECT or FILE_NAME option must be specified to identify the target file.
- Use the following syntax to use an existing table as a template for a new external table and to
populate it with the contents of the source table. If this statement specifies a table name
explicitly, the resulting external table is a named table; otherwise, the resulting external table
is a transient table. The file name must be specified by either the
file-name
parameter or a DATAOBJECT or FILE_NAME option.
Description
- table-name
- The names of the external table. The name, including the implicit or explicit qualifier, must not identify a table, view, nickname, or alias that is already described in the catalog. The schema name cannot be SYSIBM, SYSCAT, SYSFUN, or SYSSTAT (SQLSTATE 42939).
- file-name
- The fully-qualified name of the file (or any medium that can be treated as a file) that is to contain the external table to be created. If this parameter is specified, the DATAOBJECT or FILE_NAME option cannot be specified.
- column-definition
- Defines the attributes of a column.
- column-name
- Names a column of the table. The name cannot be qualified, and the same name cannot be used for more than one column of the table (SQLSTATE 42711).
- built-in-type
- One of the following built-in data types:
- SMALLINT
- A small integer.
- [INTEGER | INT]
- A large integer.
- BIGINT
- A big integer.
- [DECIMAL | DEC | NUMERIC | NUM](precision-integer, scale-integer)
- A decimal number.
- The precision integer specifies the total number of digits. It must be in the range 1 ‑ 31. The default is 5.
- The scale integer specifies the number of digits to the right of the decimal point. It cannot be negative and cannot exceed the precision. The default is 0.
- FLOAT(integer)
- A single or double-precision floating-point number. If the specified length is in the range:
- 1 - 24, the number uses single precision
- 25 - 53, the number uses double-precision
Instead of FLOAT, you can specify:- REAL
- For single precision floating-point.
- DOUBLE
- For double-precision floating-point.
- DOUBLE PRECISION
- For double-precision floating-point.
- FLOAT
- For double-precision floating-point.
- DECFLOAT(precision-integer)
- A decimal floating-point number. The precision integer specifies the total number of digits, which can be either 16 or 34. The default is 34.
- [CHARACTER | CHAR](integer [OCTETS | CODEUNITS32])
- A fixed-length character string of the specified number of code units. This number can range from 1 ‑ 255 OCTETS or from 1 - 63 CODEUNITS32. The default is 1.
- [VARCHAR | CHARACTER VARYING | CHAR VARYING](integer [OCTETS | CODEUNITS32])
- A varying-length character string with a maximum length of the specified number of code units. This number can range from 1 ‑ 32672 OCTETS or from 1 - 8168 CODEUNITS32.
- FOR BIT DATA
- Specifies that the contents of the column are to be treated as bit (binary) data. During data exchange with other systems, code page conversions are not performed. Comparisons are done in binary, irrespective of the database collating sequence.
- [CLOB | CHARACTER LARGE OBJECT | CHAR LARGE OBJECT](integer [K] [OCTETS | CODEUNITS32])
- A character large object string with a maximum length of the specified number of code units. The
default maximum length is 65,535
bytes.If you want to multiply the length integer by 1024, specify a K (kilo) multiplier.
- Regardless of whether you use a K multiplier, the resulting length is limited by the maximum length of a CLOB column in an external table, which is 65,535 OCTETS, 32,767 CODEUNITS16, or 16,383 CODEUNITS32. Note that 64K OCTETS and 16K CODEUNITS32 each exceed the maximum length by one, and so are not allowed.
- Any number of spaces (including zero spaces) are allowed between the data
type and the length specification or between the length integer and the
K multiplier. For
example, the following specifications are all equivalent and
valid:
CLOB(50K) CLOB(50 K) CLOB (50 K)
- The K multiplier can be specified in either uppercase or lowercase.
The default string units are OCTETS.
- OCTETS
- Specifies that the units of the length attribute are bytes.
- CODEUNITS32
- Specifies that the units of the length attribute are Unicode UTF-32 code units, which approximates counting in characters. This does not affect the underlying code page of the data type. The actual length of a data value is determined by counting the UTF-32 code units as if the data were converted to UTF-32.
- GRAPHIC(integer [CODEUNITS16 | CODEUNITS32])
- A fixed-length graphic string of the specified length, which can range from 1 ‑ 127 double bytes, 1 ‑ 127 CODEUNITS16, or 1 ‑ 63 CODEUNITS32. The default length is 1.
- VARGRAPHIC(integer [CODEUNITS16 | CODEUNITS32])
- A varying-length graphic string of the specified maximum length, which can range from 1 ‑ 16336 double bytes, 1 ‑ 16336 CODEUNITS16, or 1 ‑ 8168 CODEUNITS32.
- DBCLOB(integer [K] [CODEUNITS16 | CODEUNITS32])
- A character large object string of the specified maximum length in double bytes, Unicode UTF-16
code units, or Unicode UTF-32 code units. The default maximum length is 32,767 double bytes.If you want to multiply the length integer by 1024, specify a K (kilo) multiplier.
- Regardless of whether you use a K multiplier, the resulting length is limited by the maximum length of a DBCLOB column in an external table, which is 32,767 CODEUNITS16 or 16,383 CODEUNITS32. Note that 32K CODEUNITS16 and 16K CODEUNITS32 each exceed the maximum length by one, and so are not allowed.
- Any number of spaces (including zero spaces) are allowed between the data
type and the length specification or between the length integer and the
K multiplier. For
example, the following specifications are all equivalent and
valid:
DBCLOB(50K) DBCLOB(50 K) DBCLOB (50 K)
- The K multiplier can be specified in either uppercase or lowercase.
The default string units are CODEUNITS16.
- CODEUNITS16
- Specifies that the units of the length attribute are Unicode UTF-16 code units, which is the same as counting in double bytes.
- CODEUNITS32
- Specifies that the units of the length attribute are Unicode UTF-32 code units. This does not affect the underlying code page of the data type. The actual length of a data value is determined by counting the UTF-32 code units as if the data were converted to UTF-32.
- [NATIONAL CHARACTER | NATIONAL CHAR | NCHAR](integer)
- A fixed-length string of the specified length. The default length is
1.
The NATIONAL CHARACTER type maps to a fixed-length character string with string units CODEUNITS32.
- [NATIONAL CHARACTER VARYING | NATIONAL CHAR VARYING | NCHAR VARYING | NVARCHAR](integer)
- A varying-length string of the specified maximum
length.
The NATIONAL CHARACTER VARYING type maps to a varying-length character string with string units CODEUNITS32.
- [NATIONAL CHARACTER LARGE OBJECT | NCHAR LARGE OBJECT | NCLOB](integer [K])
- A large object string of the specified maximum length. The default maximum length is 16,383
double
bytes.
This data type maps to a character large object (CLOB) with string units CODEUNITS32. See the description of the CLOB parameter for information about possible values for the length integer and how to use a K (kilo) multiplier.
- BINARY(integer)
- A fixed-length binary string of the specified length, which must be in the range 1 ‑ 255 bytes. The default length is 1.
- [VARBINARY | BINARY VARYING](integer)
- A varying-length binary string of the specified maximum length, which must be in the range 1 ‑ 32672 bytes.
- [BLOB | BINARY LARGE OBJECT](integer [K])
- A binary large object string with a maximum length of the specified number of code units. The
default maximum length is 65,535 bytes.If you want to multiply the length integer by 1024, specify a K (kilo) multiplier.
- Regardless of whether you use a K multiplier, the resulting length is limited by the maximum length of a BLOB column in an external table, which is 65,535 bytes. Note that 64K exceeds the maximum length by one, and so is not allowed.
- Any number of spaces (including zero spaces) are allowed between the data
type and the length specification or between the length integer and the
K multiplier. For
example, the following specifications are all equivalent and
valid:
BLOB(50K) BLOB(50 K) BLOB (50 K)
- The K multiplier can be specified in either uppercase or lowercase.
- DATE
- A date.
- TIME
- A time.
- TIMESTAMP(integer) or TIMESTAMP
- A timestamp. The integer specifies the number of decimal places for fractions of seconds, from 0 (seconds) to 12 (picoseconds). The default is 6 (microseconds).
- BOOLEAN
- A Boolean value.
- LIKE table-name1 or view-name or nickname
- Specifies that the columns of the table have the same name and description as the columns of the
specified table (table-name1), view (view-name),
or nickname (nickname). The specified table, view, or nickname must either
exist in the catalog or must be a declared temporary table. A typed table or typed view cannot be
specified (SQLSTATE 428EC). The use of LIKE is an implicit definition of n columns, where n is the number of columns in the identified table (including implicitly hidden columns), view, or nickname. A column of the new table that corresponds to an implicitly hidden column in the existing table will also be defined as implicitly hidden. The implicit definition depends on what is specified after LIKE:
- If a table is specified, then the implicit definition includes the column name, data type, hidden attribute, and nullability characteristic of each of the columns of that table. If EXCLUDING COLUMN DEFAULTS is not specified, then the column default is also included.
- If a view is specified, then the implicit definition includes the column name, data type, and nullability characteristic of each of the result columns of the fullselect defined in that view. The data types of the view columns must be data types that are valid for columns of a table.
- If a nickname is specified, then the implicit definition includes the column name, data type, and nullability characteristic of each column of that nickname.
- If a protected table is specified, the new table inherits the same security policy and protected columns as the identified table.
- If a table is specified and if that table contains a row-begin column, row-end column, or transaction-start-ID column, the corresponding column of the new table inherits only the data type of the source column. The new column is not considered a generated column.
- If a table that includes a period is specified, the new table does not inherit the period definition.
- If a system-period temporal table is specified, the new table is not a system-period temporal table.
- If a random distribution table that uses the random by generation method is specified, and if the new table that is being created does not share the same table distribution, the RANDOM_DISTRIBUTION_KEY column that is used to generate the random distribution values is not included.
Column default attributes can be included or excluded, based on the copy-attributes clauses. The implicit definition does not include any other attributes of the specified table, view, or nickname. Consequently, the new table does not have any primary key, unique constraints, foreign key constraints, referential integrity constraints, triggers, indexes, ORGANIZE BY specification, or PARTITIONING KEY specification.
When a table is identified in the LIKE clause and that table contains a ROW CHANGE TIMESTAMP column, the corresponding column of the new table inherits only the data type of the ROW CHANGE TIMESTAMP column. The new column is not considered to be a generated column.
If a table is specified, and if row or column level access control is activated for that table, it is not inherited by the new table.
- option
- The following options control the loading of data to or retrieval of data from an external-table
file. The value of each option is a text string and is not case-sensitive.
- BOOLSTYLE or BOOLEAN_STYLE
- During a load operation, all Boolean values must use the same style. This option specifies the
Boolean style that is to be used:
- 1_0 (this is the default)
- T_F
- Y_N
- YES_NO
- TRUE_FALSE
- CARDINALITY
- Non-zero positive integer value to override the estimation of the expected number of returned rows.
- CCSID
- The coded character set identifier (CCSID) of the input data file. The value can be any valid
integer value from the CCSID specification. There is no default value. The CCSID and ENCODING options are mutually exclusive when
the value of the ENCODING option is UTF8, LATIN9, or INTERNAL.Which styles are used for dates and times depends on whether a CCSID is specified:
- When a CCSID is specified, and when DATESTYLE, TIMESTYLE, DATEDELIM, or TIMEDELIM are not specified, the values or defaults for DATE_FORMAT, TIME_FORMAT, and TIMESTAMP_FORMAT are used.
- When a CCSID is not specified, and when TIMESTAMP_FORMAT, DATE_FORMAT or TIME_FORMAT are not specified, the values or defaults for DATESTYLE, TIMESTYLE, DATEDELIM, and TIMEDELIM are used.
- COMPRESS
- For a load operation or an unload operation, whether the data file data is compressed:
- GZIP
- The data file data is compressed by using the GZIP compression algorithm.
- NO
- The data file data is not compressed. This is the default.
- LZ4
- The data file data is compressed by using the LZ4 compression algorithm.
- CRINSTRING
- How to interpret an unescaped carriage-return (CR) or carriage-return line-feed (CRLF) character:
- TRUE or ON
- An unescaped CR character is interpreted as data, not as a record delimiter. An unescaped CRLF character is split into a CR character, which is interpreted as data, and an LF character, which is interpreted as a record delimiter. The value of the LFINSTRING option does not affect how this LF character is interpreted.
- FALSE or OFF
- An unescaped CR or CRLF character is interpreted as a record delimiter. This is the default.
- CTRLCHARS
- Whether to allow an ASCII value 1 - 31 in a CHAR or VARCHAR field. Any NULL, CR, or LF
characters must be escaped. Allowed values are:
- TRUE or ON
- An ASCII value 1 - 31 in a CHAR or VARCHAR field is allowed.
- FALSE or OFF
- An ASCII value 1 - 31 in a CHAR or VARCHAR field is not allowed. This is the default.
- DATAOBJECT or FILE_NAME
- The fully-qualified name of the file (or any medium that can be treated as a file) that is to
contain the external table to be created. This option is mandatory when the name of the file is not
specified immediately after the table name; otherwise, it is not
allowed.
- For a cloud-based managed service such as Db2 Warehouse on Cloud,
when the REMOTESOURCE option is set to LOCAL (this is its default value), the path to the external
table file is relative to your Db2 managed services home directory.
Otherwise, the path to the external table file is relative to the path
/home/
followed by the authorization ID of the table definer. For example, if the authorization ID of the table definer isuser1
, the path to the external table file is relative to/home/user1/
. - For an on-premise product such as Db2 Warehouse, when
both the REMOTESOURCE option is set to LOCAL (this is its default value) and the
extbl_strict_io configuration parameter is set to NO, the path to the external
table file is an absolute path and must be one of the paths specified by the
extbl_location configuration parameter. Otherwise, the path to the external
table file is relative to the path that is specified by the extbl_location
configuration parameter followed by the authorization ID of the table definer. For example, if
extbl_location is set to
/home/xyz
and the authorization ID of the table definer isuser1
, the path to the external table file is relative to/home/xyz/user1/
.
The file name must be a valid UTF-8 string.
For a load operation, the following conditions apply:- The file must already exist.
- Required permissions:
- If the external table is a named external table, the owner must have read permission for the file and write permission for the LOGDIR directory.
- If the external table is a transient external table, the authorization ID of the statement must have read permission for the file and write permission for the LOGDIR directory.
- If the file exists, it is overwritten.
- Required permissions:
- If the external table is a named external table, the owner must have read and write permission for the directory of this file.
- If the external table is transient, the authorization ID of the statement must have read and write permission for the directory of this file.
- For a cloud-based managed service such as Db2 Warehouse on Cloud,
when the REMOTESOURCE option is set to LOCAL (this is its default value), the path to the external
table file is relative to your Db2 managed services home directory.
Otherwise, the path to the external table file is relative to the path
- DATEDELIM
- The delimiter character that separates the components of a date, according to the format specified by the DATESTYLE option. If you specify an empty string, there is no delimiter between the date components, and days and months must be specified as two-digit numbers. When DATESTYLE is set to MONDY or MONDY2, the default DATEDELIM value is a space. The TIMESTAMP_FORMAT and DATEDELIM options are mutually exclusive.
- DATESTYLE
- How to interpret the date format. For days or months in the range 1 ‑ 9, use 1
digit, 2 digits, or a space followed by a single digit. When the DATEDELIM option is a space, you
can specify a comma after the day. An error occurs if you:
- Specify zero for a day, month, or year
- Specify a nonexistent date (for example, 32 August or 30 February)
Table 1. Possible values for the DateStyle option. The example shows how the date 21 March 2014 would be represented when DATEDELIM is set to '-'. Value Description Example YMD 4-digit year, 2-digit month, 2-digit day. This is the default. 2014-03-21 DMY 2-digit day, 2-digit month, 4-digit year. 21-03-2014 MDY 2-digit month, 2-digit day, 4-digit year. 03-21-2014 MONDY 3-character month, 2-digit day, 4-digit year. Mar 21 2014 DMONY 2-digit day, 3-character month, 4-digit year. 21-Mar-2014 Y2MD 2-digit year, 2-digit month, 2-digit day. Not supported for unloads. 14-03-21 DMY2 2-digit day, 2-digit month, 2-digit year. Not supported for unloads. 21-03-14 MDY2 2-digit month, 2-digit day, 2-digit year. Not supported for unloads. 03-21-14 MONDY2 3-character month, 2-digit day, 2-digit year. Not supported for unloads. Mar 21 14 DMONY2 2-digit day, 3-character month, 2-digit year. Not supported for unloads. 21-Mar-14 - DATETIMEDELIM
- A single-byte character that separates the date component and time component of the timestamp data type.
- DATE_FORMAT
- The format of the date field in the data file. The value can be any of the date format strings that are accepted by the TIMESTAMP_FORMAT scalar function. The default is YYYY-MM-DD. The DATE_FORMAT option and the DATEDELIM or DATESTYLE option are mutually exclusive.
- DECIMALDELIM or DECIMAL_CHARACTER
- The decimal delimiter for the data types FLOAT, DOUBLE, TIME, and TIMESTAMP. Allowed values are
','
and'.'
. - DECPLUSBLANK
- Specifies how the positive decimal value is represented during the unload operation.
- DELIMITER or COLUMN_DELIMITER
- The character that is used to delimit the fields of an input or output record. The default is a
vertical bar (
'|'
).You can specify a character in the 7-bit ASCII range (decimal 1 ‑ 127) in any of the following ways:- As a single character (for example
DELIMITER ';'
) - By specifying its corresponding ASCII decimal value (for example,
DELIMITER 59
orDELIMITER '59'
) - By specifying its corresponding ASCII
hex value (for example,
DELIMITER x'3B'
)
The decimal range 128 - 255 is supported only with the ISO character set input file by specifying its corresponding ASCII decimal value or hex value. If the input file is in the UTF8 character set, this delimiter value range is not supported.
- As a single character (for example
- ENCODING
- The type of data in the file:
- UTF8
- The file uses UTF8 encoding for all character data.
- LATIN9
- The file uses LATIN9 encoding for all character data.
- INTERNAL
- The file uses a mixture of both UTF8 and LATIN9 encoding, or you are unsure which type of encoding is used. The system checks the data and encodes the data as needed. Because this checking of the data reduces overall performance, use this value only when necessary. This is the default.
- INTERNAL
- This is the default option.
- DBCS_GRAPHIC
- This value is allowed only for a load operation, not an unload operation. If this value is
specified, the CCSID option must also be specified. During the load operation, fields of type
GRAPHIC or VARGRAPHIC are encoded using the double-byte character set of the specified CCSID; fields
of all other types are encoded using the mixed-byte character set of the specified
CCSID.Note: ENCODING cannot be set to DBCS_GRAPHIC for a DEL file that was created by the EXPORT utility, because such DEL files are encoded using a single character set.
- ESCAPECHAR or ESCAPE_CHARACTER
- Which character is to be regarded as an escape character. An escape character indicates that the character that follows it, which would otherwise be treated as a field-delimiter character or end-of-row sequence character, is instead treated as part of the value in the field. The escape character is ignored for graphic-string data. There is no default.
- FILLRECORD
- For a load operation, the field of a record are loaded into the columns of a target table from
left to right. This option specifies whether an input record can contain fewer fields than there are
columns defined for the target table:
- TRUE or ON
- An input line can contain fewer fields, provided that all columns for which a value is missing are nullable. Missing values are set to NULL. If one or more columns for which a value is missing is not nullable, the record is rejected.
- FALSE or OFF
- An input line that contains fewer columns is rejected. This is the default.
- FORMAT or FILE_FORMAT
- The data format of the source file:
- TEXT
- The data to be loaded or unloaded is in text-delimited format. This is the default.
- INTERNAL
- The data is in an internal format used by Netezza Platform Software (NPS). This value is valid
only when loading data from a file to the database, not when unloading data to a file. If this value
is specified for the FORMAT option, the following options, and only these options, must also be specified:
- DATAOBJECT or FILE_NAME.
- COMPRESS. This must be set to GZIP.
- BINARY
- The data is in an internal format that is used by Db2.
- FIXED
- The data is in fixed-length format.
- IGNOREZERO or TRIM_NULLS
- Specifies whether the binary value zero in CHAR fields and VARCHAR fields is to be discarded.
- TRUE or ON
- The byte value zero is ignored.
- FALSE or OFF
- The byte value zero is not ignored. This is the default.
- KEEP
- The binary value zero is accepted and allowed as part of the input field.
- INCLUDEHEADER or COLUMN_NAMES
- For an unload operation, whether the table column names are to be included as headers in the
external-table file:
- TRUE or ON
- The table column names are to be included as headers.
- FALSE or OFF
- The table column names are not to be included as headers. This is the default.
- INCLUDEZEROSECONDS
- For an unload operation, whether to specify 00 as the value for seconds when no value for
seconds is available:
- TRUE or ON
- Specify 00 as the value for seconds.
- FALSE or OFF
- Do not specify a value for seconds. This is the default.
- INCLUDEHIDDEN
- For a load operation, specify whether hidden column values are present in a data file.
- LFINSTRING
- Specifies how to interpret an unescaped line-feed (sometimes called an LF or newline) character
within string data:
- TRUE or ON
- An unescaped LF character is interpreted as a record delimiter only if it is in the last field of a record; otherwise, it is treated as data. To cause an LF character that is in the last field of a record to be treated as data, enclose the value of that field in single or double quotation marks.
- FALSE or OFF
- An unescaped LF character is interpreted as a record delimiter regardless of its position. This is the default.
This option is not supported for unload operations. And applies only to line-feed characters, not carriage-return line-feed (CRLF) characters.
Attention: This SQL compatibility enhancement is only available in Db2 Version 11.5 Mod Pack 2 and later versions. - LOGDIR or ERROR_LOG
- The directory to which the following files are written:
- <database>.<schema>.<external-table-name>.<file-name>.<application-handle>.<id>.bad
- A file containing rejected records (that is, records that could not be processed).
- <database>.<schema>.<external-table-name>.<file-name>.<application-handle>.<id>.log
- A log file.
<file-name>
) is truncated so that the maximum is not exceeded.If a .log or .bad file is generated while carrying out an operation on a partition, the name of the generated file is suffixed with a period followed by the 3-digit partition number.
- MAXERRORS or MAX_ERRORS
- For a load operation, the threshold for the number of rejected records at which the system stops processing and immediately rolls back the load. The default is 1 (that is, a single rejected record results in a rollback).
- MAXROWS or MAX_ROWS
- If set to a positive integer, this specifies the maximum number of records (rows) in the external table that are to be processed. If set to 0 (the default), there is no limit and all rows are processed. During a load operation, if MAXROWS is set to a positive value, after that number of rows are processed, regardless of whether some of the rows were rejected or skipped, the system ends the load operation and commits all inserted records.
- MERIDIANDELIM
- A single-byte character that separates the seconds component from the AM token or PM token in the 12-hour delimited and undelimited formats of a time value.
- NOLOG
- Specifies whether the .log file for the external table is created.
- NULLVALUE or NULL_VALUE
- The UTF-8 string of at most 4 bytes that is to be used to indicate a null value. The default is
'NULL'
. - PARTITION
- If you have an MPP
deployment, an external table can be partitioned into several files. The name of each of the
data files that comprise an external table are suffixed with a period followed by a 3-digit number
from 000 to 999 that indicates the number of the partition. For example, if an external table with
the name dataFile.txt is divided into three partitions, the files that comprise it have the
names dataFile.txt.000, dataFile.txt.001, and dataFile.txt.002. These files
must be accessible from all members.For a partitioned external table, the PARTITION option specifies to which partition or partitions the statement applies:
- PARTITION ALL
- The statement applies to all of the partitions that comprise the external table. For an unload operation, this is the only value that is allowed.
- PARTITION (n TO n)
- The statement applies to all of the partitions in the specified range, for example, PARTITION (54 TO 62).
- PARTITION (n,n,…)
- The statement applies only to the specified partition or partitions, for example, PARTITION (53) or PARTITION (51,57,58). If more than one partition number is specified, they must be in ascending order (sqlcode SQL0263N with SQLSTATE=42615) and there can be no duplicates (sqlcode SQL0265N with SQLSTATE=42615).
If a .log or .bad file is generated while carrying out an operation on a partitioned external table, the name of the generated file is suffixed with a period followed by the 3-digit partition number.
If you have an MPP deployment and the PARTITION option is not specified, the external table is treated as single-partitioned table on the coordinator member. The names of the external table file and the .log and .bad files are not suffixed with a partition number.
If you do not have an MPP deployment, the PARTITION option can be specified, but only with the value ALL, (0 to 0), or (0) (sqlcode SQL0644N with SQLSTATE=42615). It will have no effect.
The REMOTESOURCE and PARTITION options are mutually exclusive.
- QUOTEDNULL
- For a load operation, how to interpret a value that is enclosed in single or double quotation
marks and that matches the null value specified by the NULLVALUE or NULL_VALUE option (for example,
"NULL" or 'NULL'):
- TRUE or ON
- The value is interpreted as a null value. This is the default.
- FALSE or OFF
- The value is interpreted as a character string.
- QUOTEDVALUE or STRING_DELIMITER
- Whether data values are enclosed in quotation marks:
- SINGLE or YES
- Data values are enclosed in single quotation marks (').
- DOUBLE
- Data values are enclosed in double quotation marks (").
- NO
- Data values are not enclosed in quotation marks. This is the default.
- RECORDDELIM or RECORD_DELIMITER
- The string literal that is to be interpreted as a row (record) delimiter. The default is
'\n'
. - REMOTESOURCE
- Where the external-table file resides and, if it resides on a remote system, whether the file
data is to be compressed:
- LOCAL
- The file resides on the local server, that is, the system that hosts the database. This is the default.
- YES
- The file resides on a system other than the local server. For example, specify YES if a client system is connected to the database and the file resides on that system. File data is not compressed before it is transferred.
- GZIP
- Similar to YES, except that the file data is compressed using the GZIP compression algorithm before the data is transferred, and is decompressed after it is received. This improves overall performance when a large amount of compressible data is being transferred.
- LZ4
- Similar to YES, except that the file data is compressed using the LZ4 compression algorithm before the data is transferred, and is decompressed after it is received. This improves overall performance when a large amount of compressible data is being transferred.
The REMOTESOURCE and PARTITION options are mutually exclusive. The COMPRESS option cannot be specified if the value of the REMOTESOURCE option is GZIP or LZ4.
- REQUIREQUOTES
- Whether quotation marks are mandatory:
- TRUE or ON
- Quotation marks are mandatory. The QUOTEDVALUE option must be set to YES, SINGLE, or DOUBLE.
- FALSE or OFF
- Quotation marks are not mandatory. This is the default.
- SKIPROWS or SKIP_ROWS
- For a load operation, the number of rows to skip before beginning to load the data. The default is 0. Because skipped rows are processed before they are skipped, a skipped row is still capable of causing a processing error.
- SOCKETBUFSIZE
- The size, in bytes, of the chunks of data that are read from the source file. Valid values range from 64 KB ‑ 800 MB. If you specify a value outside this range, the value is set to the nearest valid value. The default is 8 MB.
- STRICTNUMERIC
- For a load operation, how to treat a value that is to be inserted into a DECIMAL field when its
scale exceeds that defined for the field:
- TRUE or ON
- The row containing the value to be inserted is rejected. For example, if any of the following
values were to be loaded into a DECIMAL(5,3) field, the row containing that value would be
rejected:
12.666666666 -98.34496862785 0.00089
- FALSE or OFF
- The row containing the value to be inserted is accepted, and the portion of the decimal fraction
that exceeds the scale defined for the field is truncated. This is the default. For example, the
values in the previous example would be converted to:
12.666 -98.344 0.000
- SWIFT
- Specifies that the source data file is located in a Swift object store. Use
the DATAOBJECT option to specify the file name.Syntax:
where:SWIFT (endpoint, authKey1, authKey2, bucket)
- endpoint
- A character string that specifies the URL of the SWIFT web service.
- authKey1
- A character string that specifies the access ID or username of the Swift open stack account used to validate the user.
- authKey2
- A character string that specifies the password of the Swift open stack account used to validate the user.
- bucket
- The name of the Swift open stack container (bucket) in which the file resides.
Example:CREATE EXTERNAL TABLE exttab1(a int) using (dataobject 'datafile1.dat' swift('https://dal05.objectstorage.softlayer.net/auth/v1.0/', 'XXXOS123456-2:xxx123456', 'b207c6e974020737d92174esdf6d5be9382aa4c335945a14eaa9172c70f8df16', 'my_dev' ) )
- S3
- Specifies that the source data file is located in an S3 compatible object store. Use
the DATAOBJECT option to specify the file
name.Syntax:where:
S3 (endpoint, authKey1, authKey2, bucket)
- endpoint
- A character string that specifies the URL of the S3 compatible web service.
- authKey1
- A character string that specifies the S3 access key ID of the access keys used to validate the user and all user actions. For IBM Cloud Object Storage, this is the access key ID from the HMAC credentials.
- authKey2
- A character string that specifies the S3 secret key of the access keys that are used to validate the user and all user actions. For IBM Cloud Object Storage, this is the secret access key from the HMAC credentials.
- bucket
- The name of the S3 bucket in which the file resides.
Note: For IBM Cloud Object Storage, to create HMAC credentials, when creating new service credentials, specify{"HMAC:true}
in the Add Inline Configuration Parameters field.Example using AWS S3:CREATE EXTERNAL TABLE exttab2(a int) using (dataobject 'datafile2.dat' s3('s3.amazonaws.com', 'XXXOS123456-2:xxx123456', 'bs07c6e974040737d92174e5e96d5be9382aa4c33xxx5a14eaa9172c70f8df16', 'my_dev' ) )
Example using IBM Cloud Object Storage:CREATE EXTERNAL TABLE exttab2(a int) using (dataobject 'datafile2.dat' s3('s3-api.us-geo.objectstorage.softlayer.net', '1a2bkXXXsaddntLo0xX0', 'XXxxiEPjJ7T7WBUz74E6abcdABCDE8Q7RgU4gYY9', 'my_dev' ) )
- AZURE
- Specifies
that the source data file is located in Microsoft Azure Blob Storage. The REMOTESOURCE, SWIFT, S3,
and AZURE options are mutually exclusive. Use the DATAOBJECT option to specify the file name.
Syntax:Syntax:where:
AZURE (endpoint, authKey1, authKey2, bucket)
- endpoint
- A character string that specifies the URL of the AZURE web service.
- authKey1
- A character string that specifies the access ID or username of the Azure Blob Storage account used to validate the user.
- authKey2
- A character string that specifies the access key of the Azure Blob Storage account used to validate the user.
- bucket
- The name of the Azure Blob Storage container (bucket) in which the file resides.
Example:CREATE EXTERNAL TABLE exttab1(a int) using (dataobject 'datafile1.dat' azure('https://my_account.blob.core.windows.net', 'my_account', 'lW+oHjmZecPS++IKgThAHlMUOaFUA5C6Z2RlFmc9JPpK34RO/ZIOywzILxJnzGPHz6d/yDrcQDAwH5wySbOZMQ==', 'my_bucket' ) )
Example using IBM Cloud Object Storage:CREATE EXTERNAL TABLE exttab2(a int) using (dataobject 'datafile2.dat' s3('s3-api.us-geo.objectstorage.softlayer.net', '1a2bkXXXsaddntLo0xX0', 'XXxxiEPjJ7T7WBUz74E6abcdABCDE8Q7RgU4gYY9', 'my_dev' ) )
- TIMEDELIM
- The single-byte character that is to separate time components (hours, minutes, and seconds). The
default is
':'
. If TIMEDELIM is set to an empty string, hours, minutes, and seconds must all be specified as two-digit numbers. The TIMESTAMP_FORMAT and TIMEDELIM options are mutually exclusive. - TIMEROUNDNANOS or TIMEEXTRAZEROS
-
Note: This option applies only to TIMESTAMP columns.
- TIMESTYLE
- The time format that is to be used in the data file:
- 24HOUR
- 24-hour format, for example 23:55. This is the default.
- 12HOUR
- 12-hour format, for example 11:55 PM. An AM or PM token can be preceded by a single space and is not case-sensitive.
- TIMESTAMP_FORMAT
- The format of the timestamp field in the data file. The value can be any of the format strings that are accepted by the TIMESTAMP_FORMAT scalar function. The default is 'YYYY-MM-DD HH.MI.SS'. The TIMESTAMP_FORMAT option and the TIMEDELIM, DATEDELIM, TIMESTYLE, or DATESTYLE option are mutually exclusive.
- TIME_FORMAT
- The format of the time field in the data file. The value can be any of the time format strings that are accepted by the TIMESTAMP_FORMAT scalar function. The default is HH.MI.SS. The TIME_FORMAT option and a TIMEDELIM or TIMESTYLE option are mutually exclusive.
- TRIMBLANKS
- How an external table is to treat leading or trailing blanks (that is, leading or trailing space
characters) in a string:
- LEADING
- All leading blanks (that is, blanks that precede the first non-blank character) are removed.
- TRAILING
- All trailing blanks (that is, blanks that follow the last non-blank character) are removed.
- BOTH
- All leading and trailing blanks are removed.
- NONE
- No blanks are removed. This is the default.
- TRUNCSTRING or TRUNCATE_STRING
- How the system processes a CHAR or VARCHAR string that exceeds its declared storage size:
- TRUE
- The system truncates a string value that exceeds its declared storage size.
- FALSE
- The system returns an error when a string value exceeds its declared storage size. This is the default.
- Y2BASE
- The year that is the beginning of the 100-year range. Years that are specified as 2 digits are counted from this year. The default is 2000. This option must be specified when DATESTYLE is set to Y2MD, MDY2, DMY2, MONDY2 or DMONY2.
Table 3. Options Option Default Applies to Load Applies to Unload Azure (no default) Y Y BOOLSTYLE or BOOLEAN_STYLE 1_0 Y Y CARDINALITY (no default) Y Y CCSID (no default) Y Y COMPRESS NO Y Y CRINSTRING FALSE Y Y CTRLCHARS FALSE Y N DATAOBJECT or FILE_NAME (no default) Y Y DATEDELIM '-' Y Y DATETIMEDELIM A space (' ') Y Y DATESTYLE YMD Y Y DATE_FORMAT YYYY-MM-DD Y Y DECIMALDELIM or DECIMAL_CHARACTER '.' Y Y DELIMITER '|' Y Y ENCODING INTERNAL Y Y1 ESCAPECHAR or ESCAPE_CHARACTER (no default) Y Y FILLRECORD FALSE Y N FORMAT or FILE_FORMAT TEXT Y Y IGNOREZERO or TRIM_NULLS FALSE Y N INCLUDEHEADER or COLUMN_NAMES FALSE N Y INCLUDEZEROSECONDS FALSE Y Y INCLUDEHIDDEN FALSE Y N LFINSTRING FALSE Y N LOGDIR or ERROR_LOG target directory of external-table file Y N MULTIPARTSIZEMB value specified by the MULTIPARTSIZEMB dbm config parameter. Y N MAXERRORS or MAX_ERRORS 1 Y N MAXROWS or MAX_ROWS 0 Y N MERIDIANDELIM A space (' ') Y Y NOLOG FALSE Y Y NULLVALUE or NULL_VALUE 'NULL' Y Y PARTITION (no default) Y Y QUOTEDNULL TRUE Y N QUOTEDVALUE NO Y N RECORDDELIM or RECORD_DELIMITER '\n' Y N REMOTESOURCE LOCAL Y Y REQUIREQUOTES FALSE Y N SKIPROWS or SKIP_ROWS 0 Y N SOCKETBUFSIZE 8 MB Y Y STRICTNUMERIC FALSE Y N SWIFT (no default) Y Y S3 (no default) Y Y TIMEDELIM ':' Y Y TIMEROUNDNANOS or TIMEEXTRAZEROS FALSE Y N TIMESTAMP_FORMAT 'YYYY-MM-DD HH.MI.SS' Y Y TIMESTYLE 24HOUR Y Y TIME_FORMAT HH.MI.SS Y Y TRIMBLANKS NONE Y Y TRUNCSTRING or TRUNCATE_STRING FALSE Y N Y2BASE 2000 Y N 1 Only for the values INTERNAL, UTF8, and LATIN9. - AS SELECT STATEMENT
- Specifies that, for each column in the derived result table of the fullselect, a corresponding
column is to be defined for the table and populated with the results of the query. Each defined
column adopts the following attributes from its corresponding column of the result table (if
applicable to the data type):
- Column name
- Column description
- Data type, length, precision, and scale
- Nullability
Notes
- Records that cannot be processed (if any) are written to a file with a name of the
form:
Errors are logged in a file with a name of the form:<database>.<schema>.<external-table-name>.<file-name>.<application-handle>.<id>.bad
These files are located in the directory specified by the LOGDIR or ERROR_LOG option.<database>.<schema>.<external-table-name>.<file-name>.<application-handle>.<id>.log
For an operation on a partition, the name of the generated .bad or .log file is suffixed with a period followed by the 3-digit partition number.
- To create, insert into, or drop a named external table, issue a CREATE, INSERT, or DROP statement. You cannot insert into or drop a transient external table.
- Dropping an external table deletes the table definition but does not delete the data file that is associated with the table.
Restrictions
- Remote external table restrictions:
- It is not allowed within routines
- It is not allowed with the use of LOAD CURSOR
- Note: A single query or subquery cannot select from more than one external table at a time, and cannot reference the same external table more than once. If necessary, combine data from several external tables into a single table and use that table in the query.
In addition, a union operation cannot involve more than one external table.
- External tables can be queried only by a user ID defined within the operating system.
- Data being loaded must be properly formatted.
- You cannot delete, truncate, or update an external table.
Syntax alternatives
- SAMEAS can be used in place of LIKE.
- For the REMOTESOURCE option, the values ODBC, JDBC, or OLE-DB can be specified in place of YES.
- If the FORMAT option is set to INTERNAL, the value YES can be specified in place of GZIP for the COMPRESS option.
Examples
- Unload data to an external
table:
CREATE EXTERNAL TABLE 'order.tbl' USING (DELIMITER '|') AS SELECT * from orders;
CREATE EXTERNAL TABLE 'export.csv' USING (DELIMITER ',') AS SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;
- Load data from an external
table:
INSERT INTO target SELECT * FROM EXTERNAL 'data.txt' USING (DELIMITER '|');
INSERT INTO orders SELECT * FROM EXTERNAL 'order.tbl'( order_num INT, order_dt TIMESTAMP) USING (DELIMITER '|');
- Select data from an external
table:
SELECT * FROM EXTERNAL 'order.tbl' (order_num INT, order_dt TIMESTAMP) USING (DELIMITER '|');
SELECT * FROM EXTERNAL 'test.txt' LIKE test_table USING (DELIMITER ',');
SELECT x, y AS dt FROM EXTERNAL 'test.txt' ( x integer, y decimal(18,4) ) USING (DELIMITER ',');