INSERT (DATALAKE) statement

The INSERT (DATALAKE) statement inserts rows into a Datalake table.

Invocation

This statement can be embedded in an application program or issued using dynamic SQL statements. It is an executable statement that can be dynamically prepared.

Authorization

The privileges held by the authorization ID of the statement must include at least one of the following authorities:

INSERT privilege on the target table
CONTROL privilege on the target table
DATAACCESS authority

In addition, for each table, view, or nickname referenced in any fullselect used in the INSERT statement, the privileges held by the authorization ID of the statement must include at least one of the following authorities:

SELECT privilege
CONTROL privilege
DATAACCESS authority

GROUP privileges are not checked for static INSERT statements.

Syntax

Description

INTO table-name

Identifies the target table of the insert operation. The table-name must identify a Datalake table.

column-name

Specifies the columns for which insert values are provided. Each name must identify a column of the specified table. The same column must not be identified more than once. If extended indicator variables are not enabled, a column that cannot accept inserted values (for example, a column based on an expression) must not be identified.

Omission of the column list is an implicit specification of a list in which every column of the table (that is not implicitly hidden) or view, or every item in the select-list of the fullselect is identified in left-to-right order.. This list is established when the statement is prepared and, therefore, does not include columns that were added to a table after the statement was prepared.

VALUES

Introduces one or more rows of values to be inserted.

Each row specified in the VALUES clause must be assignable to the implicit or explicit column list and the columns identified in the INCLUDE clause, unless a row variable is used. When a row value list in parentheses is specified, the first value is inserted into the first column in the list, the second value into the second column, and so on. When a row expression is specified, the number of fields in the row type must match the number of names in the implicit or explicit column list.

It is not recommended to use the INSERT ... VALUES operation on Datalake tables to load large quantities of data into a table. This operation cannot be run in parallel. In addition, each invocation produces a data file, which negatively impacts both INSERT performance as well as SELECT performance.

expressionAn expression can be any expression defined in the “Expressions” topic. If expression is a row type, it must not appear in parentheses. If expression is a variable, the host variable can include an indicator variable or in the case of a host structure, an indicator array, enabled for extended indicator variables. If extended indicator variables are enabled, the extended indicator variable values of default (-5) or unassigned (-7) must not be used (SQLSTATE 22539) if either of the following statements is true:

The expression is more complex than a single host variable with explicit casts.
The target column has data type of structured type.

NULL

Specifies the null value and should only be specified for nullable columns.

DEFAULT

Specifies that the default value is to be used. The result of specifying DEFAULT depends on how the column was defined, as follows:

If the column was defined as a generated column based on an expression, the column value is generated by the system, based on that expression. For tables that were created by running the CREATE DATALAKE TABLE statement specifying the PARTITIONED BY partition-expression clause for a Hive Datalake table, DEFAULT automatically generates the column values for partition-expression.
If the IDENTITY clause is used, the value is generated by the database manager.
If the ROW CHANGE TIMESTAMP clause is used, the value for each inserted row is generated by the database manager as a timestamp that is unique for the table partition within the database partition.

row-expression Specifies any row expression of the type described in "Row expressions" that does not include a column name. The number of fields in the row must match the target of the insert and each field must be assignable to the corresponding column.

WITH common-table-expression

Defines a common table expression for use with the fullselect that follows.

fullselectSpecifies a set of new rows in the form of the result table of a fullselect. There may be one, more than one, or none. If the result table is empty, SQLCODE is set to +100 and SQLSTATE is set to '02000'.

When the base object of the INSERT and the base object of the fullselect, or any subquery of the fullselect, are the same table, the fullselect is completely evaluated before any rows are inserted.

The number of columns in the result table must equal the number of names in the column list. The value of the first column of the result is inserted in the first column in the list, the second value in the second column, and so on.

If the expression that specifies the value of a result column is a variable, the host variable can include an indicator variable enabled for extended indicator variables. If extended indicator variables are enabled, and the expression is more than a single host variable, or a host variable being explicitly cast, then the extended indicator variable values of default or unassigned must not be used (SQLSTATE 22539). The effects of default or unassigned values apply to the corresponding target columns of the fullselect.

Rules

Length: If the insert value of a column is a number, the column must be a numeric column with the capacity to represent the integral part of the number. If the insert value of a column is a string, the column must either be a string column with a length attribute at least as great as the length of the string, or a datetime column if the string represents a date, time, or timestamp.
Assignment: Insert values are assigned to columns in accordance with specific assignment rules.
Referential integrity: For each constraint defined on a table, each non-null insert value of the foreign key must be equal to a primary key value of the parent table.
Check constraint: Insert values must satisfy the check conditions of the check constraints defined on the table. An INSERT to a table with check constraints defined has the constraint conditions evaluated once for each row that is inserted.
Security policy: If the identified table or the base table of the identified view is protected with a security policy, the session authorization ID must have the label-based access control (LBAC) credentials that allow:
- Write access to all protected columns for which a data value is explicitly provided (SQLSTATE 42512).
- Write access for any explicit value provided for a DB2SECURITYLABEL column for security policies that were created with the RESTRICT NOT AUTHORIZED WRITE SECURITY LABEL option (SQLSTATE 23523).
The session authorization ID must also have been granted a security label for write access for the security policy if an implicit value is used for a DB2SECURITYLABEL column (SQLSTATE 23523), which can happen when:
- A value for the DB2SECURITYLABEL column is not explicitly provided.
- A value for the DB2SECURITYLABEL column is explicitly provided but the session authorization ID does not have write access for that value, and the security policy is created with the OVERRIDE NOT AUTHORIZED WRITE SECURITY LABEL option.
Extended indicator variable usage: If enabled, negative indicator variable values outside the range of -1 through -7 must not be input (SQLSTATE 22010). Also, if enabled, the default and unassigned extended indicator variable values must not appear in contexts in which they are not supported (SQLSTATE 22539).
Extended indicator variables: In an INSERT statement, a value of unassigned has the effect of setting the column to its default value. If the target column is a column defined as GENERATED ALWAYS, then it must be assigned the DEFAULT keyword, or the extended indicator variable-based values of default or unassigned (SQLSTATE 428C9).

Notes

After execution of an INSERT statement, the value of the third variable of the SQLERRD(3) portion of the SQLCA indicates the number of rows that were passed to the insert operation. In the context of an SQL procedure statement, the value can be retrieved using the ROW_COUNT variable of the GET DIAGNOSTICS statement. SQLERRD(5) contains the count of all triggered insert, update and delete operations.
Considerations for an INSERT without a column list: An INSERT statement without a column list does not include implicitly hidden columns. Columns that are defined as implicitly hidden and not null must have a defined default value.
INSERT operations on a Datalake table are non-transactional: If the statement succeeds, the INSERT is completely successful and cannot be rolled back by a ROLLBACK or ROLLBACK TO SAVEPOINT statement. If the statement fails no rows are inserted into the table.
INSERT operations on a Datalake tables table are not logged to the transaction log: Rolling forward past a transaction that included an INSERT operation on a Datalake table does not replay the INSERT operation.

Examples

Example 1: Insert a new department with the following specifications into the DEPARTMENT table:
- Department number (DEPTNO) is 'E31'
- Department name (DEPTNAME) is 'ARCHITECTURE'
- Managed by (MGRNO) a person with number '00390'
- Reports to (ADMRDEPT) department 'E01'.
```
INSERT INTO DEPARTMENT 
     VALUES ('E31', 'ARCHITECTURE', '00390', 'E01') 
```

Example 2: Insert a new department into the DEPARTMENT table as in example 1, but do not assign a manager to the new department.
```
INSERT INTO DEPARTMENT (DEPTNO, DEPTNAME, ADMRDEPT ) 
     VALUES ('E31', 'ARCHITECTURE', 'E01') 
```

Example 3: Assume that the Hive Datalake table FACTS was created by running the following statement:

CREATE DATALAKE TABLE FACTS ( 

    TRANS_ID INT,
    PRODUCT VARCHAR(50), 
    TRANS_DATE TIMESTAMP ) 
  PARTITIONED BY ( 
    YEAR(TRANS_DATE) AS YEAR_PART, 
    MONTH(TRANS_DATE) AS MONTH_PART ) 
  STORED AS PARQUET  
  LOCATION 'DB2REMOTE://odfdefault//defaultbucket/facts'  
  TBLPROPERTIES ('external.table.purge'='true');

Insert some data into the FACTS table, as shown in the following example:

INSERT INTO FACTS VALUES (1000, 'birthday card', '2016-06-12', DEFAULT, DEFAULT);

Run the following query against the table:

SELECT * FROM FACTS WHERE trans_id = 1000;

The query returns the following result set, in which the YEAR_PART and MONTH_PART column values are automatically generated:

TRANS_ID    PRODUCT                                            TRANS_DATE                    YEAR_PART   MONTH_PART 

----------- -------------------------------------------------- ----------------------------- ----------- ----------- 

            1000 birthday card                                 2016-06-12-00.00.00.000000000 2016        6