Best practices and other techniques for using external tables

When using external tables, keep certain best practices and other techniques in mind.

Keep the following best practices in mind:
  • The system catalog data types of text and name are treated as nvarchar. If you use these types in the table that you reference in the select_clause, specify internal encoding by specifying the encoding option for the CREATE EXTERNAL TABLE command. Otherwise, you can receive the error LATIN9 encoding cannot be specified with NCHAR/NVARCHAR column definitions. The following example includes the encoding option:
    create external table '/tmp/ext1' using (encoding 'internal') 
    as select username from _t_user;
  • For a CREATE EXTERNAL TABLE command with the Encoding option set to 'internal', the USING clause is optional. If you omit it, the resulting external table has the default settings. You must specify the USING clause for the CREATE EXTERNAL TABLE SAMEAS command, because the SAMEAS table might be another external table.
  • Before you reload an external table, verify that the destination table in the database is empty or that it does not contain the rows in the external table that you are about to reload. If the destination table contains the rows that are in the external table, problems might occur. These problems can also occur if you accidentally reload the external table more than once.

    For example, loading a text-format external table into a destination table that contains the same data creates duplicate data in the database. The rows will have unique row IDs, but the data will be duplicated. To fix this problem, you must delete the duplicate rows or truncate the database table and reload the external table again (but only once).

    If you load a compressed binary format external table into a destination table that has the same rows, you create duplicate rows with duplicate row IDs in the database table. The system restores the rows by using the same row IDs that are saved in the compressed binary format file.

    Duplicate row IDs can cause incorrect query results and can lead to problems in the database. You can check for duplicate row IDs by using the rowid keyword as shown in the following example:
    SELECT rowid FROM employee_table GROUP BY rowid HAVING count(rowid)>1;
    

    If the query returns multiple rows that share the row ID, truncate the database table and reload the external table (but only once).

  • After you load data from an external table into a user table, update the statistics for the user table by running the GENERATE STATISTICS command. This improves the performance of queries that run against that table.
Keep the following other techniques in mind:
  • You can use an external table reference as the source table of a SELECT FROM statement. A transient external table reference in a SELECT FROM statement infers its shape from the preceding INSERT INTO clause.
  • The CREATE EXTERNAL TABLE AS command supports an optional table name. If you do not provide a table name, the table is transient, which means the external table definition does not persist in the system catalog. If you supply a table name, the external table becomes a named object in the system catalog.