- Understanding external table concepts
- Creating external tables
- Unloading from a base table
- Loading into the base table
- Selecting from external tables
- Use cases
- MACH-11 environment
- System catalogs
- Guidelines and best practices
- Downloadable resources
- Related topics
Using data file abstraction with external tables in Informix Dynamic Server
High performance ETL using a simpler SQL interface
Informix Dynamic Server (IDS) 11.50.xC6 allows you to create a table where the data is external to the database server. These are called external tables. The external table feature is already supported in Informix eXtended Parallel Server and is heavily used by the XPS customers to load and unload their massive data sets. Now, IDS also supports this feature to allow you to easily perform your data load and unload operations with highest performance and flexibility.
IDS already has tools for loading and unloading of data. These include the
dbaccess commands (load and unload), High
Performance Loader (HPL), and
tools are used for moving data from one IDS instance to another and to
keep a backup of the data as flat files. For the most part, they are all
client-based; thus there is a communications cost involved when loading
and unloading the data. (HPL is the exception, as it has a counterpart in
the server to do the job.)
Considering the data growth in the last decade, OLTP servers are also handling a massive amount of data that demands an optimal tool for performing extract, transform, and load (ETL) operations. The introduction of external tables within the IDS server fulfils the requirement and provides better performance.
Understanding external table concepts
Typically, data accessed by SQL is stored and managed completely by IDS. By extensibility, IDS also supports the Virtual Table Interface (VTI) and Virtual Index Interface (VII) that gives control to users to seamlessly access their own format of data.
External tables provide a normal table interface to data files that reside in the underlying operating system. These files can be in some specific form (discussed later in this article) that is understood by the IDS so that it can be accessed using SQL queries.
Creating external tables
An external table can be created using one of the following three methods:
- Using the
CREATE EXTERNAL TABLEstatement with column specification (see Listing 1)
- Using the
CREATE EXTERNAL TABLEwith a
SAMEASclause (see Listing 2)
- Using a
SELECTstatement to select from a data source into an external table (see Listing 3)
Listing 1 illustrates the first method:
Listing 1. Explicitly specifying all the columns
CREATE EXTERNAL TABLE et_state (code CHAR(2),sname CHAR(15)) USING (DATAFILES ("DISK:/data/NA_states.unl", "DISK:/data/SA_states.unl"));
Listing 2 shows the second method—copying from a template table. For this method, note the following:
- The constraints defined on the columns of the template table are not copied
- External table "et_state" does not keep the reference to the template table "state"; meaning that, when the state table is later modified, the external table is not affected
- EXTERNAL column specification of the template table is not copied, hence FIXED format cannot be used with this type, as this format mandates the presence of EXTERNAL column specification (EXTERNAL column specification is discussed below)
Listing 2. Copying columns and their types from the SAMEAS template table
CREATE EXTERNAL TABLE et_state SAMEAS state USING (DATAFILES ("DISK:/data/states.unl"));
Listing 3 illustrates the third method. For this method, note the following:
- If the projection column list contains an expression, it must have a valid column alias
- FIXED format cannot be used because this format mandates the presence of EXTERNAL column specification, which is not possible with this type (EXTERNAL column specification is discussed below)
Listing 3. Deriving columns and their types from the projected columns
SELECT * FROM state INTO EXTERNAL et_state USING (DATAFILES ("DISK:/data/states.unl"));
Supported data types
External tables can be created with the columns of all the data types supported by the regular base table in IDS. When creating an external table with BLOB and CLOB data types, it is recommended that you provide the BLOBDIR and CLOBDIR where the blob data or clob data are extracted with the dynamically generated file name. When these entries are not specified, the files that contain BLOB or CLOB data will be extracted into the corresponding directory where the disk file is present.
Data file specification
Creation of external tables also includes a
USING clause in addition to the column
specification. This clause is used to specify the datafile(s) and table
options that will be used for the table.
DATAFILES is a mandatory clause for each
external table. It specifies the file path where the data can be read from
or written to. The path specified here can be of type DISK or of type
PIPE. Also, each entry can optionally have a BLOBDIR and CLOBDIR
specification where in the BLOB and CLOB data are read or written. There
can be multiple files specified in this clause.
DISK files are accessed using AIO virtual processors (VP), whereas the PIPE
files are accessed by the FIFO VPs. DISK files are normal text or binary
files, whereas PIPE files are FIFO files to be created using
UNIX® as follows:
% mkfifo <filename>
% mknod <filename> p
filenameis the name of the pipe file
pis the type of the file to represent PIPE
- FORMAT: The table can have any of DELIMITED, FIXED, or INFORMIX
formats. When FORMAT is not specified, DELIMITED is assumed by default.
- DELIMITED: The data files specified in the DATAFILES section are delimited. A delimiter character can be specified in the DELIMITER table option.
- FIXED: The FIXED format must be used only when the EXTERNAL clause is used in the external table column specification while creating the external table. (See Listing 18).
- INFORMIX: If the data format is specified as INFORMIX, the data read (or written) from (or to) the file is in the binary format that can be decoded only by Informix.
- DELUXE, EXPRESS, and DEFAULT: While loading the data, any of these options can be specified to perform the load in the respective mode. DELUXE mode provides more concurrency to the target table while loading, whereas EXPRESS mode expects the target table to be a RAW table. DEFAULT option leaves the decision to the IDS server.
- ESCAPE: If the delimiter characters are used as part of the data value, then the ESCAPE option will escape those delimiters from the data so that they won't be misinterpreted as the delimiters.
- DBDATE: The date column values in the external data file are in
the format specified by this DBDATE. When this is not specified, the
database server uses the date format specified in the
DBDATEenvironment variable. When neither of them are specified, the database server's default date format is assumed.
- DBMONEY: The currency symbol for the money column values in the
external data file are in the format specified by the DBMONEY. When
this is not specified, the database server uses the money format
specified in the
DBMONEYenvironment variable. When neither of them is specified, the database server's default money format is assumed.
- DELIMITER: Specifies the character that separates fields in a
delimited text file. Default delimiter is a vertical bar
- RECORDEND: Specifies the character that separates records in a delimited text file. Default is the end of the line character. Note that this implicit new line of a data file is not interpreted as a character "\n". If the RECORDEND is specified as "\n," the same character "\n" is expected to be in the data file after each record. Note that both DELIMITER and RECORDEND does not apply to the FIXED and INFORMIX formats.
- MAXERRORS: Sets the number of errors that are allowed before the database server stops the load.
- REJECTFILE: Sets the full pathname where IDS writes rejected
rows due to errors, such as a data-conversion error. If not specified
or if files cannot be opened, any load error abnormally ends the
operation. The following errors could cause a row to be rejected:
- CONVERT_ERR - Any field encounters a conversion error
- MISSING_DELIMITER - No delimiter was found
- MISSING_RECORDEND - No record end was found
- NOT NULL - A NULL was found in field name
- ROW_TOO_LONG - The input record is too long
- NUMROWS: The approximate number of rows that the external table can possibly have. This can improve the performance when external table is used in the join query. In order to be compatible with XPS syntax, SIZE can also be used in the place of NUMROWS.
Creating external table: An example
To create an external table "et_customer" with the same schema as customer table with a delimiter "|", you can use the following DDL statement:
Listing 4. Create the external table
CREATE EXTERNAL TABLE et_customer SAMEAS customer USING (DATAFILES ("DISK:/data/americascust.dat"),DELIMITER "|");
Unloading from a base table
To unload data from a base table (for example, "customer") into an external table that abstracts the data file /data/americascust.dat using the table definition above, you would use the following command:
Listing 5. Unload data that abstracts data file
INSERT INTO et_customer SELECT * FROM customer WHERE customer_num <= 110;
The unloaded file /data/americascust.dat will contain:
Listing 6. Contents of unloaded data file
101|Ludwig|Pauli|All Sports Supplies|213 Erstwild Court||Sunnyvale|CA|94086|408-789-8075| 102|Carole|Sadler|Sports Spot|785 Geary St||San Francisco|CA|94117|415-822-1289| 103|Philip|Currie|Phil's Sports|654 Poplar|P. O. Box 3498|Palo Alto|CA|94303|415-328-4543| 104|Anthony|Higgins|Play Ball!|East Shopping Cntr.|422 Bay Road|Redwood City|CA|94026|415-368-1100| 105|Raymond|Vector|Los Altos Sports|1899 La Loma Drive||Los Altos|CA|94022|415-776-3249| 106|George|Watson|Watson & Son|1143 Carver Place||Mountain View|CA|94063|415-389-8789| 107|Charles|Ream|Athletic Supplies|41 Jordan Avenue||Palo Alto|CA|94304|415-356-9876| 108|Donald|Quinn|Quinn's Sports|587 Alvarado||Redwood City|CA|94063|415-544-8729| 109|Jane|Miller|Sport Stuff|Mayfair Mart|7345 Ross Blvd.|Sunnyvale|CA|94086|408-723-8789| 110|Roy|Jaeger|AA Athletics|520 Topaz Way||Redwood City|CA|94062|415-743-3611|
Loading into the base table
To load data from the data file /data/americascust.dat (created above)
into the base table customer, use the
statement as follows:
INSERT INTO customer SELECT * FROM et_americascust;
Selecting from external tables
Selecting from an external table reads the data from the data file(s) and
provides it in the form of rows and columns as if it was read from the
normal base table. You can also perform operations such as filtering,
and other operations that would be possible with a regular base table.
Refer the Restrictions section below for more
Listing 7 provides an example query to list all the customer names and phone numbers from state of New Jersey:
Listing 7. List all NJ customer names and phone numbers
SELECT RTRIM(fname) || ' ' || lname as fullname, phone FROM et_americascust WHERE state = 'NJ'; fullname phone Bob Shorter 609-663-6079 Cathy O'Brian 609-342-0054
Case 1: Using the ampersand (&) as a delimiter where a few rows contain & as a data value
- Create an external table using the following DDL:
Listing 8. Create the external table
CREATE EXTERNAL TABLE et_customer SAMEAS customer USING (DATAFILES ("DISK:/data/americascust.dat"), DELIMITER "&", MAXERRORS 1, REJECTFILE "/tmp/reject.out");
- Unload the data from the customer base table using the query in
Listing 9. Unload the data from customer base table
INSERT INTO et_customer SELECT * FROM customer WHERE customer_num <= 110; 10 row(s) inserted.
Listing 10. Contents of the disk file /data/americascust.dat
101&LudwigPauliAll Sports Supplies&213 Erstwild Court&&Sunnyvale&CA&94086&408-789-8075& 102&Carole&Sadler&Sports Spot&785 Geary St&&San Francisco&CA&94117&415-822-1289& 103&Philip&Currie&Phil's Sports&654 Poplar&P. O. Box 3498&Palo Alto&CA&94303&415-328-4543& 104&Anthony&Higgins&Play Ball!&East Shopping Cntr.&422 Bay Road&Redwood City&CA&94026&415-368-1100& 105&Raymond&Vector&Los Altos Sports&1899 La Loma Drive&&Los Altos&CA&94022&415-776-3249& 106&George&Watson&Watson & Son&1143 Carver Place&&Mountain View&CA&94063&415-389-8789& 107&Charles&Ream&Athletic Supplies&41 Jordan Avenue&&Palo Alto&CA&94304&415-356-9876& 108&Donald&Quinn&Quinn's Sports&587 Alvarado&&Redwood City&CA&94063&415-544-8729& 109&Jane&Miller&Sport Stuff&Mayfair Mart&7345 Ross Blvd.&Sunnyvale&CA&94086&408-723-8789& 110&Roy&Jaeger&AA Athletics&520 Topaz Way&&Redwood City&CA&94062&415-743-3611&
- Select the rows from the external table using the query in Listing 11:
Listing 11. Select the rows from the external table
SELECT customer_num, rtrim(fname) || " " || lname as fullname, phone FROM et_customer WHERE customer_num <= 110; customer_num fullname phone 101 Ludwig Pauli 408-789-8075 102 Carole Sadler 415-822-1289 103 Philip Currie 415-328-4543 104 Anthony Higgins 415-368-1100 105 Raymond Vector 415-776-3249 26197: Reached maximum error limit during load: (1). Error in line 5 Near character position 23
There are few things to be noted in Listing 11:
- The row with customer_num 106 is not displayed and the query aborted because this row contains data "Watson & son," where the delimiter "&" is part of the data.
- Error 26197 has been generated. Since
MAXERRORSis specified as 1, the first error where customer_num is 106 itself was not "tolerated" and hence aborted. If
MAXERRORSis set to 2 or more, it would have displayed the rest of the five rows as well from et_customer.
- There is a missing row contained in the reject file. The reason for the error is because the "&" that was declared as a delimiter was also part of the data and ended up having too many columns for a row with customer_num = 106.
Case 2: Escape the data that is used as a delimiter
USING clause provides an
ESCAPE option to overcome the issue described
in Case 1.
Create the external table using the following query:
Listing 12. Create the external table
CREATE EXTERNAL TABLE et_customer SAMEAS customer USING (DATAFILES ("DISK:/data/americascust.dat"), DELIMITER "&", ESCAPE, MAXERRORS 2, REJECTFILE "/tmp/reject.out");
The delimited and escaped data is stored in the data file as follows:
Listing 13. Delimited and escaped data
... 106&George&Watson&Watson \& Son&1143 Carver Place&&Mountain View&CA&94063&415-389-8789& ...
Note that the "&" used as part of the data has been escaped using a backslash (\). This row can now be successfully accessed through an external table.
Case 3: Use
NUMROWS to get more realistic estimates and
NUMROWS is used to provide an approximate number
of rows that an external table can have. This information is used by the
optimizer to come up with an optimal plan when using the external table in
- Create an external table without
Listing 14. Create external table without
CREATE EXTERNAL TABLE et_customer SAMEAS customer USING (DATAFILES ("DISK:/tmp/americascust.dat"), DELIMITER "|"); INSERT INTO et_customer SELECT * FROM customer; SET EXPLAIN ON; SELECT b.customer_num, a.fname, a.lname FROM et_customer a, orders b WHERE a.customer_num = b.customer_num and b.order_num lt; 1005 sqexplain.out QUERY: (OPTIMIZATION TIMESTAMP: 01-15-2010 01:38:53) ------ select b.customer_num, a.fname, a.lname from et_customer a, orders b where a.customer_num = b.customer_num and b.order_num lt; 1005 Estimated Cost: 609062016 Estimated # of Rows Returned: 631612864 1) informix.a: SEQUENTIAL SCAN 2) informix.b: INDEX PATH (1) Index Name: informix. 102_3 Index Keys: order_num (Serial, fragments: ALL) Upper Index Filter: informix.b.order_num lt; 1005 DYNAMIC HASH JOIN Dynamic Hash Filters: informix.a.customer_num = informix.b.customer_num
- To compare, create an external table with
Listing 15. Create external table with NUMROWS
CREATE EXTERNAL TABLE et_customer SAMEAS customer USING (DATAFILES ("DISK:/tmp/americascust.dat"), DELIMITER "|", NUMROWS 28); INSERT INTO et_customer SELECT * FROM customer; SET EXPLAIN ON; SELECT b.customer_num, a.fname, a.lname FROM et_customer a, orders b WHERE a.customer_num = b.customer_num AND b.order_num lt; 1005 sqexplain.out QUERY: (OPTIMIZATION TIMESTAMP: 01-15-2010 01:37:11) ------ select b.customer_num, a.fname, a.lname from et_customer a, orders b where a.customer_num = b.customer_num and b.order_num lt; 1005 Estimated Cost: 10 Estimated # of Rows Returned: 8 1) informix.a: SEQUENTIAL SCAN 2) informix.b: INDEX PATH (1) Index Name: informix. 102_3 Index Keys: order_num (Serial, fragments: ALL) Upper Index Filter: informix.b.order_num lt; 1005
Although both the plans are the same in this simplified example, note that
the estimated rows and estimated cost have drastic differences, and that
the one with
NUMROWS defined yields more
realistic estimates. When multiple tables are involved, these estimates
play a crucial role in getting better performance.
Case 4: Having EXPRESS mode without RAW table, and showing a message log switch
Consider the EXPRESS mode is used within the
USING clause. While loading the data into the
base table using an external table when the target base table is a non-RAW
table, IDS internally switches the loading mode to DELUXE. This switch is
logged into the online log message file, as illustrated in Listing 16:
Listing 16. Message in online log
... 12:20:53 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 30, Llog used 7 12:22:49 Switching load on target table informix.customer1 to DELUXE ...
The EXPRESS or DELUXE modes specified are not enforced in all cases. It depends on the type and a few other properties of the target table while performing the loading operation. So, when EXPRESS mode is specified and it is internally switched to DELUXE, such information is logged in the message log, as is the case if the reverse happened.
Case 5: Having stored procedure using dynamic SQL and external tables
Listing 17. Creating external table in stored procedure using dynamic SQL
CREATE PROCEDURE et_proc(tabname CHAR(128), et_tabname CHAR(128)) DEFINE qry CHAR(256); LET qry = "CREATE EXTERNAL TABLE" || et_tabname || "(code CHAR(2), name CHAR(15)) USING (DATAFILES ("DISK:/data/tabdata.unl"))"; EXECUTE IMMEDIATE qry; LET qry = "INSERT INTO" || et_tabname || "SELECT * FROM" || tabname; EXECUTE IMMEDIATE qry; LET qry = "DROP TABLE" || et_tabname; EXECUTE IMMEDIATE qry; END PROCEDURE; EXECUTE PROCEDURE et_proc;
External table statements can also be used as static statements. Also, the selection from the external table can be prepared and executed. All the restrictions discussed below are applicable within the stored procedural language.
Support of external table creation, access, load, and unload from within
the stored procedures makes it easier to perform the batch load and unload
operations. You can use the
dbcrons feature to
schedule load and unload operations by calling stored procedures.
Case 6: Using FIXED format
When using FIXED format, it is mandatory to use EXTERNAL keyword for the columns.
Listing 18. Creating external table with FIXED format
CREATE EXTERNAL TABLE et_customer ( customer_num INT EXTERNAL CHAR(4), fname CHAR(40) EXTERNAL CHAR(40)) USING (DATAFILES ("DISK:/data/cust.unl"), FORMAT "FIXED", REJECTFILE "/tmp/reject.out");
Case 7: Using user-defined types (UDT)
The example code in Listing 19 creates a distinct user-defined type called "birthday" and is being used as the column type while creating the external table. Similarly, other UDTs, like opaque or complex, can also be used with the external tables.
Listing 19. Creating external table using UDT
CREATE DISTINCT TYPE birthday AS DATE; CREATE EXTERNAL TABLE et_employee (empno CHAR(10), empname VARCHAR(40), empdob birthday) USING (DATAFILES("DISK:/data/exployee.dat"), REJECTFILE "/tmp/reject.out");
Be aware of the following restrictions on the use of external tables:
- You cannot have indexes (therefore, you also cannot have primary, unique, or foreign keys)
- You cannot define triggers for external tables
- You cannot alter the external table schema
- An external table cannot be a target table in the
- You cannot use multiple external tables in the main query
- You cannot use an external table in a sub-query
- External table data files are excluded from all forms of replication
- You cannot run
UPDATE STATISTICSagainst an external table
- You cannot execute
- LBAC security cannot be imposed on external table data files
- The external data files cannot be compressed
- START/STOP VIOLATIONS are restricted
- You cannot insert into an external table using the Dbaccess command
LOAD FROM ... INSERT INTO
No transaction support
Operations on external tables are not logged. Thus transaction support is
not enabled on external tables. That is, the
BEGIN WORK and
COMMIT/ROLLBACK WORK statements are ignored for
external table operations.
Because they are not logged, the external table data files are excluded from all the replications. However, the system catalog data that was added at the time of creation of an external table will still be replicated.
Primary: All supported operations on external tables are allowed on the primary server.
Secondary: An external table works similarly on the secondary servers, except that on secondary, the external table cannot be created as there are no DDLs allowed on the secondary in the MACH-11 environment. Loading and unloading of data can be done from the secondary server. However, it should be noted that the performance of doing the loading of data from an external table into the base table on secondary might be less compared to doing it directly on the primary. This is because the updates on the secondary server are forwarded to the primary to perform the actual operation. The data file created in the primary or secondary servers are not replicated to the other servers. It is the user's responsibility to copy these files to the other servers as required.
The permissions used to access external tables is governed by the user
access privileges on the external table by IDS as well as the data file
permissions managed by the operating system. This means that the table
creator will have access to the external table and must also have
appropriate access to the data file(s) associated with the external table.
Note that at the time of creating an external table, the data files
associated with the external table need not exist. Hence it is the table
creator's responsibility to make sure that the file exists with
appropriate permission in order to perform select or insert operations on
the external tables. Any other user must have access privilege on the
table as well as on the data file to work with the external table. Also,
REVOKE statements, it is possible to control
the access of the external table from within the database server in spite
of having all the privileges to the underlying file(s). Users can GRANT
and REVOKE the SELECT or INSERT privileges on the external table. Granting
or revoking the privileges on an external table does not have any effect
on the associated data file(s) permissions.
For every database, there will be three additional new system catalogs created in IDS 11.50.xC6 (or later):
These catalog tables are needed to store the table options, data files, and external columns discussed in this article.
External tables are part of the IDS server and use algorithms to perform the loading and unloading operations in parallel. Therefore, you can expect performance to be better compared to other client-based tools.
For improved performance, the load operation uses light appends when applicable. If the base table to which the data is loaded from the external table is defined to be a RAW table without indexes, the IDS server uses light appends to load the data.
The tables that are light-appended must be backed up before any operations can be performed on the base table. This behaviour is similar to high performance loader when it uses the light appends to load the data.
When there are multiple files defined along with the external tables, and when the PDQ is ON, all the files are accessed in parallel to feed into the loader. This improves the performance.
Due to the addition of new system catalog tables for external tables, a prior version of the IDS server (11.50.xC5 or prior) must undergo an automatic conversion process just by bringing up the IDS 11.50.xC6 server (or later) without reinitializing using the prior "rootdbs" (look for ROOTNAME entry in your onconfig file). This "rootdbs" can belong to any prior server, from IDS 9.4 or later, so that the automatic conversion can handle it. IDS 11.50.xC6 (or later) will appropriately convert all the databases to include these new system catalog tables. The data files associated with the external tables will not undergo any changes.
Similarly, the reversion process must be followed to appropriately revert back from IDS 11.50.xC6 (or later) to any prior versions. After dropping all the external tables, run the following command on IDS 11.50.xC6 server to perform the steps to revert to the version 11.50.xC5:
% onmode b 11.50.xC5
Note that the new system catalog tables will be dropped during the reversion process.
Guidelines and best practices
For the best results with external tables, follow these practices:
- Unload the data on a secondary server, preferably a read-only secondary
- Load the data on primary during non-peak hours
- Use the
SAMEASclause in schema creation scripts
- Have an explicit delimiter and escape options
- Define external table with multiple files instead of one huge single file; when executing load or select operation with PDQ priority set, IDS exploits the parallelism
- While loading, if the target base table is fragmented, the load happens in parallel under the PDQ environment
- Pipes in external tables can be used to transfer data between two instances and save on the disk space
External tables provide an easy and intuitive way to perform the unload and load operations through the SQL interface or stored procedures. Applying table abstraction to the underlying data file makes it easier to seamlessly apply SQL operations. Various formats, delimiters, and date and money formats are accommodated so the data files may come from any data source and any geographic location.
- Informix page on developerWorks: Learn more about Informix.
- IBM Informix Dynamic Server 11.5 Information Center: Learn more about Informix.
- Informix Dynamic Server Enterprise and Developer Edition: Download a free trial version of IDS Server Enterprise and Developer Edition.
- Informix Dynamic Server Express Edition: Download a trial version of IDS Express Edition to get started with IDS.