Specifying the file code page for Unicode support of table-structured file data sources
To ensure correct code page conversion for table-structured file data sources data sources, you must specify the file code page if the code page differs from the federated database code page.
About this task
Valid values are those that Db2® supports. The default value is the code page of the federated database.
Restrictions
You can use the CODEPAGE option only in a Unicode federated database. If the federated database code page is UTF-8 which is different from the table-structured file character set, specify the CODEPAGE option to the table file character set.
Procedure
To specify the code page of a table-structured file, issue the CREATE NICKNAME statement with the CODEPAGE option set to the code page number of the data in the table-structured file.
CREATE NICKNAME DRUGDATA1(Dcode Integer NOT NULL, Drug CHAR(20),
Manufacutuer CHAR(20))
FOR SERVER biochem_lab
OPTIONS(FILE_PATH '/usr/pat/DRUGDATA1.TXT',CODEPAGE '943',
COLUMN_DELIMITER '.',
SORTED 'Y', KEY_COLUMN 'DCODE', VALIDATE_DATA_FILE 'Y');
Example:To specify the CODEPAGE option, first check the table structure file character
set. An error is reported if the CODEPAGE option is not set correctly. Take the file
test_data.txt
as an example, the charset
of
test_data.txt
is UTF-8.
File: test_data.txt
Content: ã,ABCDf
$ file -i test_data.txt
test_data.txt: text/plain; charset=utf-8
$ xxd test_data.txt
0000000: c3a3 2c41 4243 4466 0a ..,ABCDf.
CREATE NICKNAME Test (Non_Ascii CHAR(10), Ascii VARCHAR(10)) FOR SERVER Sample OPTIONS (FILE_PATH '/home/db2inst1/test_data.txt', COLUMN_DELIMITER ',', SORTED 'N', CODEPAGE '1208')
DB20000I The SQL command completed successfully.
select * from Test
NON_ASCII ASCII
---------- ----------
ã ABCDf
1 record(s) selected.
select hex(Non_Ascii), hex(Ascii) from Test
1 2
---------- ----------
C3A32020202020202020 4142434466
1 record(s) selected.
Generate the file test_data_88591.txt
by converting charset
to
iso-8859-1, the hex value of ã becomes 0xe3
. Query from nickname with
CODEPAGE '1208' gets Invalid column separator. Set CODEPAGE option to
the same as the file charset
.
$ iconv test_data.txt -f UTF-8 -t iso-8859-1 -o test_data_88591.txt --verbose
test_data.txt:
$ file -i test_data_88591.txt
test_data_88591.txt: text/plain; charset=iso-8859-1
$ xxd test_data_88591.txt
0000000: e32c 4142 4344 660a .,ABCDf.
CREATE NICKNAME Test(Non_Ascii CHAR(50), Ascii VARCHAR(10)) FOR SERVER Sample OPTIONS (FILE_PATH '/home/db2inst1/test_data_88591.txt',COLUMN_DELIMITER ',', SORTED 'N',CODEPAGE '1208')
DB20000I The SQL command completed successfully.
select * from Test
NON_ASCII ASCII
---------- ----------
SQL1822N Unexpected error code "Data Error" received from data source
"DEPARTMENT". Associated text and tokens are "Invalid Column Delimiter".
SQLSTATE=560BD
Nickname CODEPAGE option is the same as the file charset
.
CREATE NICKNAME Test(Non_Ascii CHAR(50), Ascii VARCHAR(10)) FOR SERVER Sample OPTIONS (FILE_PATH '/home/db2inst1/test_data_88591.txt',COLUMN_DELIMITER ',', SORTED 'N',CODEPAGE '819')
DB20000I The SQL command completed successfully.
select * from Test
NON_ASCII ASCII
---------- ----------
ã ABCDf
1 record(s) selected.