Troubleshooting
Problem
Error when using nzload to load scientific notation data into numeric columns.
Symptom
When using nzload to load scientific notation data into numeric columns, the following error appears:
bad #: input row #(byte offset to last char examined) [field #, declaration] diagnostic, "text consumed"[last char examined]
----------------------------------------------------------------------------------------------------------------------------
1: 1007(136) [187, NUMERIC(17,9)] expected field delimiter or end of record, "3.
62"[E]
Resolving The Problem
Depending on your NPS release level, you can load scientific notation data into a numeric column in one of the following ways (described below):
- Using nzload with floating point data types (applies to NPS 4.0 and later)
- Using staging tables to load into character columns
- Using nzload to load the majority of the data and capture the error rows in a separate file
Using nzload with Floating Point Data Types
On systems running NPS 4.0 or later, nzload can load raw scientific notation data into columns of the floating point data types: REAL and DOUBLE PRECISION as shown in the following example:
- Anonymous(ADMIN)=> create table TABLE_S (col1 int, col2 int, col3 character(10), col4 DOUBLE) distribute on (col1);
Anonymous(ADMIN)=> \d TABLE_S
Table "TABLE_S"
Attribute | Type | Modifier | Default Value
-----------+------------------+----------+---------------
COL1 | INTEGER | |
COL2 | INTEGER | |
COL3 | CHARACTER(10) | |
COL4 | DOUBLE PRECISION | |
Distributed on hash: "COL1"
Anonymous(ADMIN)=>
[nz@cs-spubox6 ~]$ more load_scientific.dat
8|8|test 8|3.62E-7
8|8|test 8|3.62E-7
[nz@cs-spubox6 ~]$ nzload -db Anonymous -t TABLE_S -df load_scientific.dat -delim '|'
Load session of table 'TABLE_S' completed successfully
[nz@cs-spubox6 ~]$
Anonymous(ADMIN)=> select * from TABLE_S;
COL1 | COL2 | COL3 | COL4
------+------+------------+----------
8 | 8 | test 8 | 3.62e-07
8 | 8 | test 8 | 3.62e-07
(2 rows)
You can then either put it into another table or select from the DOUBLE column by doing the following:
- Anonymous(ADMIN)=> select col4::numeric(10,10) from TABLE_S;
?COLUMN?
--------------
0.0000003620
0.0000003620
0.0000003620
0.0000003620
(4 rows)
Anonymous(ADMIN)=>
Anonymous(ADMIN)=> create table TABLE_N (col1 int, col2 numeric(10,10)) distribute on (col1);
CREATE TABLE
Anonymous(ADMIN)=> insert into TABLE_N select col1, col4::numeric(10,10) from TABLE_S;
INSERT 0 4
Anonymous(ADMIN)=> select * from TABLE_N;
COL1 | COL2
------+--------------
8 | 0.0000003620
8 | 0.0000003620
8 | 0.0000003620
8 | 0.0000003620
(4 rows)
Using Staging Tables to Load Scientific Data into Character Columns
On any NPS system, you can use staging tables and multiple INSERT commands to load scientific data into character columns as shown in the following example:
- create table staging_table (col1 varchar(20));
create table production_table (col1 numeric(17,9));
Instead of using nzload, you can use the following INSERT commands because the staging table column is defined as varchar data which means that all types of data can be loaded into it.
- insert into staging_table values ('0');
insert into staging_table values ('87654321.987654321');
insert into staging_table values ('99999999.999999999');
insert into staging_table values ('11111111.111111111');
insert into staging_table values ('10000000.000000001');
insert into staging_table values ('3.62E-7');
Promote the data from the staging table to the production table column-by-column.
- insert into production_table select * from staging_table;
NPS detects if any of the columns are not similarly defined and automatically does the conversion which results in the following statement:
- select * from production_table;
COL1
--------------------
11111111.111111111
10000000.000000001
0.000000362
0.000000000
99999999.999999999
87654321.987654321
(6 rows)
Use nzload to Load the Majority of the Data and Capture the Error Rows for Loading Separately
The following load method can be performed on any NPS system. It involves moving data from a DOUBLE column into the NUMERIC column and is summarized below:
Perform the nzload directly into the production table with the -maxErrors option set to a decently high number to catch all of the problem rows. All of the rows that use scientific notation will be written to the *.nzbad file. Everything else will be directly loaded into the table.
Move, rename, and save the *.nzbad file.
Load the problem rows into a different staging table where that particular column is defined as DOUBLE rather than a NUMERIC(n,n).
Do the INSERT INTO production_table SELECT * FROM staging_table to promote the small subset of rows that used scientific notation.
When you use nzload to load the data into the staging table, it will throw away any rows that do not have valid data in the DOUBLE column.
In this method, less data has to be loaded and then moved into the final table because the initial nzload operation loaded most of the records without any problems.
[nz@cs-anonymous tmp]$ more scientific.data
3.62E-7
19.50
20.25
21.00
3.62E-7
Loading the rows into a column defined as numeric which will reject some rows..
anonymous(ADMIN)=> create table production_table (col1 numeric(17,9));
CREATE TABLE
nzload -db anonymous -t production_table -maxErrors 1000000000 -df /tmp/scientific.data -bf /tmp/scientific.bad
[nz@cs-anonymous tmp]$ more PRODUCTION_TABLE.anonymous.nzlog
Load started at:18-Aug-09 13:25:57 EDT
Database: anonymous
Tablename: PRODUCTION_TABLE
Datafile: /tmp/scientific.data
Host: cs-anonymous
Load Options
Field delimiter: '\t' File Buffer Size (MB): 8
NULL value: NULL Quoted data: No
Checkpoint: 0 Max errors: 1000000000
Skip records: 0 Max rows: 0
FillRecord: No Truncate String: No
Escape Char: None Accept Control Chars: No
Distribution stats: No Allow CR in string: No
BoolStyle: ONE_ZERO
Date Style: YMD Date Delim: '-'
Time Style: 24 Hour Time Delim: ':'
Number of Restarts: 2 Encoding: Latin9
Ignore Zero: No Require Quotes: No
Time extra zeros: No Load Replay Region (MB): 0
Found bad records
bad #: input row #(byte offset to last char examined) [field #, declaration] diagnostic, "text consumed"[last char examined]
----------------------------------------------------------------------------------------------------------------------------
1: 1(4) [1, NUMERIC(17,9)] expected field delimiter or end of record, "3.62"[E]
2: 5(4) [1, NUMERIC(17,9)] expected field delimiter or end of record, "3.62"[E]
Statistics
number of records read: 5
number of bad records: 2
number of discarded records: 0
-------------------------------------------------
number of records loaded: 3
Elapsed Time (sec): 0.0
-----------------------------------------------------------------------------
Load completed at: 18-Aug-09 13:25:57 EDT
=============================================================================
[nz@cs-anonymous tmp]$
anonymous(ADMIN)=> select * from production_table;
COL1
--------------
20.250000000
21.000000000
19.500000000
(3 rows)
Loading the rejected rows into a column defined as Double from the bad file generated from the previous step.
anonymous(ADMIN)=> create table production_table2 (col1 double);
CREATE TABLE
nzload -db anonymous -t production_table2 -maxErrors 1000000000 -df /tmp/scientific.bad -bf /tmp/scientific.bad2
Now we can move these rows from production_table2 to production_table.
anonymous(ADMIN)=> insert into production_table select * from production_table2;
INSERT 0 2
anonymous(ADMIN)=> select * from production_table;
COL1
--------------
19.500000000
0.000000000
0.000000000
20.250000000
21.000000000
(5 rows)
Historical Number
NZ321781
Was this topic helpful?
Document Information
Modified date:
17 October 2019
UID
swg21570795