Technical Blog Post
Abstract
Import with schema is much slower than import without schema
Body
Last month, a client told me that on his product server, importing into a table with schema specified took about 700 seconds, while importing into the same table without schema, it only took 100 seconds:
db2 "import from date.ixf of ixf commitcount 10000 insert into tbl_cufms_cuslt_settle_rslt_mchnt" # this command took about 100 seconds
db2 "import from date.ixf of ixf commitcount 10000 insert into cu_fmsdb.tbl_cufms_cuslt_settle_rslt_mchnt" #this command took about 700 seconds
He said he can reproduce above performance difference on the product server at will, but cannot on test server.
I never heard an issue like this before, with or without schema should not impact import performance so much. So I suggested him collect a db2trc for both cases:
db2 connect to <db name>
db2 "values (mon_get_application_handle())" ==> get the application handle
APPL_HDL=<put the application handle here>
db2trc on -t -apphdl $APPL_HDL -i 512m
db2 -v "import from ...."
db2trc dump db2trc.dmp
db2trc off
db2trc perfrep -g -sort timeelapsed db2trc.dmp db2trc.perfrep
Checking the tracing report, I saw in bad case, sqlra_hash_loc_stmt_id() took much more time than in good case:
BAD CASE:
nCalls TotalElapsed AvgElapsed TotalSpent AvgSpent FunctionName
...
1259 0.664387732 0.000527711 0.001331863 0.000001058 sqljs_ddm_excsqlstt
1259 0.658172548 0.000522774 0.002004242 0.000001592 sqlrr_execute
3777 0.494674765 0.000130970 0.494674765 0.000130970 sqlra_hash_loc_stmt_id
GOOD CASE:
nCalls TotalElapsed AvgElapsed TotalSpent AvgSpent FunctionName
...
351 0.040626068 0.000115744 0.000344320 0.000000981 sqljs_ddm_excsqlstt
349 0.038218593 0.000109509 0.000408683 0.000001171 sqlrr_execute
1087 0.000187287 0.000000172 0.000187287 0.000000172 sqlra_hash_loc_stmt_id
As I know, when STATEMENT monitor or UOW monitor flag is on, db2 calls sqlra_hash_loc_stmt_id() before completing executing a statement, it walks through the entire cached statement list to retrieve the statement object for monitoring. You should already know that an import is actually executing "INSERT INTO ..." statement for each row of data, so sqlra_hash_loc_stmt_id() should be called many times depending on the row number of the input data file.
Then why sqlra_hash_loc_stmt_id() took much longer in bad case?
Checking "db2pd -d <db name> -app -dyn" you should find out the AnchID and StmtUID for the "INSERT INTO ..." statement used by the import job:
You can see in bad case, the ahchID is 638, and there are totally 14453 statements with the same ahchID 638,
while in good case, the ahchID is 585 because the insert statement text changed(this is because we didn't specify schema in the import command), and there is only 9 statements sharing the same ahchID.
BAD CASE:
Address AppHandl [nod-index] NumAgents CoorEDUID Status C-AnchID C-StmtUID L-AnchID L-StmtUID
...
0x0780000008870080 33456 [000-33456] 1 129667 UOW-Executing 0 0 638 120675
Dynamic SQL Statements:
Address AnchID StmtUID NumEnv NumVar NumRef NumExe Text
...
0x0A000201D9FD8160 638 120675 2 2 712 143106910 INSERT INTO
CU_FMSDB.TBL_CUFMS_CUSLT_SETTLE_RSLT_MCHNT
("SETTLE_BUSS_CATA","MCHNT_CD","INS_ID_CD","TRANS_ROLE","EXP_IN","TRANS_CHNL","CROSS_DIST_IN","CONN_MD","SETTLE_CURR_CD","FUND_
TP","DEBT_SETTLE_NUM","CRET_SETTLE_NUM","DEBT_SETTLE_AT","CRET_SETTLE_AT","DEBT_DISC_NUM","CRET_DISC_NUM","DEBT_DISC_AT","CRET_
DISC_AT","RETURN_DEBT_DISC_NUM","RETURN_CRET_DISC_NUM","RETURN_DEBT_DISC_AT","RETURN_CRET_DISC_AT") VALUES (CAST (? AS
CHAR(4)) ,CAST (? AS CHAR(15)) ,CAST (? AS CHAR(11))
,CAST (? AS CHAR(2)) ,CAST (? AS CHAR(1)) ,CAST (? AS CHAR(2))
,CAST (? AS CHAR(1)) ,CAST (? AS CHAR(1)) ,CAST (? AS CHAR(3))
,CAST (? AS CHAR(2)) ,CAST (? AS DEC(10, 0)) ,CAST (?
AS DEC(10, 0)) ,CAST (? AS DEC(16, 0)) ,CAST (? AS DEC(16, 0))
,CAST (? AS DEC(10, 0)) ,CAST (? AS DEC(12, 0)) ,CAST (? AS
DEC(16, 0)) ,CAST (? AS DEC(16, 0)) ,CAST (? AS DEC(10, 0))
,CAST (? AS DEC(10, 0)) ,CAST (? AS DEC(16, 0)) ,CAST (? AS DEC(16,
0)) )
$ grep -c "638 " db2pd.dyn.bad.*
db2pd.dyn.bad:14453
GOOD CASE:
Address AppHandl [nod-index] NumAgents CoorEDUID Status C-AnchID C-StmtUID L-AnchID L-StmtUID
...
0x0780000008870080 33456 [000-33456] 1 129667 UOW-Waiting 0 0 585 403524
Dynamic SQL Statements:
Address AnchID StmtUID NumEnv NumVar NumRef NumExe Text
...
0x0A000201EE2CC5C0 585 403524 1 1 1 186739 INSERT INTO
tbl_cufms_cuslt_settle_rslt_mchnt
("SETTLE_BUSS_CATA","MCHNT_CD","INS_ID_CD","TRANS_ROLE","EXP_IN","TRANS_CHNL","CROSS_DIST_IN","CONN_MD","SETTLE_CURR_CD","FUND_
TP","DEBT_SETTLE_NUM","CRET_SETTLE_NUM","DEBT_SETTLE_AT","CRET_SETTLE_AT","DEBT_DISC_NUM","CRET_DISC_NUM","DEBT_DISC_AT","CRET_
DISC_AT","RETURN_DEBT_DISC_NUM","RETURN_CRET_DISC_NUM","RETURN_DEBT_DISC_AT","RETURN_CRET_DISC_AT") VALUES (CAST (? AS
CHAR(4)) ,CAST (? AS CHAR(15)) ,CAST (? AS CHAR(11))
,CAST (? AS CHAR(2)) ,CAST (? AS CHAR(1)) ,CAST (? AS CHAR(2))
,CAST (? AS CHAR(1)) ,CAST (? AS CHAR(1)) ,CAST (? AS CHAR(3))
,CAST (? AS CHAR(2)) ,CAST (? AS DEC(10, 0)) ,CAST (?
AS DEC(10, 0)) ,CAST (? AS DEC(16, 0)) ,CAST (? AS DEC(16, 0))
,CAST (? AS DEC(10, 0)) ,CAST (? AS DEC(12, 0)) ,CAST (? AS
DEC(16, 0)) ,CAST (? AS DEC(16, 0)) ,CAST (? AS DEC(10, 0))
,CAST (? AS DEC(10, 0)) ,CAST (? AS DEC(16, 0)) ,CAST (? AS DEC(16,
0)) )
$ grep -c " 585 " db2pd.dyn.good.*
db2pd.dyn.good:9
Note that sqlra_hash_loc_stmt_id() walks through the entire cached statement list to find out the statement object with specified ahchID and StmtUID, so if the specified ahchID is 638(note: many statements sharing this ahchID), sqlra_hash_loc_stmt_id() should access much more statement objects in the list to find out the right object, as a result, sqlra_hash_loc_stmt_id() takes much longer to return.
So, this problem has nothing to do with import feature, has nothing to do with the schema specified or not, but due to the long cached dynamic statement list.
To resolve the problem, the user could either:
1. Turn off STATEMENT monitor or UOW monitor:
Statement (DFT_MON_STMT) = OFF
Unit of work (DFT_MON_UOW) = OFF
also prevent from turning on the STATEMENT monitor nor UOW monitor in session level:
db2 update monitor switches using statement on uow on
2. Run "db2 FLUSH PACKAGE CACHE DYNAMIC" before import.
UID
ibm13286359