IBM Support

Import with schema is much slower than import without schema

Technical Blog Post


Abstract

Import with schema is much slower than import without schema

Body

Last month, a client told me that on his product server, importing into a table with schema specified took about 700 seconds, while importing into the same table without schema, it only took 100 seconds:

db2 "import from date.ixf of ixf commitcount 10000 insert into tbl_cufms_cuslt_settle_rslt_mchnt"                    # this command took about 100 seconds  
db2 "import from date.ixf of ixf commitcount 10000 insert into cu_fmsdb.tbl_cufms_cuslt_settle_rslt_mchnt"  #this command took about 700 seconds

He said he can reproduce above performance difference on the product server at will, but cannot on test server.

I never heard an issue like this before, with or without schema should not impact import performance so much.  So I suggested him collect a db2trc for both cases:

db2 connect to <db name>
db2 "values (mon_get_application_handle())"   ==> get the application handle

APPL_HDL=<put the application handle here>

db2trc on -t -apphdl $APPL_HDL -i  512m

db2 -v "import from ...." 

db2trc dump  db2trc.dmp
db2trc off

db2trc perfrep -g -sort timeelapsed db2trc.dmp db2trc.perfrep

Checking the tracing report, I saw in bad case, sqlra_hash_loc_stmt_id() took much more time than in good case:

BAD CASE:
 nCalls   TotalElapsed     AvgElapsed     TotalSpent       AvgSpent FunctionName
...
   1259    0.664387732    0.000527711    0.001331863    0.000001058 sqljs_ddm_excsqlstt
   1259    0.658172548    0.000522774    0.002004242    0.000001592 sqlrr_execute
   3777    0.494674765    0.000130970    0.494674765    0.000130970 sqlra_hash_loc_stmt_id

GOOD CASE:
 nCalls   TotalElapsed     AvgElapsed     TotalSpent       AvgSpent FunctionName
...
    351    0.040626068    0.000115744    0.000344320    0.000000981 sqljs_ddm_excsqlstt
    349    0.038218593    0.000109509    0.000408683    0.000001171 sqlrr_execute
   1087    0.000187287    0.000000172    0.000187287    0.000000172 sqlra_hash_loc_stmt_id

 

As I know, when STATEMENT monitor or UOW monitor flag is on,  db2 calls sqlra_hash_loc_stmt_id() before completing executing a statement, it  walks through the entire cached statement list to retrieve the statement object for monitoring.  You should already know that an import is actually executing "INSERT INTO ..." statement for each row of data, so sqlra_hash_loc_stmt_id() should be called many times depending on the row number of the input data file.

Then why sqlra_hash_loc_stmt_id() took much longer in bad case? 

 

Checking "db2pd -d <db name> -app -dyn" you should find out the AnchID and StmtUID for the "INSERT INTO ..." statement used by the import job:

You can see in bad case, the ahchID is 638, and there are  totally 14453 statements with the same ahchID 638,

while in good case, the ahchID is 585 because the insert statement text changed(this is because we didn't specify schema in the import command), and there is only 9 statements sharing the same ahchID.

 

BAD CASE:
Address            AppHandl [nod-index] NumAgents  CoorEDUID  Status                  C-AnchID C-StmtUID  L-AnchID L-StmtUID
...
0x0780000008870080 33456    [000-33456] 1          129667     UOW-Executing           0        0          638      120675

Dynamic SQL Statements:
Address            AnchID StmtUID    NumEnv     NumVar     NumRef     NumExe     Text
...
0x0A000201D9FD8160 638  120675     2          2          712        143106910  INSERT INTO
   CU_FMSDB.TBL_CUFMS_CUSLT_SETTLE_RSLT_MCHNT
   ("SETTLE_BUSS_CATA","MCHNT_CD","INS_ID_CD","TRANS_ROLE","EXP_IN","TRANS_CHNL","CROSS_DIST_IN","CONN_MD","SETTLE_CURR_CD","FUND_
   TP","DEBT_SETTLE_NUM","CRET_SETTLE_NUM","DEBT_SETTLE_AT","CRET_SETTLE_AT","DEBT_DISC_NUM","CRET_DISC_NUM","DEBT_DISC_AT","CRET_
   DISC_AT","RETURN_DEBT_DISC_NUM","RETURN_CRET_DISC_NUM","RETURN_DEBT_DISC_AT","RETURN_CRET_DISC_AT") VALUES (CAST (? AS
   CHAR(4))                         ,CAST (? AS  CHAR(15))                         ,CAST (? AS  CHAR(11))                        
   ,CAST (? AS  CHAR(2))                         ,CAST (? AS  CHAR(1))                         ,CAST (? AS  CHAR(2))             
              ,CAST (? AS  CHAR(1))                         ,CAST (? AS  CHAR(1))                         ,CAST (? AS  CHAR(3))  
                         ,CAST (? AS  CHAR(2))                         ,CAST (? AS  DEC(10, 0))                         ,CAST (?
   AS  DEC(10, 0))                         ,CAST (? AS  DEC(16, 0))                         ,CAST (? AS  DEC(16, 0))             
              ,CAST (? AS  DEC(10, 0))                         ,CAST (? AS  DEC(12, 0))                         ,CAST (? AS
   DEC(16, 0))                         ,CAST (? AS  DEC(16, 0))                         ,CAST (? AS  DEC(10, 0))                 
          ,CAST (? AS  DEC(10, 0))                         ,CAST (? AS  DEC(16, 0))                         ,CAST (? AS  DEC(16,
   0))                         )

 

$ grep -c "638    " db2pd.dyn.bad.*
db2pd.dyn.bad:14453

 

GOOD CASE:


Address            AppHandl [nod-index] NumAgents  CoorEDUID  Status                  C-AnchID C-StmtUID  L-AnchID L-StmtUID
...
0x0780000008870080 33456    [000-33456] 1          129667     UOW-Waiting             0        0          585      403524

Dynamic SQL Statements:
Address            AnchID StmtUID    NumEnv     NumVar     NumRef     NumExe     Text
...
0x0A000201EE2CC5C0 585  403524     1          1          1          186739     INSERT INTO
   tbl_cufms_cuslt_settle_rslt_mchnt
   ("SETTLE_BUSS_CATA","MCHNT_CD","INS_ID_CD","TRANS_ROLE","EXP_IN","TRANS_CHNL","CROSS_DIST_IN","CONN_MD","SETTLE_CURR_CD","FUND_
   TP","DEBT_SETTLE_NUM","CRET_SETTLE_NUM","DEBT_SETTLE_AT","CRET_SETTLE_AT","DEBT_DISC_NUM","CRET_DISC_NUM","DEBT_DISC_AT","CRET_
   DISC_AT","RETURN_DEBT_DISC_NUM","RETURN_CRET_DISC_NUM","RETURN_DEBT_DISC_AT","RETURN_CRET_DISC_AT") VALUES (CAST (? AS
   CHAR(4))                         ,CAST (? AS  CHAR(15))                         ,CAST (? AS  CHAR(11))                        
   ,CAST (? AS  CHAR(2))                         ,CAST (? AS  CHAR(1))                         ,CAST (? AS  CHAR(2))             
              ,CAST (? AS  CHAR(1))                         ,CAST (? AS  CHAR(1))                         ,CAST (? AS  CHAR(3))  
                         ,CAST (? AS  CHAR(2))                         ,CAST (? AS  DEC(10, 0))                         ,CAST (?
   AS  DEC(10, 0))                         ,CAST (? AS  DEC(16, 0))                         ,CAST (? AS  DEC(16, 0))             
              ,CAST (? AS  DEC(10, 0))                         ,CAST (? AS  DEC(12, 0))                         ,CAST (? AS
   DEC(16, 0))                         ,CAST (? AS  DEC(16, 0))                         ,CAST (? AS  DEC(10, 0))                 
          ,CAST (? AS  DEC(10, 0))                         ,CAST (? AS  DEC(16, 0))                         ,CAST (? AS  DEC(16,
   0))                         )


$ grep -c " 585   " db2pd.dyn.good.*
db2pd.dyn.good:9

 

Note that sqlra_hash_loc_stmt_id() walks through the entire cached statement list to find out the statement object with specified ahchID and StmtUID, so if the specified ahchID is 638(note: many statements sharing this ahchID), sqlra_hash_loc_stmt_id()  should access much more statement objects in the list to find out the right object, as a result, sqlra_hash_loc_stmt_id() takes much longer to return.

So, this problem has nothing to do with import feature, has nothing to do with the schema specified or not, but due to the long cached dynamic statement list.

To resolve the problem, the user could either:

1. Turn off STATEMENT monitor or UOW monitor:
   Statement                              (DFT_MON_STMT) = OFF
  Unit of work                            (DFT_MON_UOW) = OFF

 also prevent from turning on the STATEMENT monitor nor UOW monitor in session level:
   db2 update monitor switches using statement on uow on

2. Run "db2 FLUSH PACKAGE CACHE DYNAMIC" before import.

 

 

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm13286359