Queries on virtualized flat files fail with incorrect results in Data Virtualization

When you virtualize flat files that are accessed by using a remote agent, some queries have incorrect results in cases where column type conversions are made for columns that have special characters.

Symptoms

The Excel source wrapper in Data Virtualization was upgraded in Cloud Pak for Data 4.0.2 to allow access to spreadsheets of unlimited size. In previous releases, Data Virtualization might fail to access spreadsheets that contain numerous data, for example greater than 250,000 cells, or have a large file size, for example greater than 3 MB, which easily happens when a workbook contains macros. If you encounter any issues (for example, unexpected types or values), you can toggle between new and old source wrappers with the following SQL API calls.

  • To switch to the previous Excel source wrapper named GExcel, use the following command:
    CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.GExcel', '', ?, ?)
  • To switch to the current Excel source wrapper named QExcel, use the following command:
    CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.QExcel', '', ?, ?)

For example, currency values that start with a dollar symbol are inferred as type VARCHAR in the original preview. However, if that column is retyped to a numeric type such as DECIMAL, when you virtualize, queries against the virtual table have erroneous null cells in that column.

For example, the query shows the following results.
db2 => SELECT * FROM DV_TEST."TBL_XLS_TESTFORMULA2" WHERE IDNUM BETWEEN 980 AND 990
IDNUM       NAME                           GROSS      DED1     DED2     TDED     NET
----------- ------------------------------ ---------- -------- -------- -------- ----------
        980 Ahmad Kynett                            -        -        -        -          -
        981 Sherlock Spalton                        -        -        -        -          -
        982 Ira Gladtbach                           -        -        -        -          -
        983 Andrej Dumphy                           -        -        -        -          -
        984 Jeremie Fullicks                        -        -        -        -          -
        985 Costa Shambrook                         -        -        -        -          -
        986 Jacob Darell                            -        -        -        -          -
        987 Kevon Heineking                         -        -        -        -          -
        988 Sib MacTrusty                           -        -        -        -          -
        989 Angelique Wace                          -        -        -        -          -
        990 Lothario Kuschel                        -        -        -        -          -

  11 record(s) selected.
However, the query should return the following results.
db2 => SELECT * FROM DV_LOCAL."TBL_XLS_TESTFORMULA2" WHERE IDNUM BETWEEN 980 AND 990
IDNUM       NAME                           GROSS      DED1     DED2     TDED     NET
----------- ------------------------------ ---------- -------- -------- -------- ----------
        980 Ahmad Kynett                      4196.77   568.05   349.44   917.50    3279.27
        981 Sherlock Spalton                  4904.10   681.37   308.92   990.29    3913.81
        982 Ira Gladtbach                     3189.25   553.35   387.08   940.43    2248.81
        983 Andrej Dumphy                     4152.29   690.10   499.46  1189.56    2962.73
        984 Jeremie Fullicks                  2132.98   516.84   319.20   836.05    1296.93
        985 Costa Shambrook                   3058.66   525.03   363.98   889.01    2169.64
        986 Jacob Darell                      4247.33   687.88   376.42  1064.30    3183.02
        987 Kevon Heineking                   3437.73   588.29   464.82  1053.10    2384.62
        988 Sib MacTrusty                     2231.61   586.28   356.07   942.34    1289.26
        989 Angelique Wace                    3973.60   683.93   318.11  1002.04    2971.56
        990 Lothario Kuschel                  2785.71   512.24   314.47   826.71    1959.00

  11 record(s) selected.

Resolving the problem

To work around this issue, complete the following steps.

  1. Run the following SQL query to find the agent NODE_NAME and PROPERTY that need to be modified, replacing <your file name> with the name of the file that was virtualized.
    select NODE_NAME, REGEXP_REPLACE( id, '_ARGS$', '_VTI' )  PROPERTY from dvsys.listconfig where id like 'VTF_%_ARGS' and def like  '%<your file name>%'
  2. Run the following SQL query to use the GExcel implementation for accessing this spreadsheet so that types can be converted correctly. Replace <NODE_NAME> and <PROPERTY> with the values that you found in step 1.
    CALL DVSYS.SETCONFIGPROPERTY('<PROPERTY>', 'com.ibm.db2j.GExcel', '<NODE_NAME>', ?, ?)
  3. To revert to the new implementation, rerun the same command from step 2, replacing GExcel with QExcel.