Records added to a Hive table are not returned when accessing the table via a Big SQL client
After loading, inserting or otherwise populating table data directly in a Hive table, the new data is not returned when it is accessed using the Big SQL table. This is as a result of the Big SQL cache of the table not having been updated subsequent to the addition of the new data. It occurs because a latency of 20 to 30 minutes is required for the automated HCAT SYNC procedure to occur, so the Big SQL cached table has not yet been refreshed. As a workaround, you can call the HCAT_CACHE_SYNC procedure manually to refresh the Big SQL table or tables.
Symptoms
After creating records in a Hive table for the Big SQL schema, a query on a Big SQL schema table on a Big SQL node may not return all values as expected, making it seem as though the records have not been imported from the Hive table to the Big SQL table.Causes
This occurs due to an expected latency of up to 30 minutes for the automated HCAT SYNC procedure.Resolving the problem
You can call the HCAT_CACHE_SYNC stored procedure to refresh the Big SQL table or tables.
Example
For example, in Hive you may insert values into a table test, as
shown:
hive> insert into test values (300);
Querying the
test table returns two
values:hive> select * from test;
OK
200
300
Time taken: 0.127 seconds, Fetched: 2 row(s)
Querying
the same table in Big SQL returns only one
value:$ db2 "select * from test"
COL1
-----------
200
1 record(s) selected.
Call
the HCAT_CACHE_SYNC stored procedure to refresh the Big SQL
table:$ db2 "CALL SYSHADOOP.HCAT_CACHE_SYNC('BIGSQL', 'TEST')"
Return Status = 0
Querying
the test table now returns two values as
expected:$ db2 "select * from test"
COL1
-----------
300
200
2 record(s) selected.
Note: To
resolve the problem you need to call the HCAT_CACHE_SYNC stored procedure and not the
HCAT_SYNC_OBJECTS stored procedure. The latter will not refresh the cached table in Big SQL since
the DDL for the table has not changed.