Virtualized table list for Databricks needs manual updates after upgrade to 5.0 and later
When you have a previous Databricks connection, you must modify the virtualized table list and update the connection ID (CID) before you upgrade Data Virtualization on Cloud Pak for Data to 5.0 and later.
Symptoms
After you upgrade Data Virtualization, the Databricks connections that you added as a generic data source prior to upgrade might not be accessible or work properly. Complete these steps to workaround the issue:
Resolving the problem
- In the Data virtualization > Run SQL editor, run the
dvsys.listrdbc
query to identify the internal Connection ID (CID) andNODE_NAME
for your Databricks data source:select CID from dvsys.listrdbc
- If you do not see any Data Virtualization connection
after you run the
select CID from dvsys.listrdbc
query, run the following query to identify the nodes for each Databricks connection:
Your output from this command will resemble the following example:select CID, REMOTENODES,replace(CID,'DATAB','OTHER') as OLDCID from dvsys.rdb_connections where CID like 'DATAB%'
DATAB10000 qpendpoint_4:6418 OTHER10000
DATAB10000
is the CID,qpendpoint_4
is the endpoint name, andOTHER10000
is the OLDCID. - Go to the corresponding
gaiandb_config.properties
file for the endpoints listed in the remote nodes. You can find the corresponding gaiandb_config file by replacing the variable <endpoint_number> in the file path /mnt/dv_data/qpendpoints/data<endpoint_number> for each endpoint.As an example, the gaiandb_config file for
qpendpoint_1
endpoint can be found in the file path /mnt/dv_data/qpendpoints/data1. The <endpoint_number> is1
. - For each endpoint listed in the remote notes, manually replace the
OLDCID
with the newCID
. For example, replace the OLDCIDOTHER10000_CCID=2c428798-c0c7-478b-88b6-6e7555d56243
with the CIDDATAB10000_CCID=2c428798-c0c7-478b-88b6-6e7555d56243
. In the example, the OLDCID prefixOTHER10000
is replaced with the CID prefixDATAB10000
.The connection IDs for your previous Databricks connections are now updated with the new prefix DATAB, and Databricks connection can be seen in the UI.
- Run this query to identify the Databricks ID (CID) and it's
corresponding
endpoint:
SELECT CID, REMOTENODES FROM DVSYS.RDB_CONNECTIONS WHERE CID LIKE 'DATAB%'
- Run the following stored procedure in Data virtualization > Run SQL or
the Db2 console for each CID and endpoint
combination.
- Ensure you replace <CID> with the Databricks ID, <default_catalog> with the catalog where the connection is established, and <REMOTE_NODES> with the qpendpoint.
The following are example queries, where:call DVSYS.setConfigProperty('RELOADTABLES_ALLTABS_<CID>', '<defualt_catalog>, null, null, null', '< REMOTENODES>',?,?); call DVSYS.setConfigProperty('RELOADTABLES_ALLCOLS_<CID>', '<defualt_catalog>, null, null, null', '< REMOTENODES>',?,?);
- <(CID)> has the value DATAB10020,
- <catalog_name> has the value hive_metastore,
- and <REMOTE_NODES> has the value qpendpoint_3:6417.
call DVSYS.setConfigProperty('RELOADTABLES_ALLTABS_DATAB10020', 'hive_metastore, null, null, null', 'qpendpoint_3:6417',?,?); call DVSYS.setConfigProperty('RELOADTABLES_ALLCOLS_DATAB10020', 'hive_metastore, null, null, null', 'qpendpoint_3:6417',?,?);
call dvsys.SETCONFIGPROPERTY('RELOADTABLES_ALLTABS_OTHER10000', 'hive_metastore,null,null,null','qpendpoint_1:6415', ?, ?) call dvsys.SETCONFIGPROPERTY('RELOADTABLES_ALLCOLS_OTHER10000', 'hive_metastore,null,null,null','qpendpoint_1:6415', ?, ?)
- Optional: After the properties are set, run the DVSYS.RESTORECONFIG
procedure on the endpoints in which Databricks data sources were
added.Note: You can skip this step if you completed step 3.
call DVSYS.RESTORECONFIG('qpendpoint_3:6417', ?, ?);
For example: If Data Virtualization has endpoints qpendpoint_3:6417 and qpendpoint_4:6418, then run this procedure:call DVSYS.RESTORECONFIG('qpendpoint_3:6417,qpendpoint_4:6418', ?, ?);