Virtualized table list for Databricks needs manual updates after upgrade to 5.0 and later

When you have a previous Databricks connection, you must modify the virtualized table list and update the connection ID (CID) before you upgrade Data Virtualization on Cloud Pak for Data to 5.0 and later.

Symptoms

After you upgrade Data Virtualization, the Databricks connections that you added as a generic data source prior to upgrade might not be accessible or work properly. Complete these steps to workaround the issue:

Resolving the problem

  1. In the Data virtualization > Run SQL editor, run the dvsys.listrdbc query to identify the internal Connection ID (CID) and NODE_NAME for your Databricks data source:
    select CID from dvsys.listrdbc
  2. If you do not see any Data Virtualization connection after you run the select CID from dvsys.listrdbc query, run the following query to identify the nodes for each Databricks connection:
    select CID, REMOTENODES,replace(CID,'DATAB','OTHER') as OLDCID from dvsys.rdb_connections where CID like 'DATAB%'
    Your output from this command will resemble the following example:
    DATAB10000 qpendpoint_4:6418 OTHER10000 
    DATAB10000 is the CID, qpendpoint_4 is the endpoint name, and OTHER10000 is the OLDCID.
  3. Go to the corresponding gaiandb_config.properties file for the endpoints listed in the remote nodes. You can find the corresponding gaiandb_config file by replacing the variable <endpoint_number> in the file path /mnt/dv_data/qpendpoints/data<endpoint_number> for each endpoint.

    As an example, the gaiandb_config file for qpendpoint_1 endpoint can be found in the file path /mnt/dv_data/qpendpoints/data1. The <endpoint_number> is 1.

  4. For each endpoint listed in the remote notes, manually replace the OLDCID with the new CID. For example, replace the OLDCID OTHER10000_CCID=2c428798-c0c7-478b-88b6-6e7555d56243 with the CID DATAB10000_CCID=2c428798-c0c7-478b-88b6-6e7555d56243. In the example, the OLDCID prefix OTHER10000 is replaced with the CID prefix DATAB10000.

    The connection IDs for your previous Databricks connections are now updated with the new prefix DATAB, and Databricks connection can be seen in the UI.

  5. Run this query to identify the Databricks ID (CID) and it's corresponding endpoint:
    SELECT CID, REMOTENODES FROM DVSYS.RDB_CONNECTIONS WHERE CID LIKE 'DATAB%'
  6. Run the following stored procedure in Data virtualization > Run SQL or the Db2 console for each CID and endpoint combination.
    • Ensure you replace <CID> with the Databricks ID, <default_catalog> with the catalog where the connection is established, and <REMOTE_NODES> with the qpendpoint.
    call DVSYS.setConfigProperty('RELOADTABLES_ALLTABS_<CID>', '<defualt_catalog>, null, null, null', '< REMOTENODES>',?,?);
    call DVSYS.setConfigProperty('RELOADTABLES_ALLCOLS_<CID>', '<defualt_catalog>, null, null, null', '< REMOTENODES>',?,?);
    The following are example queries, where:
    • <(CID)> has the value DATAB10020,
    • <catalog_name> has the value hive_metastore,
    • and <REMOTE_NODES> has the value qpendpoint_3:6417.
    call DVSYS.setConfigProperty('RELOADTABLES_ALLTABS_DATAB10020', 'hive_metastore, null, null, null', 'qpendpoint_3:6417',?,?);
    call DVSYS.setConfigProperty('RELOADTABLES_ALLCOLS_DATAB10020', 'hive_metastore, null, null, null', 'qpendpoint_3:6417',?,?);
    call dvsys.SETCONFIGPROPERTY('RELOADTABLES_ALLTABS_OTHER10000', 'hive_metastore,null,null,null','qpendpoint_1:6415', ?, ?)
    call dvsys.SETCONFIGPROPERTY('RELOADTABLES_ALLCOLS_OTHER10000', 'hive_metastore,null,null,null','qpendpoint_1:6415', ?, ?)
  7. Optional: After the properties are set, run the DVSYS.RESTORECONFIG procedure on the endpoints in which Databricks data sources were added.
    Note: You can skip this step if you completed step 3.
    call DVSYS.RESTORECONFIG('qpendpoint_3:6417', ?, ?);
    For example: If Data Virtualization has endpoints qpendpoint_3:6417 and qpendpoint_4:6418, then run this procedure:
    call DVSYS.RESTORECONFIG('qpendpoint_3:6417,qpendpoint_4:6418', ?, ?);