Joining virtual objects in Data Virtualization

Important: IBM® Cloud Pak for Data Version 4.5 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.5 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

You can join multiple tables from multiple data sources into a single virtual table, which is also known as a join view.

About this task

To create a virtual view from existing virtualized tables, complete the following steps.

Procedure

  1. On the navigation menu, click Data > Data virtualization.
    The service menu opens to the Data sources page by default.
  2. On the service menu, click Virtualization > Virtualized data.
    Your existing virtualized tables are listed.
    Tip: If you prefer, you can click Open in SQL editor to skip the following steps and use the IBM common SQL engine instead. See the SQL Reference for more details on SQL syntax and function compatibility.
  3. Select the tables that you want to join and click Join to display the Join virtual objects window.
  4. If you selected two tables, you can use the graphical join wizard to click and drag from one column name to another to create a join key. The join keys must be of the same data type. This step does not copy or move any data. It creates a table definition that is a combination of the two tables.
    Joining two tables
    To create a join key by using only the keyboard, follow these steps:
    1. Press Enter to select a column name in a row in Table 1.
    2. Press the Tab key to navigate to a column name in a row in Table 2.
    3. Press Enter on the row in Table 2. The rows are joined.
    Restrictions: If you are joining tables with many rows, the preview of the join might time out after 10 minutes if the data sources are unable to complete the processing of the join.

    If the columns of the tables that you are joining do not share any common data, the preview of your join view might be empty. You can continue to join your virtual objects; however, the join view might not contain valuable data. Any data that you add to the virtual objects is automatically reflected in the join view.

    If you selected more than two tables, you are taken to the SQL editor, which is pre-populated with a CREATE VIEW statement that references the selected tables. You need to modify this statement to add join predicate, filters, and any functions needed. See the SQL Reference for a full description of the supported operations.

  5. Click Next. You can use the new table to query the data from both of the base tables.
  6. On the Edit column names screen, enter a View name, select a Schema, and edit column names as needed.
  7. Select where to assign the view with one of the following options:
    Assign to When to use this option
    Data request Select Data request if you created the virtual table in response to a data request. Then, choose the appropriate request. The table also appears in Virtualized data.
    Project Select Project if you created the virtual table to use in a specific analytics project. Then, choose the appropriate project. The table also appears in Virtualized data.
    Virtualized data Select Virtualized data if the table was not created in response to a data request or to use in a specific project. This setting is the default if no data requests or projects exist.
  8. Click Create view to complete the process.

Results

If Data Virtualization and Watson Knowledge Catalog are installed in the same OpenShift® project (namespace), your virtual object is published to the primary catalog.

What to do next

You can use the virtualized data in a number of different ways. For example, you can use them in a Jupyter Notebook, create new models within the Model Builder, or build charts or graphics on the analytics dashboard.
Note: You cannot apply data masking policies with views. For more information, see Restrictions for data masking.