Important: IBM Cloud Pak® for Data
Version 4.7 will reach end of support (EOS) on 31 July, 2025. For more information, see the
Discontinuance of service announcement for
IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version
5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of
support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1
documentation.
You can use data that is stored as CSV, TSV, and Excel files on remote data sources to
create virtual tables. Segment or combine data from one or more files to create a virtual
table.
Remember:
The data requests
() feature
was removed in
Cloud Pak for Data Version
4.7.0. Consider
workflows instead.
Before you begin
To access data that is stored in CSV, TSV, and Excel files, you must install a remote connector
on the data source where the files are located. A remote connector enables Watson
Query to
automatically access files located in a remote data source. For more information, see Installing connectors on remote data
sources.
Watson
Query auto-detects a table in an Excel sheet by scanning each row
until it reaches a non-empty
cell. The final column is determined
from the first non-empty cell in the first row. The final row is the first one having empty cells in
all columns.
The Excel source wrapper in Watson
Query was upgraded in Cloud Pak for Data 4.0.2 to
allow access to spreadsheets of unlimited size. In previous releases, Watson
Query might
fail to access spreadsheets that contain numerous data, for example greater than 250,000 cells, or
have a large file size, for example greater than 3 MB, which easily happens when a workbook contains
macros. If you encounter any issues (for example, unexpected types or values), you can toggle
between new and old source wrappers with the following SQL API calls.
- To switch to the previous Excel source wrapper named GExcel, use the following
command:
CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.GExcel', '', ?, ?)
- To switch to the current Excel source wrapper named QExcel, use the following command:
CALL DVSYS.SETCONFIGPROPERTY('EXCEL_SOURCE_WRAPPER_CLASS', 'com.ibm.db2j.QExcel', '', ?, ?)
For more information, see Queries on virtualized flat files fail with incorrect results in Watson Query.
Procedure
- On the navigation menu, click .
The service menu opens to the Data
sources page by default.
- On the service menu, click
and then click the Files tab.
The list of available data sources appears. You can narrow down the displayed assets by
using the available filters.
- Select the remote data source where you want to browse for files.
The list of files and directories on your data source appears. You can enter a particular path
on the host system in the
File path field or click the files and directories
on the list to go through the host system.
To edit the remote data source, click Edit
connection in the action menu of the remote data source.
- Select the file that you want to virtualize and click Add to cart.
- To use the first row of your file as column headings in your virtual table, enable the
Use first row as column heading option. If this option is not enabled, you
are prompted to specify column names for your virtual file before you add it to your cart.
- If your file has multiple worksheets, you can select a worksheet to view additional
information and add it to your cart.
-
Click View cart to preview your file data selections as a virtual file.
From this window, you can edit the virtual table name to be created from file data.
Additionally, you can edit schema names, remove a selection from your cart, or edit the list of
remote files that participate in a merged table.
- Select the appropriate option to assign the virtual table to be created from file
data.
| Assign to |
When to use this option |
| Project |
Select Project if you created the virtual table to use in a specific
project. Then, choose the appropriate project. The table also appears in Virtualized
data. |
| Virtualized data |
Select Virtualized data if the table was not created to use in a
specific project. This setting is the default if no projects exist. |
- Select Publish to catalog if you also
want to publish to a selected catalog.
A list of available catalogs is shown in the
drop-down menu. Each catalog is tagged as
Governed or
Not
governed.
Note: You must have at least one catalog in
Watson Knowledge
Catalog.
You must have permission to publish to a
catalog. An administrator can enable whether all virtual objects are published to a selected
governed catalog, which prevents a user from publishing to a specified
catalog.
- Specify a schema in the Schema
field.
You can also create a schema by following these
steps.
- If you have the Watson
Query
Engineer or User role, leave the
Schema field as default to create a schema with your user ID.
- If you have the Watson
Query
Admin role, leave the
Schema field as default to create a schema with your user ID or enter the new
schema name in the Schema field.
-
Click Virtualize to complete the process.
When the status window appears, you can select to view your virtualized data
or virtualize more data. However, you must wait until virtualization is complete
before you navigate away from the page.
Results
If Watson
Query and Watson Knowledge
Catalog are installed in the same OpenShift® project (namespace), your virtual object is
published to the primary catalog.