Creating virtualized tables from files (Data Virtualization)

Use data that is stored as CSV, TSV, and Excel files on remote data sources to create virtual tables. You can segment or combine data from one or more files to create a virtual table.

Before you begin

To access data that is stored in CSV, TSV, and Excel files, you must install a remote connector on the data source where the files are located. A remote connector enables Data Virtualization to automatically access files located in a remote data source. For more information, see Installing connectors on remote data sources.

Data Virtualization auto-detects a table in a given Excel sheet by scanning each row left to right until it reaches a non-empty cell, which then becomes the top-left location of the table. The final column is determined from the first non-empty cell in the first row. The final row is the first one having empty cells in all columns.

Procedure

Click Data > Data Virtualization. From the service menu, click Virtualization > Virtualize > Files.
The list of available data sources appears. You can narrow down the displayed assets by using the available filters.
Select the remote data source where you want to browse for files.
The list of files and directories on your data source appears. You can enter a particular path on the host system in the File path field or click the files and directories on the list to go through the host system.
To edit the remote data source description, click Edit description in the action menu of the remote data source.
Select the file that you want to virtualize and click Add to cart.
- To use the first row of your file as column headings in your virtual table, enable the Use first row as column heading option. If this option is not enabled, you are prompted to specify column names for your virtual file before adding it to your cart.
- If your file has multiple worksheets, you can select a worksheet to view additional information and add it to your cart.
Click View cart to preview your file data selections as a virtual file.
From this window, you can edit the virtual table name to be created from file data. Additionally, you can edit schema names, remove a selection from your cart, or edit the list of remote files that participate in a merged table.

Select the appropriate option to assign the virtual table to be created from file data:

Assign to	When to use this option
Data request	Select Data request if you created the virtual table in response to a data request. Then, choose the appropriate request.
Project	Select Project if you created the virtual table to use in a specific analytics project. Then, choose the appropriate project.
My virtualized data	Select My virtualized data if the table was not created in response to a data request or to use in a specific project. This is the default setting if no data requests or projects exist.

Specify a schema in the Schemas field.
Optionally, you can create a schema:
- If you have the Data Virtualization Engineer or User roles, leave the Schemas field as default to create a schema with your user ID.
- If you have the Data Virtualization Admin role, leave the Schemas field as default to create a schema with your user ID or enter the new schema name in the Schemas field.
For more information, see Creating schemas for virtual objects.
Click Virtualize to complete the process.
When the status window appears, you can select to view your virtualized data or virtualize more data.

Results

If Data Virtualization and Watson™ Knowledge Catalog are installed in the same OpenShift® project (namespace), your virtual object is published to the default catalog.

What to do next

You can collect statistics for your virtual object. For more information, see Improving query performance (Data Virtualization).
Optionally, from the My virtualized data page, you can publish your virtual object to the catalog. For more information, see Publishing virtual data to the catalog.
You can also create join views. See Creating a join view from multiple tables.