Adding data assets to a deployment space

Learn about the various ways to add and promote data assets to a space, and about the data types that are used in deployments.

Data can be:

Notes:

  • For definitions of data-related terms, refer to Asset types and properties.
  • If your organization uses Watson Knowledge Catalog, a catalog can serve as a feature store that gives you access to data assets containing features that can be shared across the organization. Data assets include metadata about where they are used in models. Access is controlled at both the catalog level and the data asset level.

You can add data to a space in one of these ways:

Data added to a space is managed in a similar way to data added to a Watson Studio project. For example:

  • Adding data to a space creates a new copy of the asset and its attachments within the space and maintains a reference back to the project asset. If an asset such as a data connection requires access credentials, those credentials persist and are the same whether you access the data from a project or from a space.

  • As with data connections in a project, you can edit data connection details from the space.

  • Data assets are stored in a space in the same way that they are stored in a project, and the space uses the same file structure as the project.

  • For details on how Watson Studio connects to data, refer to Accessing data.

Adding data and connections to a space by using the UI

To add data or connections to a space by using the UI:

  1. From the Assets tab of your space in Watson Studio, click Import assets.
  2. If you want to add a local file, select Local file and then select Data asset.
  3. If you want to add a connected data asset, select Connected data.
  4. If you want to add a connection, select Data access tools and then select Connection.
  5. Complete the remaining steps.

The data asset displays in the space and is available for use as an input data source in a deployment job.

Note:
After you add a Connection or Connected data that uses Cloud Pak for Data credentials to a space, make sure that the Use your Cloud Pak for Data credentials to authenticate to the data source option is selected.

Adding data to a space programmatically

If you are using APIs to create, update, or delete Watson Machine Learning assets, make sure that you are using only Watson Machine Learning API calls.

For examples of how to add assets programmatically, refer to these sample notebooks:

Data source reference types

Watson Machine Learning requests use data source reference types to represent the locations of input data and results. Use data_asset and connection_asset for these types of data sources:

  • Cloud Object Storage
  • Db2
  • Database data
  • Volumes

Notes:

  • data_asset requires an href.
  • connection_asset requires the connection_id for the connection object, plus location fields that vary by data source type.
  • For data assets hosted locally, the reference type is fs.
  • For Decision Optimization, the reference type is url.
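The rules in these notes can be checked on the client side before a request is sent. The following is a minimal sketch, assuming each entry of input_data_references is a Python dict shaped like the example payloads in this topic; check_reference and KNOWN_TYPES are hypothetical names, not part of any IBM SDK, and the service itself may enforce additional rules.

```python
# Hypothetical client-side check of the reference-type notes; not an IBM SDK API.
KNOWN_TYPES = {"data_asset", "connection_asset", "fs", "url"}


def check_reference(ref: dict) -> list:
    """Return a list of problems found in one input_data_references entry."""
    problems = []
    ref_type = ref.get("type")
    if ref_type not in KNOWN_TYPES:
        problems.append(f"unknown reference type: {ref_type!r}")
    location = ref.get("location", {})
    # data_asset entries must carry an href that points at the asset.
    if ref_type == "data_asset" and not location.get("href"):
        problems.append("data_asset requires location.href")
    # connection_asset entries must identify the connection object.
    if ref_type == "connection_asset" and not ref.get("connection", {}).get("id"):
        problems.append("connection_asset requires connection.id")
    return problems


ok = {"type": "data_asset", "location": {"href": "/v2/assets/a1?space_id=s1"}}
bad = {"type": "connection_asset", "location": {"bucket": "b"}}
print(check_reference(ok))   # []
print(check_reference(bad))  # ['connection_asset requires connection.id']
```

The sketch only checks the fields named in the notes; location fields that depend on the data source type are deliberately not validated.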

Example data_asset payload

{"input_data_references": [{
    "type": "data_asset",
    "connection": {
    },
    "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
    }
}]}
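A data_asset reference can also be assembled in code instead of by hand. This is a minimal sketch that only builds the JSON body shown in the example; data_asset_reference is a hypothetical helper, and the asset and space IDs are placeholders. Sending the request to Watson Machine Learning is out of scope here.

```python
import json


def data_asset_reference(asset_id: str, space_id: str) -> dict:
    """Build one data_asset entry that points at an asset in a deployment space."""
    return {
        "type": "data_asset",
        "connection": {},  # left empty, as in the data_asset example payload
        "location": {"href": f"/v2/assets/{asset_id}?space_id={space_id}"},
    }


# Placeholder IDs; substitute the IDs of a real asset and space.
payload = {"input_data_references": [data_asset_reference("my-asset-id", "my-space-id")]}
print(json.dumps(payload, indent=4))
```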

Example connection_asset payload

{"input_data_references": [{
    "type": "connection_asset",
    "connection": {
        "id": "<connection_guid>"
    },
    "location": {
        "bucket": "<bucket_name>",
        "file_name": "<directory_name>/<file_name>"
    }
    <other wdp-properties supported by runtimes>
}]}
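For a file in a Cloud Object Storage bucket, the connection_asset entry combines the connection ID with the bucket and object path. The sketch below builds that entry, assuming the bucket and file_name location fields from the example; cos_connection_asset_reference is a hypothetical helper, and it adds none of the other runtime-specific properties that the example allows.

```python
import json
import posixpath


def cos_connection_asset_reference(connection_id: str, bucket: str,
                                   directory: str, file_name: str) -> dict:
    """Build one connection_asset entry for a file in a Cloud Object Storage bucket."""
    return {
        "type": "connection_asset",
        "connection": {"id": connection_id},
        "location": {
            "bucket": bucket,
            # file_name holds the object path: <directory_name>/<file_name>
            "file_name": posixpath.join(directory, file_name),
        },
    }


ref = cos_connection_asset_reference("my-connection-id", "my-bucket", "input", "scores.csv")
print(json.dumps({"input_data_references": [ref]}, indent=4))
```

Using posixpath.join keeps the separator a forward slash on every platform and avoids a leading slash when the directory is empty.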

For more details and examples, refer to the documentation for:

Using data from the Cloud Object Storage service

You can use data from the Cloud Object Storage service in deployment jobs through a connected data asset or a connection asset. To use data from the Cloud Object Storage service:

  1. Create a connection to IBM Cloud Object Storage by adding a Connection to your project or space and selecting Cloud Object Storage (infrastructure) or Cloud Object Storage as the connection type. Provide the secret key, access key, and login URL.

    Note: When you create a connection to Cloud Object Storage or Cloud Object Storage (infrastructure), you must specify both access_key and secret_key. If either one is missing, a batch deployment job cannot download data from that connection. For reference, see IBM Cloud Object Storage connection and IBM Cloud Object Storage (infrastructure) connection.

  2. Add input and output files to the deployment space as connected data by using the Cloud Object Storage connection that you created.
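Because a batch deployment job cannot download data from a Cloud Object Storage connection that lacks access_key or secret_key, it can help to check the connection properties before you create the connection. This is a minimal sketch, assuming the properties are held in a plain dict keyed by those two names; missing_cos_credentials and the sample dict are hypothetical and do not reflect the full property set of a real connection.

```python
# The two credential fields that the Cloud Object Storage note requires.
REQUIRED_COS_KEYS = ("access_key", "secret_key")


def missing_cos_credentials(properties: dict) -> list:
    """Return the required credential fields that are missing or empty."""
    return [key for key in REQUIRED_COS_KEYS if not properties.get(key)]


# Hypothetical connection properties with an empty secret key.
props = {"access_key": "<access_key>", "secret_key": ""}
missing = missing_cos_credentials(props)
if missing:
    print("Batch jobs cannot download data from this connection; missing:", ", ".join(missing))
```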

Using data from the Storage volume (NFS) service

Data in Storage volumes can be used with deployment jobs through a connected data asset (data_asset type) or a connection asset (connection_asset type).

To use data from the Storage volume service:

  1. Create a connection to a storage volume by adding a Connection to your space and selecting Volumes as the connection type.
  2. Add input and output files to the deployment space as connected data by using the connection that you created in step 1.

For details on using data from a networked file system, see Storage volume connection.

Learn more:
