ibm-watson-studio-lib for Python

The ibm-watson-studio-lib library for Python provides access to assets in Watson Studio. It can be used in notebooks in a project or notebooks that have been promoted to a deployment space. ibm-watson-studio-lib provides support for working with data assets and connections, as well as browsing functionality for all other asset types.

There are two kinds of data assets:

Stored data assets refer to files in the storage associated with the current project or space. The library can load and save these files, but this is not recommended for data larger than one megabyte: the library keeps the data in memory in its entirety, which can be inefficient when processing huge datasets.
Connected data assets represent data that must be accessed through a connection. Using the library, you can retrieve the properties (metadata) of the connected data asset and its connection. The functions do not return the data of a connected data asset. You can either use the code that is generated when you use the Insert to code function from the notebook data panel to access the data or you must write your own code.

Note:

The ibm-watson-studio-lib functions do not encode or decode data when saving data to or getting data from a file.
The ibm-watson-studio-lib functions can’t be used to access folder assets (files on a path to the project storage).
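
Because the functions pass bytes through unchanged, text has to be encoded before it is saved and decoded after it is loaded. The following is a minimal sketch that uses only the Python standard library. The `io.BytesIO` object stands in for the buffer that `load_data` returns, and the UTF-8 encoding and the sample text are assumptions made for illustration.

```python
import io

# Sample text; the content is made up for this sketch.
text = "id,city\n1,Zürich\n2,São Paulo\n"

# Encode explicitly before saving: the library expects bytes and does not encode for you.
payload = text.encode("utf-8")

# Stand-in for the BytesIO buffer returned by load_data(). In a notebook
# you would get this buffer from the library instead of building it here.
buffer = io.BytesIO(payload)

# Decode explicitly after loading, with the same encoding that was used when saving.
restored = buffer.getvalue().decode("utf-8")
print(restored == text)
```

Using the same encoding on both sides is what makes the round trip lossless; a mismatch typically shows up as garbled non-ASCII characters.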

Setting up the `ibm-watson-studio-lib` library

When ibm-watson-studio-lib is used in a notebook, the project or space in which it runs is determined automatically.

Use the following import statement to set up ibm-watson-studio-lib:

```
from ibm_watson_studio_lib import access_project_or_space
credentials_dic = {"project_id": "<ProjectId>", "token": "<ProjectToken>"}
wslib = access_project_or_space(params=credentials_dic)
```

Helper functions

You can get information about the supported functions in the ibm-watson-studio-lib library programmatically by using help(wslib), or for an individual function by using help(wslib.<function_name>), for example help(wslib.get_connection).

You can use the helper function wslib.show(...) for formatted printing of Python dictionaries and lists of dictionaries, which are the common result output type of the ibm-watson-studio-lib functions.

The `ibm-watson-studio-lib` functions

The ibm-watson-studio-lib library exposes a set of functions that are grouped in the following way:

Get project or space information
Fetch data
Save data
Get connection information
Get connected data information
Access assets by ID instead of name
Work with the mounted project storage
Access project storage directly
Browse project assets

Get project or space information

While developing code, you might not know the exact names of data assets or connections. The following functions provide lists of assets, from which you can pick the relevant ones. In all examples, you can use wslib.show(assets) to pretty-print the list. The index of each item is printed in front of the item.

list_connections()

This function returns a list of the connections. The list of returned connections is not sorted by any criterion and can change when you call the function again. You can pass a dictionary item instead of a name to the get_connection function. For example:
```
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

assets = wslib.list_connections()
wslib.show(assets)
connprops = wslib.get_connection(assets[0])
wslib.show(connprops)
```

list_connected_data()

This function returns the connected data assets. The list of returned connected data assets is not sorted by any criterion and can change when you call the function again. You can pass a dictionary item instead of a name to the get_connected_data function.

list_stored_data()

This function returns a list of the stored data assets (data files). The list of returned data assets is not sorted by any criterion and can change when you call the function again. You can pass a dictionary item instead of a name to the load_data and save_data functions.
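
Because the order of a returned list can change between calls, selecting an item by its position is fragile. The following is a minimal sketch of selecting an item by name instead. It assumes that each list item is a dictionary with a `name` key; the helper `pick_by_name` and the sample list are hypothetical stand-ins and not part of the library.

```python
def pick_by_name(items, name):
    """Return the single item whose 'name' matches, or raise if none or several match."""
    matches = [item for item in items if item.get("name") == name]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one item named {name!r}, found {len(matches)}")
    return matches[0]


# Hypothetical sample that mimics the shape of a returned list (assumed keys).
sample_items = [
    {"name": "sales.csv", "asset_id": "id-2"},
    {"name": "customers.csv", "asset_id": "id-1"},
]

item = pick_by_name(sample_items, "customers.csv")
print(item["asset_id"])
```

In a notebook, the selected item could then be passed to a function such as load_data in place of the asset name.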

Note: A heuristic is applied to distinguish between connected data assets and stored data assets. However, there may be cases where a data asset of the wrong kind appears in the returned lists.

wslib.here

By using this entry point, you can retrieve metadata about the project or space that the lib is working with. The entry point wslib.here provides the following functions:
- get_name()
  
  This function returns the name of the project or space.
- get_description()
  
  This function returns the description of the project or space.
- get_ID()
  
  This function returns the ID of the project or space.
- get_storage()
  
  This function returns storage information for the project or space.
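
As a rough sketch, the four functions can be combined into one summary dictionary. The helper `describe_scope` and the class `FakeHere` are hypothetical: `FakeHere` only imitates the wslib.here entry point so that the sketch runs outside a notebook, and the values it returns are made up.

```python
def describe_scope(here):
    """Collect the values returned by the wslib.here functions into one dictionary."""
    return {
        "name": here.get_name(),
        "description": here.get_description(),
        "id": here.get_ID(),
        "storage": here.get_storage(),
    }


class FakeHere:
    """Hypothetical stand-in for wslib.here; all return values are invented."""

    def get_name(self):
        return "demo-project"

    def get_description(self):
        return "Project used for illustration"

    def get_ID(self):
        return "00000000-0000-0000-0000-000000000000"

    def get_storage(self):
        return {"type": "assumed-storage-type"}


summary = describe_scope(FakeHere())
print(summary["name"])
```

In a notebook, you would call describe_scope(wslib.here) and could print the result with wslib.show().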

Fetch data

You can use the following functions to fetch data from a stored data asset (a file) in your project or space.

load_data(asset_name_or_item, attachment_type_or_item=None)

This function loads the data of a stored data asset into a BytesIO buffer. The function is not recommended for very large files.

The function takes the following parameters:
- asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
- attachment_type_or_item: (Optional) The attachment type to load. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, data_asset, is loaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be loaded as attachment type data_profile_nlu.
  
  Here is an example that shows you how to load the data of a data asset:
```
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

# Fetch the data from a file
my_file = wslib.load_data("MyFile.csv")

# Read the CSV data file into a pandas DataFrame
my_file.seek(0)
import pandas as pd
pd.read_csv(my_file, nrows=10)
```

download_file(asset_name_or_item, file_name=None, attachment_type_or_item=None)

This function downloads the data of a stored data asset and stores it in the specified file in the file system of your runtime. The file is overwritten if it already exists.

The function takes the following parameters:
- asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
- file_name: (Optional) The name of the file that the downloaded data is stored to. It defaults to the asset's attachment name.
- attachment_type_or_item: (Optional) The attachment type to download. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, data_asset, is downloaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be downloaded as attachment type data_profile_nlu.
  
  Here is an example that shows how you can use download_file to make your custom Python script available in your notebook:
```
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

# Let's assume you have a Python script "helpers.py" with helper functions on your local machine.
# Upload the script to your Watson Studio project using the Data Panel on the right.

# Download the script to the file system of your runtime
wslib.download_file("helpers.py")

# import the required functions to use them in your notebook
from helpers import my_func
my_func()
```

Save data

You can use the following function to save data in memory to a file associated with your project or space. The function does two things: it first puts the data into a file in the project or space storage, and then it adds this file as a data asset to your project or space, so that the saved data appears in the data assets list of the project or space.

save_data(asset_name_or_item, data, overwrite=None, mime_type=None, file_name=None)

The function takes the following parameters:
- asset_name_or_item: (Required) The name of the created asset or a list item that is returned by list_stored_data(). Use the item if you want to overwrite an existing file.
- data: (Required) The data to upload. This can be any object of type bytes-like-object, for example a byte buffer.
  
  Note: The data to save is not allowed to exceed 2 GB in size. To save data that is larger than 2 GB, see Save data and upload file size limitation in project-lib and ibm-watson-studio-lib for Python.
- overwrite: (Optional) Overwrites the data of a stored data asset if it already exists. By default, this is set to false. If an asset item is passed instead of a name, the behavior is to overwrite the asset.
- mime_type: (Optional) The MIME type for the created asset. By default, the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here, for example mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
- file_name: (Optional) The file name to be used in the project or space storage. The data is saved in the storage associated with the project or space. When creating a new asset, the file name is derived from the asset name, but may be different. If you want to access the file directly, you can specify a file name. This parameter is ignored when overwriting an asset.
  
  Here is an example that shows you how to save data to a file:
```
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

# let's assume you have the pandas DataFrame  pandas_df which contains the data
# you want to save as a csv file
wslib.save_data("my_asset_name.csv", pandas_df.to_csv(index=False).encode())

# the function returns a dict which contains the asset_name, asset_id, file_name and additional information
# upon successful saving of the data
```
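
The data parameter expects a bytes-like object, so other kinds of data have to be serialized first. The following sketch shows two ways of producing such a payload with the Python standard library only. The helper names `rows_to_csv_bytes` and `to_json_bytes`, the UTF-8 encoding, and the sample rows are assumptions made for illustration.

```python
import csv
import io
import json


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dictionaries to CSV and return the result as UTF-8 bytes."""
    text_buffer = io.StringIO()
    writer = csv.DictWriter(text_buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return text_buffer.getvalue().encode("utf-8")


def to_json_bytes(obj):
    """Serialize a JSON-compatible object and return the result as UTF-8 bytes."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


# Made-up sample data
rows = [{"id": 1, "city": "Berlin"}, {"id": 2, "city": "Oslo"}]
csv_payload = rows_to_csv_bytes(rows, ["id", "city"])
json_payload = to_json_bytes({"rows": len(rows)})
print(len(csv_payload), len(json_payload))
```

Either payload could then be passed as the data argument of save_data.
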
upload_file(file_path, asset_name=None, file_name=None, overwrite=False, mime_type=None)

You can upload a file from your runtime's file system to the project or space. Like save_data, this function puts the file in your project or space storage and creates a data asset in your project or space.

Note: The size of the file, referenced by the parameter file_name, is not allowed to exceed 2 GB. To save data that is larger than 2 GB, see Save data and upload file size limitation in project-lib and ibm-watson-studio-lib for Python.

The function takes the following parameters:
- file_path: (Required) The path to the file in the file system.
- asset_name: (Optional) The name of the data asset that is created. It defaults to the name of the file to be uploaded. An asset with the same name is not allowed to exist.
- file_name: (Optional) The name of the file that is created in the storage associated with the project or space. It defaults to the name of the file to be uploaded.
- overwrite: (Optional) Overwrites an existing file in storage. Defaults to false.
- mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here. For example mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
  
  Here is an example that shows you how you can upload a file to the project or space:
```
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

# Let's assume you have downloaded a file and want to save it
# in your project.
import urllib.request
urllib.request.urlretrieve("https://some/url/data_file.csv", "data_file.csv")
wslib.upload_file("data_file.csv")

# The function returns a dictionary which contains the asset_name, asset_id, file_name and 
# additional information upon successful saving of the data. 
# The value `input_file_copied` tells you whether the file has been copied to project storage.
# If the value is true, you can safely delete the local input file. 
# If the value is false, you should not delete the input file because it already 
# exists in the mounted project storage. Deleting the file 
# from the mounted project storage will corrupt the created asset.
# The return value can be passed to load_data() to read back the data.
```
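
The 2 GB limit noted for upload_file can be checked before the upload is attempted. The following is a minimal sketch, assuming that "2 GB" means 2 * 1024**3 bytes (the exact byte count is not stated here). The helper `fits_upload_limit` is hypothetical, and a small temporary file stands in for the file to upload.

```python
import os
import tempfile

# Assumption: "2 GB" is interpreted as 2 * 1024**3 bytes.
MAX_UPLOAD_BYTES = 2 * 1024**3


def fits_upload_limit(file_path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at file_path does not exceed the given size limit."""
    return os.path.getsize(file_path) <= limit


# Create a small sample file as a stand-in for the file to upload.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,value\n1,42\n")
    sample_path = tmp.name

fits = fits_upload_limit(sample_path)
fits_tiny_limit = fits_upload_limit(sample_path, limit=4)  # artificially small limit
os.remove(sample_path)
print(fits, fits_tiny_limit)
```

In a notebook, upload_file would be called only when the check succeeds.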

Get connection information

You can use the following function to access the connection metadata of a given connection.

get_connection(name_or_item)

This function returns the properties (metadata) of a connection which you can use to fetch data from the connection data source. Use wslib.show(connprops) to view the properties. The special key "." in the returned dictionary provides information about the connection asset.

The function takes the following parameter:
- name_or_item: (Required) Either a string with the name of a connection or an item like those returned by list_connections().
  
  Note that when you work with notebooks in Watson Studio, you can use the Insert to code function from the data panel in your notebook to load data from a connection into a pandas DataFrame for example.

Get connected data information

You can use the following function to access the metadata of a connected data asset.

get_connected_data(name_or_item)

This function returns the properties of a connected data asset, including the properties of the underlying connection. Use wslib.show() to view the properties. The special key "." in the returned dictionary provides information about the data and the connection assets.

The function takes the following parameter:
- name_or_item: (Required) Either a string with the name of a connected data asset or an item like those returned by list_connected_data().
  
  Note that when you work with notebooks in Watson Studio, you can use the Insert to code function from the data panel in your notebook to load data from a connected data asset into a pandas DataFrame for example.

Access assets by ID instead of name

Preferably, access data assets and connections by a unique name. Asset names are not necessarily unique, and the ibm-watson-studio-lib functions raise an exception when a name is ambiguous. You can rename data assets in the UI to resolve the conflict.
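
To find out in advance which names are ambiguous, the items returned by a list function can be checked for duplicates. The following is a rough sketch, assuming that each item is a dictionary with a `name` key; the helper `ambiguous_names` and the sample list are hypothetical.

```python
from collections import Counter


def ambiguous_names(items):
    """Return the sorted asset names that occur more than once in the list."""
    counts = Counter(item["name"] for item in items)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical sample that mimics the shape of a returned list (assumed keys).
sample_items = [
    {"name": "sales.csv", "asset_id": "id-1"},
    {"name": "customers.csv", "asset_id": "id-2"},
    {"name": "sales.csv", "asset_id": "id-3"},
]

duplicates = ambiguous_names(sample_items)
print(duplicates)
```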

Accessing assets by a unique ID is possible but discouraged, because IDs are valid only in the current project or space, and code that uses them breaks when it is transferred to a different project or space. This can happen, for example, when projects are exported and re-imported, or when notebooks or assets are promoted from projects to spaces. You can get the ID of a connection, connected data asset, or stored data asset by using the corresponding list function, for example list_connections().

The entry point wslib.by_id provides the following functions:

get_connection(asset_id)

This function accesses a connection by the connection asset ID.

get_connected_data(asset_id)

This function accesses a connected data asset by the connected data asset ID.

load_data(asset_id, attachment_type_or_item=None)

This function loads the data of a stored data asset by passing the asset ID. See load_data() for a description of the other parameters you can pass.

save_data(asset_id, data, overwrite=None, mime_type=None, file_name=None)

This function saves data to a stored data asset by passing the asset ID. This implies overwrite=True. See save_data() for a description of the other parameters you can pass.

download_file(asset_id, file_name=None, attachment_type_or_item=None)

This function downloads the data of a stored data asset by passing the asset ID. See download_file() for a description of the other parameters you can pass.

Work with the mounted project storage

In Cloud Pak for Data, the project or space storage is mounted in the local file system of your runtime. You can retrieve information about the mounted project storage by using the entry point wslib.mount.

The entry point wslib.mount provides the following functions:

is_available()

This function checks if the project storage is available as a mount in the local file system.

get_base_dir()

This function returns the absolute path to the data asset folder in the local file system.

get_data_path(asset_name_or_item)

This function returns the absolute path of the file referenced by a data asset in the local file system.

The function takes the following parameter:
- asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().

register_asset(file_path, asset_name=None, mime_type=None)

This function registers the file in the local mount as a data asset in your project or space. This operation fails if a data asset with the same name already exists.

The function takes the following parameters:
- file_path: (Required) The absolute path to the file in the locally mounted project or space storage.
- asset_name: (Optional) The name of the created asset. It defaults to the file name.
- mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. Use this parameter to specify a MIME type if your file name does not have a file extension or if you want to set a different MIME type.
  
  The following example shows how to register a file in the mounted project storage as an asset:
```
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

# Let's assume you have a really large file that you cannot 
# upload to your project via 'save_data()' or 'upload_file()', 
# because those methods require the data to fit into memory.
my_large_file = "my_large_file.csv"

# Move the file to the data assets folder in the 
# mounted project storage:
data_asset_folder = wslib.mount.get_base_dir()
import os, shutil
target_path = os.path.join(data_asset_folder, my_large_file)
shutil.move(my_large_file, target_path)

# Register the file as data asset
wslib.mount.register_asset(target_path, asset_name="LargeFile.csv")

# the function returns a dict which contains the asset_name, asset_id, file_name and additional information
# upon successful creation of the data asset.
```
  Note: You can register a file several times as a different data asset. Deleting one of those assets in the project interface also deletes the file in storage which means that other asset references to the file might be broken.
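
The move step in the example above can replace a file of the same name that is already in the data asset folder, depending on the platform. The following sketch refuses to do so. The helper `move_into_folder` is hypothetical, and temporary directories stand in for the working directory and for the folder returned by wslib.mount.get_base_dir().

```python
import os
import shutil
import tempfile


def move_into_folder(source_path, target_dir):
    """Move source_path into target_dir, refusing to replace an existing file."""
    target_path = os.path.join(target_dir, os.path.basename(source_path))
    if os.path.exists(target_path):
        raise FileExistsError(f"{target_path} already exists")
    shutil.move(source_path, target_path)
    return target_path


# Stand-ins: in a notebook, data_asset_folder would come from wslib.mount.get_base_dir().
work_dir = tempfile.mkdtemp()
data_asset_folder = tempfile.mkdtemp()

source_path = os.path.join(work_dir, "my_large_file.csv")
with open(source_path, "w", encoding="utf-8") as handle:
    handle.write("a,b\n1,2\n")

target_path = move_into_folder(source_path, data_asset_folder)
print(os.path.basename(target_path))
```

The returned path could then be passed to wslib.mount.register_asset().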

Access project storage directly

You can fetch data from project storage and store data in project storage without synchronizing the project assets using the entry point wslib.storage.

In Cloud Pak for Data, the project or space storage is mounted in the notebook runtime and you will typically use the entry point wslib.mount and file system operations to access the storage.

The entry point wslib.storage provides the following functions:

fetch_data(filename)

This function returns the data in a file as a BytesIO buffer. The file does not need to be registered as data asset.

The function takes the following parameter:
- filename: (Required) The name of the file in the project or space storage.

store_data(filename, data, overwrite=False)

This function saves data in memory to storage, but does not create a new data asset. The function returns a dictionary which contains the file name, file path and additional information. Use wslib.show() to print the information.

The function takes the following parameters:
- filename: (Required) The name of the file in the project or space storage.
- data: (Required) The data to save as a bytes-like object.
- overwrite: (Optional) Overwrites the data of the file in storage if it already exists. By default, this is set to false.

download_file(storage_filename, local_filename=None)

This function downloads the data in a file in storage and stores it in the specified local file. The local file is overwritten if it already exists.

The function takes the following parameters:
- storage_filename: (Required) The name of the file in storage to download.
- local_filename: (Optional) The name of the file in the local file system of your runtime to download the file to. Omit this parameter to use the storage file name.

register_asset(storage_path, asset_name=None, mime_type=None)

This function registers the file in storage as a data asset in your project or space. This operation fails if a data asset with the same name already exists.

The function takes the following parameters:
- storage_path: (Required) The path of the file in storage.
- asset_name: (Optional) The name of the created asset. It defaults to the file name.
- mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. Use this parameter to specify a MIME type if your file name does not have a file extension or if you want to set a different MIME type.
  
  Note: You can register a file several times as a different data asset. Deleting one of those assets in the project also deletes the file in storage, which means that other asset references to the file might be broken.

Browse project assets

The entry point wslib.assets provides generic, read-only access to assets of any type. For selected asset types, there are dedicated functions that provide additional data. To get help on the available functions, use help(wslib.assets.API).

The following naming conventions apply:

Functions named list_<something> return a list of Python dictionaries. Each dictionary represents one asset and includes a small set of properties (metadata) that identifies the asset.
Functions named get_<something> return a single Python dictionary with the properties for the asset.

To pretty-print a dictionary or list of dictionaries, use wslib.show().

The functions expect either the name of an asset, or an item from a list as the parameter. By default, the functions return only a subset of the available asset properties. By setting the parameter raw=True, you can get the full set of asset properties.

The entry point wslib.assets provides the following functions:

list_assets(asset_type, name=None, query=None, selector=None, raw=False)

This function lists all assets for the given type with respect to the given constraints.

The function takes the following parameters:
- asset_type: (Required) The type of the assets to list, for example data_asset. See list_asset_types() for a list of the available asset types. Use asset type asset for the list of all available assets in the project or space.
- name: (Optional) The name of the asset to list. Use this parameter if more than one asset with the same name exists. You can specify either name or query, but not both.
- query: (Optional) A query string that is passed to the Watson Data API to search for assets. You can specify either name or query, but not both.
- selector: (Optional) A custom filter function on the candidate asset dictionary items. If the selector function returns True, the asset is included in the returned asset list.
- raw: (Optional) Returns all of the available metadata. By default, the parameter is set to False and only a subset of the properties is returned.
  
  Examples of using the list_assets function:
```
# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

# List all assets in the project or space
all_assets = wslib.assets.list_assets("asset")
wslib.show(all_assets)

# List all data assets with name 'MyFile.csv'
assets_by_name = wslib.assets.list_assets("data_asset", name="MyFile.csv")

# List all data assets whose name starts with "MyF"
assets_by_query = wslib.assets.list_assets("data_asset", query="asset.name:(MyF*)")

# List all data assets which are larger than 1MB
sizeFilter = lambda x: x['metadata']['size'] > 1000000
large_assets = wslib.assets.list_assets("data_asset", selector=sizeFilter, raw=True)

# List all notebooks
notebooks = wslib.assets.list_assets("notebook")
```
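
A selector can also be a named function, which makes it easier to handle items that lack the expected keys. The following sketch builds such a selector and applies it to sample dictionaries. The helper `larger_than` is hypothetical, and the sample items only imitate the raw asset layout used in the size filter above (`metadata` with a `size` entry); the `name` entry is an assumption.

```python
def larger_than(min_bytes):
    """Build a selector that keeps assets whose metadata reports a size above min_bytes."""
    def selector(asset):
        size = asset.get("metadata", {}).get("size")
        # Items without a numeric size are excluded instead of raising a KeyError.
        return isinstance(size, (int, float)) and size > min_bytes
    return selector


# Made-up sample items in the assumed raw layout.
sample_assets = [
    {"metadata": {"name": "small.csv", "size": 2_000}},
    {"metadata": {"name": "large.csv", "size": 5_000_000}},
    {"metadata": {"name": "no_size"}},
]

is_large = larger_than(1_000_000)
selected = [asset for asset in sample_assets if is_large(asset)]
print([asset["metadata"]["name"] for asset in selected])

# In a notebook, the selector could be passed to the library, for example:
# wslib.assets.list_assets("data_asset", selector=larger_than(1_000_000), raw=True)
```
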
list_asset_types(raw=False)

This function lists all available asset types.

The function takes the following parameter:
- raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.

list_datasource_types(raw=False)

This function lists all available data source types.

The function takes the following parameter:
- raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.

get_asset(name_or_item, asset_type=None, raw=False)

The function returns the metadata of an asset.

The function takes the following parameters:
- name_or_item: (Required) The name of the asset or an item like those returned by list_assets().
- asset_type: (Optional) The type of the asset. If the parameter name_or_item contains a string for the name of the asset, setting asset_type is required.
- raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
  
  Example of using the list_assets and get_asset functions:
```
notebooks = wslib.assets.list_assets('notebook')
wslib.show(notebooks)

notebook = wslib.assets.get_asset(notebooks[0])
wslib.show(notebook)
```

get_connection(name_or_item, with_datasourcetype=False, raw=False)

This function returns the metadata of a connection.

The function takes the following parameters:
- name_or_item: (Required) The name of the connection or an item like those returned by list_connections().
- with_datasourcetype: (Optional) Returns additional information about the data source type of the connection.
- raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.

get_connected_data(name_or_item, with_datasourcetype=False, raw=False)

This function returns the metadata of a connected data asset.

The function takes the following parameters:
- name_or_item: (Required) The name of the connected data asset or an item like those returned by list_connected_data().
- with_datasourcetype: (Optional) Returns additional information about the data source type of the associated connected data asset.
- raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.

get_stored_data(name_or_item, raw=False)

This function returns the metadata of a stored data asset.

The function takes the following parameters:
- name_or_item: (Required) The name of the stored data asset or an item like those returned by list_stored_data().
- raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.

list_attachments(name_or_item_or_asset, asset_type=None, raw=False)

This function returns a list of the attachments of an asset.

The function takes the following parameters:
- name_or_item_or_asset: (Required) The name of the asset or an item like those returned by list_stored_data() or get_asset().
- asset_type: (Optional) The type of the asset. It defaults to type data_asset.
- raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
  
  Example of using the list_attachments function to read an attachment of a stored data asset:
```
assets = wslib.list_stored_data()
wslib.show(assets)

asset = assets[0]
attachments = wslib.assets.list_attachments(asset)
wslib.show(attachments)
buffer = wslib.load_data(asset, attachments[0])
```

Learn more

For examples of how to use some of the functions provided by the library in a notebook, see Working with ibm-watson-studio-lib.
