ibm-watson-studio-lib for Python
The ibm-watson-studio-lib library for Python provides access to assets in Watson Studio. It can be used in notebooks in a project or notebooks that have been promoted to a deployment space. ibm-watson-studio-lib provides support for working with data assets and connections, as well as browsing functionality for all other asset types.
There are two kinds of data assets:
- Stored data assets refer to files in the storage associated with the current project or space. The library can load and save these files. However, the library keeps the data in memory in its entirety, which is inefficient for huge datasets, so loading or saving data larger than one megabyte this way is not recommended.
- Connected data assets represent data that must be accessed through a connection. Using the library, you can retrieve the properties (metadata) of the connected data asset and its connection. The functions do not return the data of a connected data asset. You can either use the code that is generated when you use the Insert to code function from the notebook data panel to access the data or you must write your own code.
Note:
- The ibm-watson-studio-lib functions do not encode or decode data when saving data to or getting data from a file.
- The ibm-watson-studio-lib functions can’t be used to access folder assets (files on a path to the project storage).
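Because the functions neither encode nor decode, text has to be converted to bytes before it is saved and back to text after it is loaded. The following is a minimal sketch of that round trip that uses only the Python standard library. The helper names text_to_bytes and buffer_to_text and the UTF-8 default are assumptions for illustration and are not part of ibm-watson-studio-lib.

```python
import io


def text_to_bytes(text, encoding="utf-8"):
    """Encode text explicitly; the library stores the bytes unchanged."""
    return text.encode(encoding)


def buffer_to_text(buffer, encoding="utf-8"):
    """Decode the content of a BytesIO buffer, such as one returned by load_data()."""
    buffer.seek(0)
    return buffer.read().decode(encoding)


# Stand-in for a save/load round trip: a BytesIO buffer plays the role
# of the object that load_data() returns.
payload = text_to_bytes("name,city\nZoë,Zürich\n")
restored = buffer_to_text(io.BytesIO(payload))
```

In a notebook, payload would be the value passed as data when saving, and the buffer would come from loading the stored data asset. Use the same encoding on both sides.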
Setting up the ibm-watson-studio-lib library
When ibm-watson-studio-lib is used in a notebook, the project or space in which it runs is determined automatically.
Use the following import statement to set up ibm-watson-studio-lib:
from ibm_watson_studio_lib import access_project_or_space
credentials_dic={"project_id":'<ProjectId>', "token":'<ProjectToken>'}
wslib = access_project_or_space(params=credentials_dic)
Helper functions
You can get information about the supported functions in the ibm-watson-studio-lib library programmatically by using help(wslib), or for an individual function by using help(wslib.<function_name>), for example help(wslib.get_connection).
You can use the helper function wslib.show(...) for formatted printing of Python dictionaries and lists of dictionaries, which are the common result output type of the ibm-watson-studio-lib functions.
The ibm-watson-studio-lib functions
The ibm-watson-studio-lib library exposes a set of functions that are grouped in the following way:
- Get project or space information
- Fetch data
- Save data
- Get connection information
- Get connected data information
- Access assets by ID instead of name
- Work with the mounted project storage
- Access project storage directly
- Browse project assets
Get project or space information
While developing code, you might not know the exact names of data assets or connections. The following functions provide lists of assets, from which you can pick the relevant ones. In all examples, you can use wslib.show(assets) to pretty-print the list. The index of each item is printed in front of the item.
- list_connections()
  This function returns a list of the connections. The list is not sorted by any criterion and its order can change when you call the function again. You can pass a dictionary item from the list instead of a name to the get_connection function. For example:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

    # List the connections and show the properties of the first one
    assets = wslib.list_connections()
    wslib.show(assets)
    connprops = wslib.get_connection(assets[0])
    wslib.show(connprops)
- list_connected_data()
  This function returns the connected data assets. The list is not sorted by any criterion and its order can change when you call the function again. You can pass a dictionary item from the list instead of a name to the get_connected_data function.
- list_stored_data()
  This function returns a list of the stored data assets (data files). The list is not sorted by any criterion and its order can change when you call the function again. You can pass a dictionary item from the list instead of a name to the load_data and save_data functions.
  Note: A heuristic is applied to distinguish between connected data assets and stored data assets. However, there may be cases where a data asset of the wrong kind appears in the returned lists.
- wslib.here
  By using this entry point, you can retrieve metadata about the project or space that the library is working with. The entry point wslib.here provides the following functions:
  - get_name()
    This function returns the name of the project or space.
  - get_description()
    This function returns the description of the project or space.
  - get_ID()
    This function returns the ID of the project or space.
  - get_storage()
    This function returns storage information for the project or space.
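The four wslib.here functions can be combined into a single summary dictionary, which is convenient for logging at the top of a notebook. The following is a minimal sketch; the function name describe_workspace and the keys of the returned dictionary are hypothetical, and only the get_name, get_description, get_ID and get_storage calls come from the list above.

```python
def describe_workspace(wslib):
    """Collect the metadata exposed by the wslib.here entry point in one dictionary."""
    here = wslib.here
    return {
        "name": here.get_name(),
        "description": here.get_description(),
        "id": here.get_ID(),
        "storage": here.get_storage(),
    }


# Usage in a notebook (not executed here):
# from ibm_watson_studio_lib import access_project_or_space
# wslib = access_project_or_space()
# wslib.show(describe_workspace(wslib))
```

Passing the wslib object as a parameter keeps the helper independent of how the library was set up.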
Fetch data
You can use the following functions to fetch data from a stored data asset (a file) in your project or space.
- load_data(asset_name_or_item, attachment_type_or_item=None)
  This function loads the data of a stored data asset into a BytesIO buffer. The function is not recommended for very large files.
  The function takes the following parameters:
  - asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
  - attachment_type_or_item: (Optional) Attachment type to load. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, namely data_asset, is loaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be loaded as attachment type data_profile_nlu.
  Here is an example that shows you how to load the data of a data asset:
    # Import the lib
    import pandas as pd
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

    # Fetch the data from a file
    my_file = wslib.load_data("MyFile.csv")

    # Read the CSV data file into a pandas DataFrame
    my_file.seek(0)
    pd.read_csv(my_file, nrows=10)
- download_file(asset_name_or_item, file_name=None, attachment_type_or_item=None)
  This function downloads the data of a stored data asset and stores it in the specified file in the file system of your runtime. The file is overwritten if it already exists.
  The function takes the following parameters:
  - asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
  - file_name: (Optional) The name of the file that the downloaded data is stored to. It defaults to the asset's attachment name.
  - attachment_type_or_item: (Optional) The attachment type to download. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, namely data_asset, is downloaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be downloaded as attachment type data_profile_nlu.
  Here is an example that shows you how you can use download_file to make your custom Python script available in your notebook:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

    # Let's assume you have a Python script "helpers.py" with helper functions on your local machine.
    # Upload the script to your Watson Studio project using the Data Panel on the right.

    # Download the script to the file system of your runtime
    wslib.download_file("helpers.py")

    # Import the required functions to use them in your notebook
    from helpers import my_func
    my_func()
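The import in the download_file example relies on the downloaded script being in a directory that is on the Python module search path. As a hedged alternative, a script can be loaded from an explicit path with the standard importlib machinery. The function name load_module_from_file is hypothetical, and the sketch assumes that download_file has already written the script to the given path.

```python
import importlib.util


def load_module_from_file(path, module_name):
    """Load a Python source file as a module without relying on sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Usage in a notebook (not executed here), after wslib.download_file("helpers.py"):
# helpers = load_module_from_file("helpers.py", "helpers")
# helpers.my_func()
```

Loading by path also makes it explicit which copy of the script is used when a file with the same name exists elsewhere.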
Save data
You can use the following function to save data in memory to a file associated with your project or space. This function does multiple things. Firstly, it puts the data into a file in the project or space storage and then it adds this data as a data asset to your project or space so you can see the data that you saved as a file in the data assets list in your project or space.
- save_data(asset_name_or_item, data, overwrite=None, mime_type=None, file_name=None)
  The function takes the following parameters:
  - asset_name_or_item: (Required) The name of the created asset or a list item that is returned by list_stored_data(). You can use the item if you want to overwrite an existing file.
  - data: (Required) The data to upload. This can be any bytes-like object, for example a byte buffer.
    Note: The data is not allowed to exceed 2 GB in size. To save data that is larger than 2 GB, see Save data and upload file size limitation in project-lib and ibm-watson-studio-lib for Python.
  - overwrite: (Optional) Overwrites the data of a stored data asset if it already exists. By default, this is set to false. If an asset item is passed instead of a name, the behavior is to overwrite the asset.
  - mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here. For example mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
  - file_name: (Optional) The file name to be used in the project or space storage. The data is saved in the storage associated with the project or space. When creating a new asset, the file name is derived from the asset name, but may be different. If you want to access the file directly, you can specify a file name. This parameter is ignored when overwriting an asset.
  Here is an example that shows you how to save data to a file:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

    # Let's assume you have the pandas DataFrame pandas_df which contains the data
    # you want to save as a csv file
    wslib.save_data("my_asset_name.csv", pandas_df.to_csv(index=False).encode())

    # The function returns a dict which contains the asset_name, asset_id, file_name
    # and additional information upon successful saving of the data
- upload_file(file_path, asset_name=None, file_name=None, overwrite=False, mime_type=None)
  You can upload a file from your runtime's file system to the project or space. Like save_data, this function puts the file in your project or space storage and creates a data asset in your project.
  Note: The size of the file, referenced by the parameter file_name, is not allowed to exceed 2 GB. To save data that is larger than 2 GB, see Save data and upload file size limitation in project-lib and ibm-watson-studio-lib for Python.
  The function takes the following parameters:
  - file_path: (Required) The path to the file in the file system.
  - asset_name: (Optional) The name of the data asset that is created. It defaults to the name of the file to be uploaded. An asset with the same name is not allowed to exist.
  - file_name: (Optional) The name of the file that is created in the storage associated with the project or space. It defaults to the name of the file to be uploaded.
  - overwrite: (Optional) Overwrites an existing file in storage. Defaults to false.
  - mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here. For example mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
  Here is an example that shows you how you can upload a file to the project or space:
    # Import the lib
    import urllib.request
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

    # Let's assume you have downloaded a file and want to save it
    # in your project.
    urllib.request.urlretrieve("https://some/url/data_file.csv", "data_file.csv")
    wslib.upload_file("data_file.csv")

    # The function returns a dictionary which contains the asset_name, asset_id, file_name and
    # additional information upon successful saving of the data.
    # The value `input_file_copied` tells you if the file has been copied to project storage.
    # If the value is true, you can safely delete the local input file.
    # If the value is false, you should not delete the input file because it already
    # exists in the mounted project storage. Deleting the file
    # from the mounted project storage will corrupt the created asset.
    # The return value can be passed to load_data() to read back the data.
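Because save_data expects a bytes-like object, structured data has to be serialized before it is passed in. The following sketch shows one way to do this without pandas, using the standard csv module. The helper names rows_to_csv_bytes and save_rows are hypothetical; only the save_data call with its asset name, data and overwrite arguments is taken from the description above.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dictionaries to UTF-8 encoded CSV bytes."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return text.getvalue().encode("utf-8")


def save_rows(wslib, asset_name, rows, fieldnames, overwrite=False):
    """Save rows as a CSV data asset and return whatever save_data() returns."""
    data = rows_to_csv_bytes(rows, fieldnames)
    return wslib.save_data(asset_name, data, overwrite=overwrite)


# Usage in a notebook (not executed here):
# result = save_rows(wslib, "cities.csv", [{"city": "Oslo", "rank": 1}], ["city", "rank"])
# wslib.show(result)
```

Keeping the serialization in its own function makes it easy to check the byte size of the result before saving.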
Get connection information
You can use the following function to access the connection metadata of a given connection.
- get_connection(name_or_item)
  This function returns the properties (metadata) of a connection which you can use to fetch data from the connection data source. Use wslib.show(connprops) to view the properties. The special key "." in the returned dictionary provides information about the connection asset.
  The function takes the following parameter:
  - name_or_item: (Required) Either a string with the name of a connection or an item like those returned by list_connections().
  Note that when you work with notebooks in Watson Studio, you can use the Insert to code function from the data panel in your notebook to load data from a connection, for example into a pandas DataFrame.
Get connected data information
You can use the following function to access the metadata of a connected data asset.
- get_connected_data(name_or_item)
  This function returns the properties of a connected data asset, including the properties of the underlying connection. Use wslib.show() to view the properties. The special key "." in the returned dictionary provides information about the data and the connection assets.
  The function takes the following parameter:
  - name_or_item: (Required) Either a string with the name of a connected data asset or an item like those returned by list_connected_data().
  Note that when you work with notebooks in Watson Studio, you can use the Insert to code function from the data panel in your notebook to load data from a connected data asset, for example into a pandas DataFrame.
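Both get_connection and get_connected_data return a dictionary in which the special key "." holds asset information while the other keys hold the properties. When the properties are passed on to a database client, it can help to separate the two. The following is a minimal sketch; the function name split_properties is hypothetical, and the keys in the sample dictionary are placeholders rather than documented property names.

```python
def split_properties(props):
    """Separate the special "." entry from the remaining properties."""
    asset_info = props.get(".", {})
    properties = {key: value for key, value in props.items() if key != "."}
    return asset_info, properties


# Hypothetical sample that mimics the shape of a returned dictionary;
# the keys are placeholders, not documented property names.
sample = {".": {"name": "MyConnection"}, "host": "db.example.com", "port": "50000"}
asset_info, properties = split_properties(sample)
```

The original dictionary is left unchanged, so it can still be printed in full with wslib.show().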
Access assets by ID instead of name
You should preferably always access data assets and connections by a unique name. Asset names are not necessarily unique, and the ibm-watson-studio-lib functions raise an exception when a name is ambiguous. You can rename data assets in the UI to resolve the conflict.
Accessing assets by a unique ID is possible but is discouraged, because IDs are valid only in the current project or space and will break code when it is transferred to a different project or space. This can happen, for example, when projects are exported and re-imported, or when notebooks or assets are promoted from projects to spaces. You can get the ID of a connection, connected data asset or stored data asset by using the corresponding list function, for example list_connections().
The entry point wslib.by_id provides the following functions:
- get_connection(asset_id)
  This function accesses a connection by the connection asset ID.
- get_connected_data(asset_id)
  This function accesses a connected data asset by the connected data asset ID.
- load_data(asset_id, attachment_type_or_item=None)
  This function loads the data of a stored data asset by passing the asset ID. See load_data() for a description of the other parameters you can pass.
- save_data(asset_id, data, overwrite=None, mime_type=None, file_name=None)
  This function saves data to a stored data asset by passing the asset ID. This implies overwrite=True. See save_data() for a description of the other parameters you can pass.
- download_file(asset_id, file_name=None, attachment_type_or_item=None)
  This function downloads the data of a stored data asset by passing the asset ID. See download_file() for a description of the other parameters you can pass.
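Before falling back to IDs, it can be useful to check which asset names are actually ambiguous. The following sketch counts names in a list of asset items. It assumes that each item is a dictionary with a "name" entry, which is an assumption about the shape of the list results and should be verified with wslib.show(); the function name ambiguous_names is hypothetical.

```python
from collections import Counter


def ambiguous_names(items, name_key="name"):
    """Return the sorted asset names that occur more than once in a list result."""
    counts = Counter(item[name_key] for item in items)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical items that mimic a list result; real items may carry other keys.
sample_items = [
    {"name": "sales.csv", "asset_id": "id-1"},
    {"name": "sales.csv", "asset_id": "id-2"},
    {"name": "costs.csv", "asset_id": "id-3"},
]
duplicates = ambiguous_names(sample_items)
```

Renaming the reported assets in the UI keeps name-based access working, which stays portable across projects and spaces.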
Work with the mounted project storage
In Cloud Pak for Data, the project or space storage is mounted in the local file system of your runtime. You can retrieve information about the mounted project storage using the entry point wslib.mount.
The entry point wslib.mount provides the following functions:
- is_available()
  This function checks if the project storage is available as a mount in the local file system.
- get_base_dir()
  This function returns the absolute path to the data asset folder in the local file system.
- get_data_path(asset_name_or_item)
  This function returns the absolute path of the file referenced by a data asset in the local file system.
  The function takes the following parameter:
  - asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
- register_asset(file_path, asset_name=None, mime_type=None)
  This function registers the file in the local mount as a data asset in your project or space. This operation fails if a data asset with the same name already exists.
  The function takes the following parameters:
  - file_path: (Required) The absolute path to the file in the locally mounted project or space storage.
  - asset_name: (Optional) The name of the created asset. It defaults to the file name.
  - mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. Use this parameter to specify a MIME type if your file name does not have a file extension or if you want to set a different MIME type.
  The following example shows how to register a file in the mounted project storage as an asset:
    # Import the lib
    import os, shutil
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

    # Let's assume you have a really large file that you cannot
    # upload to your project via 'save_data()' or 'upload_file()',
    # because those methods require the data to fit into memory.
    my_large_file = "my_large_file.csv"

    # Move the file to the data assets folder in the
    # mounted project storage:
    data_asset_folder = wslib.mount.get_base_dir()
    target_path = os.path.join(data_asset_folder, my_large_file)
    shutil.move(my_large_file, target_path)

    # Register the file as data asset
    wslib.mount.register_asset(target_path, asset_name="LargeFile.csv")

    # The function returns a dict which contains the asset_name, asset_id, file_name
    # and additional information upon successful creation of the data asset.
  Note: You can register a file several times as a different data asset. Deleting one of those assets in the project interface also deletes the file in storage, which means that other asset references to the file might be broken.
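Code that moves files into the mounted storage can guard against runtimes in which no mount exists by checking is_available() first. The following is a minimal sketch under that assumption; the function name mounted_target_path is hypothetical, and returning None when the mount is missing is a design choice of this sketch, not library behavior.

```python
import os


def mounted_target_path(wslib, file_name):
    """Return the path for file_name in the mounted data asset folder, or None without a mount."""
    if not wslib.mount.is_available():
        return None
    return os.path.join(wslib.mount.get_base_dir(), file_name)


# Usage in a notebook (not executed here):
# target = mounted_target_path(wslib, "my_large_file.csv")
# if target is None:
#     print("Project storage is not mounted in this runtime")
```

A caller can then decide whether to fall back to another way of saving the file.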
Access project storage directly
You can fetch data from project storage and store data in project storage without synchronizing the project assets using the entry point wslib.storage.
In Cloud Pak for Data, the project or space storage is mounted in the notebook runtime and you will typically use the entry point wslib.mount and file system operations to access the storage.
The entry point wslib.storage provides the following functions:
- fetch_data(filename)
  This function returns the data in a file as a BytesIO buffer. The file does not need to be registered as a data asset.
  The function takes the following parameter:
  - filename: (Required) The name of the file in the project or space storage.
- store_data(filename, data, overwrite=False)
  This function saves data in memory to storage, but does not create a new data asset. The function returns a dictionary which contains the file name, file path and additional information. Use wslib.show() to print the information.
  The function takes the following parameters:
  - filename: (Required) The name of the file in the project or space storage.
  - data: (Required) The data to save as a bytes-like object.
  - overwrite: (Optional) Overwrites the data of the file in storage if it already exists. By default, this is set to false.
- download_file(storage_filename, local_filename=None)
  This function downloads the data in a file in storage and stores it in the specified local file. The local file is overwritten if it already exists.
  The function takes the following parameters:
  - storage_filename: (Required) The name of the file in storage to download.
  - local_filename: (Optional) The name of the file in the local file system of your runtime to download the file to. Omit this parameter to use the storage file name.
- register_asset(storage_path, asset_name=None, mime_type=None)
  This function registers the file in storage as a data asset in your project or space. This operation fails if a data asset with the same name already exists.
  The function takes the following parameters:
  - storage_path: (Required) The path of the file in storage.
  - asset_name: (Optional) The name of the created asset. It defaults to the file name.
  - mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. Use this parameter to specify a MIME type if your file name does not have a file extension or if you want to set a different MIME type.
  Note: You can register a file several times as a different data asset. Deleting one of those assets in the project also deletes the file in storage, which means that other asset references to the file might be broken.
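The store_data and fetch_data functions work on bytes, so text files need an explicit encoding on the way in and out. The following sketch wraps both calls for text content. The helper names store_text and fetch_text and the UTF-8 default are assumptions; only the store_data(filename, data, overwrite) and fetch_data(filename) calls are taken from the descriptions above.

```python
def store_text(wslib, filename, text, overwrite=False, encoding="utf-8"):
    """Encode text and write it to project storage without creating a data asset."""
    data = text.encode(encoding)
    return wslib.storage.store_data(filename, data, overwrite=overwrite)


def fetch_text(wslib, filename, encoding="utf-8"):
    """Read a file from project storage and decode its BytesIO content to text."""
    buffer = wslib.storage.fetch_data(filename)
    buffer.seek(0)
    return buffer.read().decode(encoding)


# Usage in a notebook (not executed here):
# wslib.show(store_text(wslib, "notes.txt", "first draft", overwrite=True))
# print(fetch_text(wslib, "notes.txt"))
```

A file written this way is not visible in the data assets list until it is registered as a data asset.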
Browse project assets
The entry point wslib.assets provides generic, read-only access to assets of any type. For selected asset types, there are dedicated functions that provide additional data. To get help on the available functions, use help(wslib.assets.API).
The following naming conventions apply:
- Functions named list_<something> return a list of Python dictionaries. Each dictionary represents one asset and includes a small set of properties (metadata) that identifies the asset.
- Functions named get_<something> return a single Python dictionary with the properties for the asset.
To pretty-print a dictionary or list of dictionaries, use wslib.show().
The functions expect either the name of an asset, or an item from a list as the parameter. By default, the functions return only a subset of the available asset properties. By setting the parameter raw=True, you can get the full set of asset properties.
The entry point wslib.assets provides the following functions:
- list_assets(asset_type, name=None, query=None, selector=None, raw=False)
  This function lists all assets for the given type with respect to the given constraints.
  The function takes the following parameters:
  - asset_type: (Required) The type of the assets to list, for example data_asset. See list_asset_types() for a list of the available asset types. Use asset type asset for the list of all available assets in the project or space.
  - name: (Optional) The name of the asset to list. Use this parameter if more than one asset with the same name exists. You can specify either name or query, but not both.
  - query: (Optional) A query string that is passed to the Watson Data API to search for assets. You can specify either name or query, but not both.
  - selector: (Optional) A custom filter function on the candidate asset dictionary items. If the selector function returns True, the asset is included in the returned asset list.
  - raw: (Optional) Returns all of the available metadata. By default, the parameter is set to False and only a subset of the properties is returned.
  Examples of using the list_assets function:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()

    # List all assets in the project or space
    all_assets = wslib.assets.list_assets("asset")
    wslib.show(all_assets)

    # List all data assets with name 'MyFile.csv'
    assets_by_name = wslib.assets.list_assets("data_asset", name="MyFile.csv")

    # List all data assets whose name starts with "MyF"
    assets_by_query = wslib.assets.list_assets("data_asset", query="asset.name:(MyF*)")

    # List all data assets which are larger than 1MB
    sizeFilter = lambda x: x['metadata']['size'] > 1000000
    large_assets = wslib.assets.list_assets("data_asset", selector=sizeFilter, raw=True)

    # List all notebooks
    notebooks = wslib.assets.list_assets("notebook")
- list_asset_types(raw=False)
  This function lists all available asset types.
  The function takes the following parameter:
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- list_datasource_types(raw=False)
  This function lists all available data source types.
  The function takes the following parameter:
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- get_asset(name_or_item, asset_type=None, raw=False)
  The function returns the metadata of an asset.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the asset or an item like those returned by list_assets().
  - asset_type: (Optional) The type of the asset. If the parameter name_or_item contains a string for the name of the asset, setting asset_type is required.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
  Example of using the list_assets and get_asset functions:
    # List all notebooks, then show the metadata of the first one
    notebooks = wslib.assets.list_assets('notebook')
    wslib.show(notebooks)
    notebook = wslib.assets.get_asset(notebooks[0])
    wslib.show(notebook)
- get_connection(name_or_item, with_datasourcetype=False, raw=False)
  This function returns the metadata of a connection.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the connection or an item like those returned by list_connections().
  - with_datasourcetype: (Optional) Returns additional information about the data source type of the connection.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- get_connected_data(name_or_item, with_datasourcetype=False, raw=False)
  This function returns the metadata of a connected data asset.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the connected data asset or an item like those returned by list_connected_data().
  - with_datasourcetype: (Optional) Returns additional information about the data source type of the associated connected data asset.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- get_stored_data(name_or_item, raw=False)
  This function returns the metadata of a stored data asset.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the stored data asset or an item like those returned by list_stored_data().
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- list_attachments(name_or_item_or_asset, asset_type=None, raw=False)
  This function returns a list of the attachments of an asset.
  The function takes the following parameters:
  - name_or_item_or_asset: (Required) The name of the asset or an item like those returned by list_stored_data() or get_asset().
  - asset_type: (Optional) The type of the asset. It defaults to type data_asset.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
  Example of using the list_attachments function to read an attachment of a stored data asset:
    # Pick the first stored data asset and list its attachments
    assets = wslib.list_stored_data()
    wslib.show(assets)
    asset = assets[0]
    attachments = wslib.assets.list_attachments(asset)
    wslib.show(attachments)

    # Load the data of the first attachment into a buffer
    buffer = wslib.load_data(asset, attachments[0])
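The selector parameter of list_assets accepts any function that takes an asset dictionary and returns True or False. The lambda in the list_assets example raises a KeyError if an asset has no metadata or size entry. The following is a sketch of a more tolerant selector. The name min_size_selector is hypothetical, and the metadata and size keys are taken from that example, so they should be checked against the actual output of wslib.show().

```python
def min_size_selector(min_bytes):
    """Build a selector that keeps assets whose metadata size is at least min_bytes."""
    def selector(asset):
        # Use .get() so that assets without size information are skipped
        # instead of raising a KeyError.
        size = asset.get("metadata", {}).get("size")
        return isinstance(size, (int, float)) and size >= min_bytes
    return selector


# Usage in a notebook (not executed here):
# large_assets = wslib.assets.list_assets(
#     "data_asset", selector=min_size_selector(1000000), raw=True)
```

Building the selector from a parameter avoids hard-coding the threshold in several places.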
Learn more
For examples of how to use some of the functions provided by the library in a notebook, see Working with ibm-watson-studio-lib.