Loading and accessing data in a notebook (Watson Studio)

You can integrate data into notebooks by accessing the data from a local file, from free data sets, or from a data source connection. You load that data into a data structure or container in the notebook, for example, a pandas.DataFrame, numpy.ndarray, Spark RDD, or Spark DataFrame.

To work with data in a notebook, you can choose between the following options:

Recommended methods for adding data to your notebook
Option: Add data from a file on your local system
  Recommended method: Add a code snippet that loads your data
  Requirements: The file must exist as an asset in your project
  Details: Add a file from your local system to your project and then use a code snippet to load the data

Option: Load data from a data source connection
  Recommended method: Add a code snippet that loads your data
  Requirements: The connection must exist as an asset in your project
  Details: Add a connection to your project and then add a code snippet that loads the data from your data source connection

Option: Access project assets and metadata programmatically
  Recommended method: Use ibm-watson-studio-lib library functions
  Requirements: The data source must exist as a project asset
  Details: Use the ibm-watson-studio-lib library to interact with data assets

Option: Generate your own code to read or write data
  Recommended method: Use the Flight service and the Apache Arrow Flight protocol to read from and write to data assets in a project
  Requirements: The data asset must exist in your project
  Details: Access data sources by using the Flight service in Python notebooks or the Flight service in R notebooks

Option: Create and use feature store data
  Recommended method: Use assetframe-lib library functions
  Requirements: The data asset must exist in your project
  Details: Use the assetframe-lib library for Python to create and use feature store data

Option: Access data using an API function or an operating system command
  Recommended method: For example, use wget
  Requirements: N/A
  Details: Access data using an API function or an operating system command

Important: Make sure that the environment in which the notebook is started has enough memory to hold the data that you load into the notebook. The environment must have significantly more memory than the total size of the loaded data, because some data frameworks, like pandas, can hold multiple copies of the data in memory.
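
As a rough illustration of the memory guidance above, the following sketch compares the on-disk size of a file against a memory budget before you load it. The function name fits_in_memory, the memory_budget_bytes parameter, and the default safety factor of 3 are assumptions made for this example and are not Watson Studio settings. The on-disk size is only a proxy: compressed files and parsed text can occupy far more memory than their file size suggests.

```python
import os


def fits_in_memory(path, memory_budget_bytes, safety_factor=3.0):
    """Return True if the file size, scaled by safety_factor, fits in the budget.

    safety_factor is an assumed multiplier that leaves headroom for the
    extra copies that a data framework might hold in memory.
    """
    estimated_bytes = os.path.getsize(path) * safety_factor
    return estimated_bytes <= memory_budget_bytes


# Hypothetical usage: check a 4 GiB budget before loading "sales.csv".
# if not fits_in_memory("sales.csv", 4 * 1024**3):
#     print("Consider a larger environment or load the data in smaller parts.")
```

If the check fails, choosing an environment with more memory or reading the data in smaller parts are two possible responses.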

Adding a file from your local system

To add a file from your local system to your project by using the JupyterLab notebook editor:

  1. Open your notebook in edit mode.
  2. From the toolbar, click the Upload asset to project icon and add your file.
Tip: You can also drag the file into your notebook sidebar.

Loading data from files

Prerequisites: The file must exist as an asset in your project. For details, see Adding a file from your local system.

To load data from a project file to your notebook:

  1. Open your notebook in edit mode.
  2. Click the Code snippets icon, click Read data, and then select the data file from your project. If you want to change your selection, use the Edit icon.
  3. From the Load as drop-down list, select the load option that you prefer.
  4. Click in an empty code cell in your notebook and then click Insert code to cell to insert the generated code. Alternatively, copy the generated code to the clipboard and then paste the code into your notebook.

The generated code serves as a quick start to begin working with a data set. For production systems, carefully review the inserted code to determine whether to write your own code that better meets your needs.

To learn which data structures are generated for which notebook language and data format, see Data load support.
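
If you decide to write your own loading code instead of relying on the generated snippet, a minimal sketch might look like the following. It uses only the Python standard library and reads CSV text from any file-like object. The function name load_rows and the column check are assumptions made for this example. The code that Watson Studio generates is different and depends on the load option that you select.

```python
import csv
import io


def load_rows(stream, required_columns):
    """Read CSV rows from a text stream into a list of dictionaries.

    Raises ValueError if any of the required columns is missing from the
    header, so that a changed file layout is detected early.
    """
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [name for name in required_columns if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Illustrative in-memory data; in a notebook, the stream would be the opened data file.
sample = io.StringIO("id,name\n1,Ada\n2,Grace\n")
rows = load_rows(sample, ["id", "name"])
```

Validating the header before use is one example of the kind of check that a quick-start snippet typically omits. All values are returned as strings, so convert types as needed.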

Loading data from data source connections

Prerequisites: Before you can load data from an IBM data service or from an external data source, you must create or add a connection to your project. See Adding connections to projects.

For a Planning Analytics connection, see Adding data from a Planning Analytics connection.

To load data from an existing data source connection into a data structure in your notebook:

  1. Open your notebook in edit mode.
  2. Click the Code snippets icon, click Read data, and then select the data source connection from your project.
  3. Select the schema and choose a table. If you want to change your selection, use the Edit icon.
  4. Select the load option.
  5. Click in an empty code cell in your notebook and then click Insert code to cell to insert the generated code. Alternatively, copy the generated code to the clipboard and then paste the code into your notebook.
  6. If necessary, enter your personal credentials for locked data connections, which are marked with a key icon. This is a one-time step that permanently unlocks the connection for you. After you unlock the connection, the key icon is no longer displayed. For more information, see Adding connections to projects.

The generated code serves as a quick start to begin working with a connection. For production systems, carefully review the inserted code to determine whether to write your own code that better meets your needs.

To learn which data structures are generated for which notebook language and data format, see Data load support.

Accessing data by using an API function or an operating system command

You can use API functions or operating system commands in your notebook to access data. For example, you can use the wget command to retrieve data over the HTTP, HTTPS, or FTP protocols.

For reference information about the API, see Watson Data API.
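
As an alternative to calling wget, the same kind of retrieval can be sketched in Python with the standard library. The function name download, the timeout value, and the URL in the usage comment are assumptions made for this example. The sketch does not use the Watson Data API.

```python
import os
import shutil
import urllib.request


def download(url, dest_path, timeout=30):
    """Copy the resource at url to dest_path and return the number of bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out_file:
        # Stream the response to disk instead of holding it all in memory.
        shutil.copyfileobj(response, out_file)
    return os.path.getsize(dest_path)


# Hypothetical usage (the URL is a placeholder, not a real data set):
# size = download("https://example.com/data/sample.csv", "sample.csv")
```

Streaming the response to a file keeps memory use low for large downloads. You can then load the saved file into a data structure of your choice.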
