Loading and accessing data in a notebook (Watson Studio)

You can integrate data into notebooks by accessing the data from a local file, from free data sets, or from a data source connection. You load that data into a data structure or container in the notebook, for example, a pandas.DataFrame, numpy.array, Spark RDD, or Spark DataFrame.

To load data into your own notebooks, you can choose one of these options:

Add a file from your local system
Load data from a data source connection
Use the ibm-watson-studio-lib library to interact with project assets:
- For Python
- For R
Use an API function or operating system command to access the data

Important: Make sure that the environment in which the notebook is started has enough memory to store the data that you load into the notebook. Often, the environment must have significantly more memory than the total size of the data that you load, because some data frameworks, such as pandas, can hold multiple copies of the data in memory.
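
As a rough check before you size an environment, you can compare the in-memory footprint of a pandas DataFrame with the size of the same data as CSV text. The following is a minimal sketch under stated assumptions: the helper name memory_report and the sample data are illustrative only and are not part of Watson Studio.

```python
import pandas as pd


def memory_report(df: pd.DataFrame) -> dict:
    """Compare a DataFrame's in-memory size with its size as CSV text."""
    # deep=True also counts the string objects held in text columns.
    in_memory_bytes = int(df.memory_usage(deep=True).sum())
    csv_bytes = len(df.to_csv(index=False).encode("utf-8"))
    return {
        "in_memory_bytes": in_memory_bytes,
        "csv_bytes": csv_bytes,
        "ratio": in_memory_bytes / csv_bytes,
    }


# Hypothetical sample data standing in for a real data set.
sample = pd.DataFrame({
    "id": range(1000),
    "city": ["Armonk", "Austin", "Zurich", "Tokyo"] * 250,
})
report = memory_report(sample)
print(report)

# An explicit copy holds a second full set of the data in memory.
duplicate = sample.copy()
```

Operations that return new DataFrames, like the explicit copy at the end of the sketch, add to the total memory that the environment must provide.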

Load data from local files

To access data from a local file, you can load the file from within a notebook, or first load the file into your project. From your notebook, you add automatically generated code to access the data by using the Insert to code function. The inserted code gives you a quick start for working with data sets.

The Insert to code function supports file types such as CSV, JSON and XLSX. To learn which data structures are generated for which notebook language, see Data load support. For file types that are not supported, you can only insert the file credentials. With the credentials, you can write your own code to load the file data into a DataFrame or other data structure in a notebook cell.
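
When you write your own loading code, one possible pattern is to read the file content as bytes and pass it to a pandas reader. The following is a minimal sketch, assuming that the bytes were already retrieved with the inserted credentials; the helper name bytes_to_dataframe, the pipe separator, and the sample payload are hypothetical.

```python
import io

import pandas as pd


def bytes_to_dataframe(raw: bytes, sep: str = ",") -> pd.DataFrame:
    """Parse delimiter-separated text, already fetched as bytes, into a DataFrame."""
    # io.BytesIO wraps the bytes in a file-like object that pandas can read.
    return pd.read_csv(io.BytesIO(raw), sep=sep)


# Hypothetical payload; in a notebook, these bytes would come from the
# file that you opened with the inserted credentials.
raw = b"name|score\nada|91\ngrace|88\n"
df = bytes_to_dataframe(raw, sep="|")
print(df)
```

The same approach works with other pandas readers, such as pandas.read_json, when the file content has a different format.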

To add a file from your local system to your notebook:

Click the Find and Add Data icon, and then browse to a data file or drag it into your notebook sidebar.
Click in an empty code cell in your notebook and then click the Insert to code link below the file.

Load data from data source connections

You must create a connection to an IBM data service or an external data source before you can add data from that data source to your notebook. See Adding connections to projects.

The Insert to code function supports some database connections. To learn which database connections are supported, see Data load support. For database connections that are not supported, you can only insert the database connection credentials. With the credentials, you can write your own code to load the data into a DataFrame or other data structure in a notebook cell.

To load data from an existing data source connection into a data structure in your notebook:

Open the notebook in edit mode.
Click in an empty code cell, click Find and Add Data, and then click the Connections tab to see your connections.
Click Insert to code under the connection name.
If necessary, enter your personal credentials for locked data connections that are marked with a key icon. This is a one-time step that permanently unlocks the connection for you. After you unlock the connection, the key icon is no longer displayed. See Adding connections to projects.
If the connection is supported, choose how to load the data into your notebook. Select the schema and choose a table.
If the connection is not supported, load the credentials and open the database connection that references your credentials. Write code to load the data.
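
For a connection that is not supported, the last step might look like the following minimal sketch. It makes several assumptions: the credentials dictionary and its key are hypothetical, and sqlite3 with an in-memory database stands in for the driver of your actual database so that the example is self-contained.

```python
import sqlite3

import pandas as pd

# Hypothetical credentials dictionary; the keys that Insert to code
# actually generates depend on the connection type.
credentials = {"database": ":memory:"}

# sqlite3 stands in for the driver of your actual database.
conn = sqlite3.connect(credentials["database"])
try:
    # Create a small illustrative table so that the sketch is self-contained.
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 120.5), ("south", 80.0)],
    )
    conn.commit()

    # Load the query result into a pandas DataFrame.
    df = pd.read_sql_query(
        "SELECT region, amount FROM sales ORDER BY region", conn
    )
finally:
    conn.close()

print(df)
```

With a real connection, you would replace the sqlite3 call with the connect function of the driver for your database and pass the values from the inserted credentials.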

For a Planning Analytics connection, see Adding data from a Planning Analytics connection.

Use an API function or operating system command to access the data

You can use API functions or operating system commands in your notebook to access data. For example, you can use the wget command to retrieve data over the HTTP, HTTPS, or FTP protocols.
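
As an alternative to an operating system command, the Python standard library can download a file in a similar way. The following is a minimal sketch: the helper name download is hypothetical, and a local file served through a file:// URL stands in for a remote data set so that the example runs without network access.

```python
import csv
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url: str, target: Path) -> Path:
    """Save the resource at url to target, similar to what wget does."""
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# For a self-contained demonstration, a local file is read through a
# file:// URL; in a notebook you would pass an HTTP or HTTPS URL instead.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("x,y\n1,2\n3,4\n", encoding="utf-8")

saved = download(source.as_uri(), workdir / "downloaded.csv")
with open(saved, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
print(rows)
```

After the file is saved to the notebook's file system, you can load it into a data structure such as a pandas DataFrame.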
