Data load support

You can add automatically generated code to a notebook cell to load data from project data assets. The asset type can be a file or a database connection.

Click in an empty code cell in your notebook, click the Code snippets icon on the notebook toolbar, select Read data, and then select an asset from the project. You can then:

  • Insert the data source access credentials. This capability is available for data assets, connections, and connected data assets. With the credentials, you can write your own code to access the connection or connected data asset and load the data into the data structures of your choice in your notebook.

    If the credentials option is not available for a data asset that you uploaded to the project, you can access the asset in the mounted directory at /project_data/data_asset/. A minimal example of reading a file from this directory follows this list.

  • Generate code that is added to the notebook cell. The inserted code serves as a quick start for working with a data set or connection. For production systems, carefully review the inserted code and decide whether to write your own code that better meets your functional and performance requirements.

    The function to generate code uses the Flight service, which is based on Apache Arrow Flight, to communicate with a database connection or connected data asset (data accessible through a connection) when loading data into a data structure. This eliminates the need for the ODBC or JDBC drivers that the legacy code-generation options use. Although you can still use the legacy options to load data into a data structure, these options are deprecated and are tagged as such. A sketch of the Flight mechanism follows this list.

    When you run the code cell, the data is accessed and loaded into the data structure you selected.

    Note:

    If you still use the legacy options, the functionality is disabled if:

    • The connection uses a secure gateway link
    • The connection credentials are stored in vaults

    Instead, use the code-generation options that use the Flight service.
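
For example, here is a minimal sketch of reading an uploaded data asset from the mounted directory. The file name sales.csv is a hypothetical placeholder:

```python
import pandas as pd

# Uploaded project data assets are mounted at /project_data/data_asset/.
# "sales.csv" is a hypothetical file name.
df = pd.read_csv("/project_data/data_asset/sales.csv")
df.head()
```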
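
The generated code itself is platform-specific, but the underlying mechanism can be illustrated with the pyarrow Flight client. This is only a minimal sketch, not the code that the snippet generator produces; the endpoint URL and the command payload are hypothetical placeholders:

```python
import pyarrow.flight as flight

# Connect to a Flight service endpoint (hypothetical URL).
client = flight.FlightClient("grpc+tls://flight-service.example.com:443")

# Describe the data set to read; the JSON command format is an assumption.
descriptor = flight.FlightDescriptor.for_command(b'{"asset_id": "my-asset"}')
info = client.get_flight_info(descriptor)

# Stream the record batches and materialize them as a pandas DataFrame.
reader = client.do_get(info.endpoints[0].ticket)
df = reader.read_all().to_pandas()
```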

The following tables show which data sources (file types and database connections) support the option to generate code. The options for generating code vary depending on the data source, the notebook coding language, and the notebook runtime compute.

Supported file types

Table 1. Supported file types

File types:
- CSV/delimited files
- JSON files
- Excel files (.xls, .xlsx, .xlsm)
- SAS files

Notebook coding language | Compute engine type | Available support to load data
Python | Anaconda Python distribution | Load data into pandasDataFrame
Python | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame
Python | With Hadoop | Load data into pandasDataFrame and sparkSessionDataFrame
R | Anaconda R distribution | Load data into R data frame
R | With Spark | Load data into R data frame and sparkSessionDataFrame
R | With Hadoop | Load data into R data frame and sparkSessionDataFrame
Scala | With Hadoop | Load data into sparkSessionDataFrame
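
As a minimal sketch of the Spark load target in Table 1, assuming a Spark-backed runtime and a delimited file at a hypothetical path:

```python
from pyspark.sql import SparkSession

# In a Spark or Hadoop runtime, a SparkSession is typically pre-created;
# getOrCreate() reuses it. The file path is a hypothetical placeholder.
spark = SparkSession.builder.getOrCreate()

sdf = spark.read.csv("/project_data/data_asset/sales.csv",
                     header=True, inferSchema=True)
sdf.printSchema()
```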

Supported database connections

All of the listed database connections use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If a connection also supports the legacy functionality, that option to load data is tagged as deprecated.

Table 2. Supported database connections

Database sources:
- Watson Query
- IBM Cloud Databases for PostgreSQL
- IBM Cloud Object Storage
- IBM Db2
- IBM Db2 Big SQL
- IBM Db2 for i
- IBM Db2 for z/OS
- IBM Informix
- IBM Db2 on Cloud
- IBM Db2 Warehouse
- Microsoft SQL Server
- Oracle
- PostgreSQL
- Netezza Performance Server

Notebook coding language | Compute engine type | Available support to load data
Python | Anaconda Python distribution | Load data into pandasDataFrame
Python | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame
Python | With Hadoop | Load data into pandasDataFrame, ibmdbpy, sparkSessionDataFrame, and sqlContext
R | Anaconda R distribution | Load data into R data frame
R | With Spark | Load data into R data frame and sparkSessionDataFrame
R | With Hadoop | Load data into R data frame, ibmdbr, sparkSessionDataFrame, and sqlContext
Database sources:
- Amazon Redshift
- Amazon RDS for MySQL
- Amazon S3
- Apache Cassandra
- Apache Derby
- Apache HDFS
- IBM Cloud Databases for MySQL
- Data Virtualization Manager for z/OS
- IBM Cloud Databases for DataStax
- Exasol
- Generic JDBC
- Generic S3 connection
- IBM Cloud Databases for MongoDB
- Google Cloud Storage
- HDFS via Execution Engine for Hadoop
- Hive via Execution Engine for Hadoop
- Apache Hive
- HTTP
- IBM Cognos Analytics
- Looker
- Microsoft Azure Cosmos DB
- Microsoft Azure Data Lake Storage
- Microsoft Azure File Storage
- MinIO
- MongoDB
- MySQL
- OData
- Planning Analytics
- Salesforce.com
- SAP HANA
- SAP OData
- Snowflake
- Storage Volume (formerly Mounted Volume)
- Teradata
- SingleStoreDB

Notebook coding language | Compute engine type | Available support to load data
Python | Anaconda Python distribution | Load data into pandasDataFrame
Python | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame
Python | With Hadoop | No data load support
R | Anaconda R distribution | Load data into R data frame
R | With Spark | Load data into R data frame and sparkSessionDataFrame
R | With Hadoop | No data load support
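
If you insert only the connection credentials and write your own load code instead, a Spark JDBC read is one common pattern. This is a minimal sketch, assuming a PostgreSQL connection; the JDBC URL, table name, and credential values are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a database table into a Spark DataFrame over JDBC.
# URL, table, and credentials are hypothetical placeholders; in practice,
# take them from the credentials that you inserted into the notebook.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db.example.com:5432/sales")
      .option("dbtable", "public.orders")
      .option("user", "username")
      .option("password", "password")
      .load())
df.show(5)
```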

Parent topic: Loading and accessing data in a notebook