Data load support
The Insert to code function is available for project data assets in Jupyter notebooks when you click the Find and Add Data icon and select an asset in the notebook sidebar. The asset type can be a file or a database connection.
By clicking in an empty code cell in your notebook and then clicking the Insert to code link below the asset name, you can choose to:
- Insert the data source access credentials. This capability is available for data assets, connections, and connected data assets. With the credentials, you can write your own code to access the connection or connected data asset and load the data into data structures of your choice in your notebook. If the credentials option is not available for the data assets that you uploaded to the project, you can access them in the mounted directory at `/project_data/data_asset/`.
- Generate code that is added to the notebook cell. The inserted code serves as a quick start to help you begin working with a data set or connection. For production systems, carefully review the inserted code to decide whether to write your own code that better meets your needs and performance requirements.
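For example, generated code for a CSV file asset typically loads it into a pandas DataFrame. The following sketch shows the idea with a stand-in directory and a hypothetical file name, because the real mounted path `/project_data/data_asset/` exists only inside a project notebook:

```python
import os
import tempfile

import pandas as pd

# In a project notebook, uploaded data assets are available at
# /project_data/data_asset/ . This sketch uses a temporary stand-in
# directory and a hypothetical asset name so it is runnable anywhere.
data_asset_dir = tempfile.mkdtemp()  # stand-in for /project_data/data_asset/

# Create a small sample CSV so the example is self-contained.
sample_path = os.path.join(data_asset_dir, "sales.csv")  # hypothetical asset name
with open(sample_path, "w") as f:
    f.write("region,revenue\nEMEA,120\nAPAC,95\n")

# Load the asset into a pandas DataFrame, as the generated code would.
df = pd.read_csv(sample_path)
print(df.shape)  # (2, 2)
```

In a real notebook, only the `pd.read_csv` call on the mounted path is needed; the setup above merely simulates the mounted directory.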
The function to generate code uses the Flight service, based on Apache Arrow Flight, to communicate with a database connection or connected data asset (data accessible through a connection) when loading data into a data structure. This eliminates the need for the ODBC or JDBC drivers that the former Insert to code options used to load data. Although you can still use the legacy Insert to code options to load data into a data structure, these options are deprecated and are tagged as such. When you run the code cell, the data is accessed and loaded into the data structure you selected.

Note: The legacy Insert to code function is disabled if:
- The connection uses a Secure Gateway link
- The connection credentials are stored in a vault

In these cases, use the Insert to code function that leverages the Flight service instead.
The following tables show which data sources (file types and database connections) support the option to generate code. The Insert to code options for generating code vary depending on the data source, the notebook coding language, and the notebook runtime compute engine.
Supported file types
| Data source | Notebook coding language | Compute engine type | Available support to load data |
|---|---|---|---|
| CSV/delimited files, JSON files, Excel files (.xls, .xlsx, .XLSM), SAS files | Python | Anaconda Python distribution | Load data into pandas DataFrame |
| | | With Spark | Load data into pandas DataFrame and sparkSession DataFrame |
| | | With Hadoop | Load data into pandas DataFrame and sparkSession DataFrame |
| | R | Anaconda R distribution | Load data into R data frame |
| | | With Spark | Load data into R data frame and sparkSession DataFrame |
| | | With Hadoop | Load data into R data frame and sparkSession DataFrame |
| | Scala | With Spark | Load data into sparkSession DataFrame |
| | | With Hadoop | Load data into sparkSession DataFrame |
Supported database connections
The listed database connections all use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If a connection is also supported by the legacy Insert to code function, the legacy option to load data is tagged as deprecated.
| Database source | Notebook coding language | Compute engine type | Available support to load data |
|---|---|---|---|
| IBM Cloud Databases for PostgreSQL, IBM Cloud Object Storage, IBM Db2, IBM Db2 Big SQL, IBM Db2 Event Store, IBM Db2 for i, IBM Db2 for z/OS, IBM Db2 Hosted, IBM Informix, IBM Db2 on Cloud, IBM Db2 Warehouse, Microsoft SQL Server, Netezza Performance Server, Oracle, PostgreSQL | Python | Anaconda Python distribution | Load data into pandas DataFrame |
| | | With Spark | Load data into pandas DataFrame and sparkSession DataFrame |
| | | With Hadoop | Load data into pandas DataFrame, ibmdbpy, sparkSession DataFrame, and sqlContext |
| | R | Anaconda R distribution | Load data into R data frame |
| | | With Spark | Load data into R data frame and sparkSession DataFrame |
| | | With Hadoop | Load data into R data frame, ibmdbr, sparkSession DataFrame, and sqlContext |
| | Scala | With Spark | Load data into sparkSession DataFrame |
| | | With Hadoop | Load data into sparkSession DataFrame and sqlContext |
| Watson Query | Python | Anaconda Python distribution | Load data into pandas DataFrame |
| | | With Spark | Load data into pandas DataFrame and sparkSession DataFrame |
| | | With Hadoop | Load data into pandas DataFrame, ibmdbpy, sparkSession DataFrame, and sqlContext |
| | R | Anaconda R distribution | Load data into R data frame |
| | | With Spark | Load data into R data frame and sparkSession DataFrame |
| | | With Hadoop | Load data into R data frame, ibmdbr, sparkSession DataFrame, and sqlContext |
| | Scala | With Spark | No data load support |
| | | With Hadoop | No data load support |
| Amazon Redshift, Amazon RDS for MySQL, Amazon S3, Apache Cassandra, Apache Derby, Apache HDFS, IBM Cloud Compose for MySQL, Data Virtualization Manager for z/OS, IBM Cloud Databases for DataStax, IBM Cloud Databases for MongoDB, Google Cloud Storage, HDFS via Execution Engine for Hadoop, Hive via Execution Engine for Hadoop, HTTP, IBM Cognos Analytics, Looker, Microsoft Azure Cosmos DB, Microsoft Azure Data Lake Store, Microsoft Azure File Storage, MinIO, MongoDB, MySQL, Salesforce.com, SAP HANA, SAP OData, Snowflake, Storage Volume (formerly Mounted Volume), Teradata, SingleStoreDB | Python | Anaconda Python distribution | Load data into pandas DataFrame |
| | | With Spark | Load data into pandas DataFrame and sparkSession DataFrame |
| | | With Hadoop | No data load support |
| | R | Anaconda R distribution | Load data into R data frame |
| | | With Spark | Load data into R data frame and sparkSession DataFrame |
| | | With Hadoop | No data load support |
| | Scala | With Spark | Load data into sparkSession DataFrame |
| | | With Hadoop | No data load support |
Parent topic: Loading and accessing data in a notebook