Data load support

The Insert to code function is available for project data assets in Jupyter notebooks when you click the Find and Add Data icon and select an asset in the notebook sidebar. The asset type can be a file or a database connection.

Click in an empty code cell in your notebook, click the Insert to code link below the asset name, and then select one of these options:

  • Insert the data source access credentials. This capability is available for data assets, connections, and connected data assets. With the credentials, you can write your own code to access the connection or connected data asset and load the data into the data structures of your choice in your notebook.

    If the credentials option is not available for the data assets that you uploaded to the project, you can access them in the mounted directory at /project_data/data_asset/. The first sketch after this list shows an example.

  • Generate code that is added to the notebook cell. The inserted code serves as a quick start for working with a data set or connection. For production systems, carefully review the inserted code to determine whether to write your own code that better meets your needs and performance requirements.

    The function to generate code uses the Flight service, which is based on Apache Arrow Flight, to communicate with a database connection or connected data asset (data accessible through a connection) when loading data into a data structure. This eliminates the need for the ODBC or JDBC drivers that the former Insert to code options used to load data. Although you can still use the legacy Insert to code options to load data into a data structure, these options are deprecated and are tagged as such. The second sketch after this list shows what a Flight-based read looks like.

    When you run the code cell, the data is accessed and loaded into the data structure you selected.

    Note: The legacy Insert to code function is disabled if:

    • The connection uses a secure gateway link
    • The connection credentials are stored in a vault

    In these cases, use the Insert to code function that leverages the Flight service instead.

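For example, if a file was uploaded to the project as a data asset, a minimal sketch of reading it directly from the mounted directory looks like this; the file name is a hypothetical placeholder:

```python
# A minimal sketch, assuming a CSV file named "sales.csv" was uploaded
# to the project as a data asset. The file name is a placeholder.
import pandas as pd

df = pd.read_csv("/project_data/data_asset/sales.csv")
df.head()
```
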
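At a high level, a Flight-based read with the pyarrow client follows the pattern in this sketch. The service location and the request payload are hypothetical placeholders; the code that Insert to code generates fills in these details for you:

```python
# A minimal sketch of reading data over Apache Arrow Flight with pyarrow.
# The endpoint URL and the request command below are hypothetical placeholders.
import pandas as pd
import pyarrow.flight as flight

client = flight.connect("grpc+tls://flight-service.example.com:443")

# Describe the data set to read; the exact request format is service-specific.
descriptor = flight.FlightDescriptor.for_command(b'{"asset_id": "my-asset"}')
info = client.get_flight_info(descriptor)

# Read every endpoint of the flight and concatenate the results
# into a single pandas DataFrame.
frames = [
    client.do_get(endpoint.ticket).read_all().to_pandas()
    for endpoint in info.endpoints
]
df = pd.concat(frames, ignore_index=True)
print(df.head())
```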

 

The following tables show which data sources (file types and database connections) support the option to generate code. The Insert to code options for generating code vary depending on the data source, the notebook coding language, and the compute engine type of the notebook runtime.

Supported file types

 

File types:
- CSV/delimited files
- JSON files
- Excel files (.xls, .xlsx, .xlsm)
- SAS files

Python
  • Anaconda Python distribution: Load data into pandasDataFrame
  • With Spark: Load data into pandasDataFrame and sparkSessionDataFrame
  • With Hadoop: Load data into pandasDataFrame and sparkSessionDataFrame

R
  • Anaconda R distribution: Load data into R data frame
  • With Spark: Load data into R data frame and sparkSessionDataFrame
  • With Hadoop: Load data into R data frame and sparkSessionDataFrame

Scala
  • With Spark: Load data into sparkSessionDataFrame
  • With Hadoop: Load data into sparkSessionDataFrame

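As an illustration of the combinations above, the following is a minimal sketch, assuming a Spark notebook runtime and a CSV asset uploaded to the project (the file name is a hypothetical placeholder). The generated code differs in detail but produces the same kinds of data structures:

```python
# A minimal sketch, assuming a Spark runtime and an uploaded CSV data asset.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load the file into a Spark DataFrame (sparkSessionDataFrame in the table).
spark_df = spark.read.option("header", "true").csv(
    "/project_data/data_asset/sales.csv"  # hypothetical file name
)

# Bring the same data into local memory as a pandas DataFrame
# (pandasDataFrame in the table).
pandas_df = spark_df.toPandas()
print(pandas_df.head())
```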

 

Supported database connections

The listed database connections all use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If a connection is also supported by the legacy Insert to code function, the option to load data is tagged as deprecated. A sketch of the ibmdbpy access pattern that the table mentions follows the table.

Database sources:
- Watson Query
- IBM Cloud Databases for PostgreSQL
- IBM Cloud Object Storage
- IBM Db2
- IBM Db2 Big SQL
- IBM Db2 Event Store
- IBM Db2 for i
- IBM Db2 for z/OS
- IBM Db2 Hosted
- IBM Informix
- IBM Db2 on Cloud
- IBM Db2 Warehouse
- Microsoft SQL Server
- Netezza Performance Server
- Oracle
- PostgreSQL

Python
  • Anaconda Python distribution: Load data into pandasDataFrame
  • With Spark: Load data into pandasDataFrame and sparkSessionDataFrame
  • With Hadoop: Load data into pandasDataFrame, ibmdbpy, sparkSessionDataFrame, and sqlContext

R
  • Anaconda R distribution: Load data into R data frame
  • With Spark: Load data into R data frame and sparkSessionDataFrame
  • With Hadoop: Load data into R data frame, ibmdbr, sparkSessionDataFrame, and sqlContext

Scala
  • With Spark: Load data into sparkSessionDataFrame
  • With Hadoop: Load data into sparkSessionDataFrame and sqlContext

Database sources:
- Amazon Redshift
- Amazon RDS for MySQL
- Amazon S3
- Apache Cassandra
- Apache Derby
- Apache HDFS
- IBM Cloud Compose for MySQL
- Data Virtualization Manager for z/OS
- IBM Cloud Databases for DataStax
- IBM Cloud Databases for MongoDB
- Google Cloud Storage
- HDFS via Execution Engine for Hadoop
- Hive via Execution Engine for Hadoop
- HTTP
- IBM Cognos Analytics
- Looker
- Microsoft Azure Cosmos DB
- Microsoft Azure Data Lake Store
- Microsoft Azure File Storage
- MinIO
- MongoDB
- MySQL
- Salesforce.com
- SAP HANA
- SAP OData
- Snowflake
- Storage Volume (formerly Mounted Volume)
- Teradata
- SingleStoreDB

Python
  • Anaconda Python distribution: Load data into pandasDataFrame
  • With Spark: Load data into pandasDataFrame and sparkSessionDataFrame
  • With Hadoop: No data load support

R
  • Anaconda R distribution: Load data into R data frame
  • With Spark: Load data into R data frame and sparkSessionDataFrame
  • With Hadoop: No data load support

Scala
  • With Spark: Load data into sparkSessionDataFrame
  • With Hadoop: No data load support

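Where the Hadoop rows list ibmdbpy (Python) or ibmdbr (R), the data can be manipulated lazily inside the database rather than pulled fully into notebook memory. The following is a minimal sketch of that pattern with ibmdbpy against a Db2 source; the DSN, credentials, and table name are hypothetical placeholders:

```python
# A minimal sketch, assuming the ibmdbpy package and a reachable Db2 source.
# The DSN, credentials, and table name are hypothetical placeholders.
from ibmdbpy import IdaDataBase, IdaDataFrame

idadb = IdaDataBase(dsn="BLUDB", uid="user", pwd="password")

# An IdaDataFrame is a lazy, pandas-like view; no rows are moved yet.
ida_df = IdaDataFrame(idadb, "GOSALES.PRODUCT")

# Pull only a small sample into local memory as a pandas DataFrame.
print(ida_df.head(10))

idadb.close()
```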

Parent topic: Loading and accessing data in a notebook