Data load support

You can add automatically generated code to a notebook cell to load data from project data assets. The asset type can be a file or a database connection.

Click in an empty code cell in your notebook, click the Code snippets icon on the notebook toolbar, select Read data, and then select an asset from the project. You can then:

  • Insert the data source access credentials. This capability is available for data assets, connections, and connected data assets. With the credentials, you can write your own code to access the connection or connected data asset and load the data into the data structures of your choice in your notebook.

    If the credentials option is not available for a data asset that you uploaded to the project, you can access the asset in the mounted directory at /project_data/data_asset/. A minimal example of reading a file from this directory follows this list.

  • Generate code that is added to the notebook cell. The inserted code serves as a quick start for working with a data set or connection. For production systems, carefully review the inserted code and decide whether to write your own code that better meets your functional and performance requirements.

    The function to generate code uses the Flight service, which is based on Apache Arrow Flight, to communicate with a database connection or connected data asset (data accessible through a connection) when loading data into a data structure. This eliminates the need for the ODBC or JDBC drivers that the legacy code-generation options use. Although you can still use the legacy options to load data into a data structure, these options are deprecated and are tagged as such. A sketch of the Flight mechanism follows this list.

    When you run the code cell, the data is accessed and loaded into the data structure you selected.

    Note:

    If you still use the legacy options, the functionality is disabled if:

    • The connection uses a secure gateway link
    • The connection credentials are stored in vaults

    Instead, use the code-generation options that use the Flight service.
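
For example, here is a minimal sketch of reading an uploaded data asset from the mounted directory. The file name sales.csv is a hypothetical placeholder:

```python
import pandas as pd

# Uploaded project data assets are mounted at /project_data/data_asset/.
# "sales.csv" is a hypothetical file name.
df = pd.read_csv("/project_data/data_asset/sales.csv")
df.head()
```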
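
The generated code itself is platform-specific, but the underlying mechanism can be illustrated with the pyarrow Flight client. This is only a minimal sketch, not the code that the snippet generator produces; the endpoint URL and the command payload are hypothetical placeholders:

```python
import pyarrow.flight as flight

# Connect to a Flight service endpoint (hypothetical URL).
client = flight.FlightClient("grpc+tls://flight-service.example.com:443")

# Describe the data set to read; the JSON command format is an assumption.
descriptor = flight.FlightDescriptor.for_command(b'{"asset_id": "my-asset"}')
info = client.get_flight_info(descriptor)

# Stream the record batches and materialize them as a pandas DataFrame.
reader = client.do_get(info.endpoints[0].ticket)
df = reader.read_all().to_pandas()
```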

The following tables show which data sources (file types and database connections) support the option to generate code. The options for generating code vary depending on the data source, the notebook coding language, and the notebook runtime compute.

Supported file types

Table 1. Supported file types

File types:
- CSV/delimited files
- JSON files
- Excel files (.xls, .xlsx, .xlsm)
- SAS files

Notebook coding language | Compute engine type | Available support to load data
Python | Anaconda Python distribution | Load data into pandasDataFrame
Python | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame
Python | With Hadoop | Load data into pandasDataFrame and sparkSessionDataFrame
R | Anaconda R distribution | Load data into R data frame
R | With Spark | Load data into R data frame and sparkSessionDataFrame
R | With Hadoop | Load data into R data frame and sparkSessionDataFrame
Scala | With Hadoop | Load data into sparkSessionDataFrame
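
As a minimal sketch of the Spark load target in Table 1, assuming a Spark-backed runtime and a delimited file at a hypothetical path:

```python
from pyspark.sql import SparkSession

# In a Spark or Hadoop runtime, a SparkSession is typically pre-created;
# getOrCreate() reuses it. The file path is a hypothetical placeholder.
spark = SparkSession.builder.getOrCreate()

sdf = spark.read.csv("/project_data/data_asset/sales.csv",
                     header=True, inferSchema=True)
sdf.printSchema()
```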

Supported database connections

All of the listed database connections use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If a connection also supports the legacy functionality, that option to load data is tagged as deprecated.

Table 2. Supported database connections

Database sources:
- Watson Query
- IBM Cloud Databases for PostgreSQL
- IBM Cloud Object Storage
- IBM Db2
- IBM Db2 Big SQL
- IBM Db2 for i
- IBM Db2 for z/OS
- IBM Informix
- IBM Db2 on Cloud
- IBM Db2 Warehouse
- Microsoft SQL Server
- Oracle
- PostgreSQL
- Netezza Performance Server

Notebook coding language | Compute engine type | Available support to load data
Python | Anaconda Python distribution | Load data into pandasDataFrame
Python | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame
Python | With Hadoop | Load data into pandasDataFrame, ibmdbpy, sparkSessionDataFrame, and sqlContext
R | Anaconda R distribution | Load data into R data frame
R | With Spark | Load data into R data frame and sparkSessionDataFrame
R | With Hadoop | Load data into R data frame, ibmdbr, sparkSessionDataFrame, and sqlContext
Database sources:
- Amazon Redshift
- Amazon RDS for MySQL
- Amazon S3
- Apache Cassandra
- Apache Derby
- Apache HDFS
- IBM Cloud Databases for MySQL
- Data Virtualization Manager for z/OS
- IBM Cloud Databases for DataStax
- Exasol
- Generic JDBC
- Generic S3 connection
- IBM Cloud Databases for MongoDB
- Google Cloud Storage
- HDFS via Execution Engine for Hadoop
- Hive via Execution Engine for Hadoop
- Apache Hive
- HTTP
- IBM Cognos Analytics
- Looker
- Microsoft Azure Cosmos DB
- Microsoft Azure Data Lake Storage
- Microsoft Azure File Storage
- MinIO
- MongoDB
- MySQL
- OData
- Planning Analytics
- Salesforce.com
- SAP HANA
- SAP OData
- Snowflake
- Storage Volume (formerly Mounted Volume)
- Teradata
- SingleStoreDB

Notebook coding language | Compute engine type | Available support to load data
Python | Anaconda Python distribution | Load data into pandasDataFrame
Python | With Spark | Load data into pandasDataFrame and sparkSessionDataFrame
Python | With Hadoop | No data load support
R | Anaconda R distribution | Load data into R data frame
R | With Spark | Load data into R data frame and sparkSessionDataFrame
R | With Hadoop | No data load support
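
If you insert only the connection credentials and write your own load code instead, a Spark JDBC read is one common pattern. This is a minimal sketch, assuming a PostgreSQL connection; the JDBC URL, table name, and credential values are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a database table into a Spark DataFrame over JDBC.
# URL, table, and credentials are hypothetical placeholders; in practice,
# take them from the credentials that you inserted into the notebook.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db.example.com:5432/sales")
      .option("dbtable", "public.orders")
      .option("user", "username")
      .option("password", "password")
      .load())
df.show(5)
```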

Parent topic: Loading and accessing data in a notebook