Loading data through generated code snippets
You can add automatically generated code to load data from project data assets to a notebook cell. The asset type can be a file or a database connection.
Click in an empty code cell in your notebook, click the Code snippets icon on the notebook toolbar, and then select Read data and an asset from the project. You can then:
- Insert the data source access credentials. This capability is available for connections and connected data assets. With the credentials, you can write your own code to access the connection or connected data asset and load the data into data structures of your choice in your notebook. If the credentials option is not available for data assets that you uploaded to the project, you can access them on the mounted directory at /project_data/data_asset/ (see the sketch after this list).
- Generate code that is added to the notebook cell. The inserted code serves as a quick start to help you begin working with a data set or connection. For production systems, carefully review the inserted code to decide whether you should write your own code that better meets your needs and performance requirements.
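For example, if you uploaded a CSV file to the project, a minimal sketch of reading it from the mounted directory might look like the following. The file name my_data.csv is a placeholder for illustration, not an actual asset name:

```python
import pandas as pd

# Data assets uploaded to the project are available on the mounted directory.
# "my_data.csv" is a placeholder; substitute the name of your own asset.
df = pd.read_csv("/project_data/data_asset/my_data.csv")
df.head()
```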
The function to generate code uses the Flight service, which is based on Apache Arrow Flight, to communicate with a database connection or connected data asset (data accessible through a connection) when loading data into a data structure. This eliminates the need to install ODBC or JDBC drivers in the notebook runtime; installing these drivers is required only when you use the legacy code-generating option. Although you can still use the legacy option to load data into a data structure, that option is deprecated and is tagged as such.
When you run the code cell, the data is accessed and loaded into the data structure you selected.
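The exact code that the function generates depends on the product release and the data structure you select, but conceptually it follows the standard Apache Arrow Flight pattern. The following is a minimal, hypothetical sketch using the open source pyarrow library; the server URI and the query are placeholders, not the actual generated code:

```python
from pyarrow import flight

# Hypothetical Flight server location; the generated snippet supplies the
# actual endpoint and request details for your connection.
client = flight.FlightClient("grpc://flight-service.example.com:443")

# Describe the data set to read (here, an illustrative SQL command).
descriptor = flight.FlightDescriptor.for_command(b"SELECT * FROM MY_TABLE")
info = client.get_flight_info(descriptor)

# Stream the Arrow record batches and convert them to a pandas DataFrame.
reader = client.do_get(info.endpoints[0].ticket)
df = reader.read_all().to_pandas()
```

Because the data travels as Arrow record batches over gRPC, no database-specific ODBC or JDBC driver needs to be present in the notebook runtime.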
Note: If you still use the legacy options, the functionality is disabled if the connection credentials are stored in vaults. In that case, use the options to generate code that leverage the Flight service instead.
The options for generating code vary depending on the data source, the notebook coding language, and the notebook runtime compute engine.
If you have the Execution Engine for Apache Hadoop service provisioned, you can choose from additional runtime definitions for your notebook. As a result, some of the available data access options differ. For more information, refer to Accessing data from notebooks that use runtime definitions based on Apache Hadoop.
Data loading options per compute engine type
| Data loading options | Anaconda Python distribution | Python + Spark | Anaconda R distribution | R + Spark |
|---|---|---|---|---|
| Load data into a pandas DataFrame | ✓ | ✓ | | |
| Load data into a sparkSession DataFrame | | ✓ | | ✓ |
| Load data into an R data frame | | | ✓ | ✓ |
For Hadoop-based environments, refer to Data loading options per compute engine type in Hadoop-based environments.
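As an illustration of the table above, in a Python + Spark environment you can load the same file either into a pandas DataFrame or into a Spark DataFrame. A minimal sketch, with a placeholder file path:

```python
import pandas as pd
from pyspark.sql import SparkSession

# Load into a pandas DataFrame (available in the Python environments).
pandas_df = pd.read_csv("/project_data/data_asset/my_data.csv")

# Load into a Spark DataFrame (available in the Spark-backed environments).
# Notebooks in Spark environments usually provide a SparkSession already;
# getOrCreate() reuses it if one exists.
spark = SparkSession.builder.getOrCreate()
spark_df = spark.read.csv("/project_data/data_asset/my_data.csv",
                          header=True, inferSchema=True)
```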
Supported file types
- CSV/delimited files
- Excel files (.xls, .xlsx, .xlsm)
- JSON files
- SAS files
You can load all of these file types into notebooks in both the Python and R programming languages. For applicable data loading options per compute engine type, refer to Data loading options per compute engine type.
For Hadoop-based environments, refer to Supported file types in Hadoop-based environments.
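Each of these file types maps to a standard pandas reader in Python notebooks. A minimal sketch; the file names are placeholders, and read_excel additionally requires an Excel engine such as openpyxl to be installed:

```python
import pandas as pd

# Each supported file type has a corresponding pandas reader.
# The file names below are placeholders for your own data assets.
csv_df   = pd.read_csv("/project_data/data_asset/data.csv")
excel_df = pd.read_excel("/project_data/data_asset/data.xlsx")
json_df  = pd.read_json("/project_data/data_asset/data.json")
sas_df   = pd.read_sas("/project_data/data_asset/data.sas7bdat")
```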
Supported database connections
The listed database connections all use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If a connection also supports the legacy functionality, that data loading option is tagged as deprecated.
All the listed connections are supported in environments based on both Python and R programming languages:
- Amazon RDS for MySQL
- Amazon Redshift
- Amazon S3
- Apache Cassandra
- Apache Derby
- Apache HDFS
- Apache Hive
- Azure Data Lake Storage
- Cloudant
- Custom JDBC connector
- Data Virtualization Manager for z/OS
- Exasol
- Generic JDBC
- Generic S3 connection
- Google Cloud Storage
- HDFS via Execution Engine for Hadoop
- Hive via Execution Engine for Hadoop
- HTTP
- IBM Cloud Databases for PostgreSQL
- PostgreSQL
- IBM Cloud Object Storage
- IBM Cognos Analytics
- IBM Db2 Big SQL
- IBM Db2 for i
- IBM Db2 for z/OS
- IBM Db2 on Cloud
- IBM Db2 Warehouse
- IBM Db2
- IBM Informix
- watsonx.data
- Looker
- Microsoft Azure Cosmos DB
- Microsoft Azure File Storage
- Microsoft SQL Server
- MinIO
- MongoDB
- IBM Cloud Databases for MongoDB
- MySQL
- Netezza Performance Server
- OData
- Oracle
- Planning Analytics
- Presto
- Salesforce.com
- SAP HANA
- SAP OData
- SingleStoreDB
- Snowflake
- Storage Volume (formerly Mounted Volume)
- Teradata
For applicable data loading options per compute engine type, refer to Data loading options per compute engine type.
For Hadoop-based environments, refer to Supported database connections in Hadoop-based environments.
Accessing data from notebooks that use runtime definitions based on Apache Hadoop
Data loading options per compute engine type in Hadoop-based environments
| Data loading options | Python + Hadoop | R + Hadoop |
|---|---|---|
| Load data into a pandas DataFrame | ✓ | |
| Load data into a sparkSession DataFrame | ✓ | ✓ |
| Load data into an R data frame | | ✓ |
Supported file types in Hadoop-based environments
You can load all of these file types into notebooks that use runtime definitions based on Apache Hadoop, in both the Python and R programming languages:
- CSV/delimited files
- Excel files (.xls, .xlsx, .xlsm)
- JSON files
- SAS files
For applicable data loading options per compute engine type, refer to Data loading options per compute engine type in Hadoop-based environments.
Supported database connections in Hadoop-based environments
The listed database connections all use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If a connection also supports the legacy functionality, that data loading option is tagged as deprecated.
All the listed connections are supported in environments based on both Python and R programming languages:
- Cloudant
- IBM Cloud Object Storage
- IBM Db2 Big SQL
- IBM Db2 for i
- IBM Db2 for z/OS
- IBM Db2 on Cloud
- IBM Db2 Warehouse
- IBM Db2
- IBM Informix
- Microsoft SQL Server
- Netezza Performance Server
- Oracle
- PostgreSQL
For applicable data loading options per compute engine type, refer to Data loading options per compute engine type in Hadoop-based environments.
Parent topic: Loading and accessing data in a notebook