Loading data through generated code snippets

You can add automatically generated code to load data from project data assets to a notebook cell. The asset type can be a file or a database connection.

Click in an empty code cell in your notebook, click the Code snippets icon from the notebook toolbar, select Read data, and then select an asset from the project. You can then:

  • Insert the data source access credentials. This capability is available for connections and connected data assets. With the credentials, you can write your own code to access the connection or connected data asset and load the data into data structures of your choice in your notebook.

    If the credentials option is not available for the data assets that you uploaded to the project, you can access them on the mounted directory at /project_data/data_asset/.

  • Generate code that is added to the notebook cell. The inserted code serves as a quick start to allow you to easily begin working with a data set or connection. For production systems, you should carefully review the inserted code to determine if you should write your own code that better meets your needs and performance requirements.

The function to generate code uses the Flight service, which is based on Apache Arrow Flight, to communicate with a database connection or connected data asset (data accessible through a connection) when loading data into a data structure. This eliminates the need to install ODBC or JDBC drivers in the notebook runtime; installing these drivers is required only when you use the legacy code-generation option. Although you can still use the legacy option to load data into a data structure, that option is deprecated and is tagged as such.

    When you run the code cell, the data is accessed and loaded into the data structure you selected.

    Note:

    If you still use the legacy options, the functionality is disabled if the connection credentials are stored in vaults. Use the code-generation options that leverage the Flight service instead.
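As a minimal sketch of the mounted-directory access described above: the mount path is the one documented in this section, while the file name and helper function are hypothetical placeholders.

```python
import os

import pandas as pd

# Uploaded project data assets are mounted at this directory (see above).
DATA_ASSET_DIR = "/project_data/data_asset/"

def load_asset(filename):
    """Load an uploaded CSV data asset into a pandas DataFrame.

    This helper is illustrative; in practice you can call pandas
    readers on the mounted path directly.
    """
    return pd.read_csv(os.path.join(DATA_ASSET_DIR, filename))

# In a notebook you would call, for example:
# df = load_asset("customers.csv")  # "customers.csv" is a placeholder name
```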

The options for generating code vary depending on the data source, the notebook coding language, and the notebook runtime compute engine.

For details, refer to the sections that follow.

Note:

If you have the Execution Engine for Apache Hadoop service provisioned, you can choose from additional runtime definitions for your notebook. As a result, some of the available data access options are different. For additional information, refer to Accessing data from notebooks that use runtime definitions based on Apache Hadoop.

Data loading options per compute engine type

| Data loading options | Anaconda Python distribution | Python + Spark | Anaconda R distribution | R + Spark |
|---|---|---|---|---|
| Load data into a pandas DataFrame | ✓ | ✓ | | |
| Load data into a sparkSession DataFrame | | ✓ | | ✓ |
| Load data into an R data frame | | | ✓ | ✓ |

For Hadoop-based environments, refer to Data loading options per compute engine type in Hadoop-based environments.

Supported file types

  • CSV/delimited files
  • Excel files (.xls, .xlsx, .xlsm)
  • JSON files
  • SAS files

You can load all these file types into notebooks based on both Python and R programming languages. For applicable data loading options per compute engine type, refer to Data loading options per compute engine type.
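As an illustration of how these file types map onto standard pandas readers (this is not the generated snippet itself; the tiny files created here are stand-ins for uploaded data assets):

```python
import json
import os
import tempfile

import pandas as pd

# Create a small CSV file and JSON file to stand in for uploaded assets.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "sample.csv")
json_path = os.path.join(tmp, "sample.json")
with open(csv_path, "w") as f:
    f.write("id,name\n1,Ada\n2,Grace\n")
with open(json_path, "w") as f:
    json.dump([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}], f)

df_csv = pd.read_csv(csv_path)    # CSV/delimited files
df_json = pd.read_json(json_path) # JSON files
# Excel files would use pd.read_excel(...), SAS files pd.read_sas(...)

print(df_csv.shape)  # (2, 2)
```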

For Hadoop-based environments, refer to Supported file types in Hadoop-based environments.

Supported database connections

The listed database connections all use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If the connection supports using the legacy functionality, the option to load data is tagged as being deprecated.

All the listed connections are supported in environments based on both Python and R programming languages:

Note: For custom JDBC connectors, the name, details, and properties of the connection are defined by the administrator who creates the connector.

For applicable data loading options per compute engine type, refer to Data loading options per compute engine type.

For Hadoop-based environments, refer to Supported database connections in Hadoop-based environments.

Accessing data from notebooks that use runtime definitions based on Apache Hadoop

Data loading options per compute engine type in Hadoop-based environments

| Data loading options | Python + Hadoop | R + Hadoop |
|---|---|---|
| Load data into a pandas DataFrame | ✓ | |
| Load data into a sparkSession DataFrame | ✓ | ✓ |
| Load data into an R data frame | | ✓ |

Supported file types in Hadoop-based environments

You can load all of the following file types into notebooks that use runtime definitions based on Apache Hadoop, in both the Python and R programming languages:

  • CSV/delimited files
  • Excel files (.xls, .xlsx, .xlsm)
  • JSON files
  • SAS files

For applicable data loading options per compute engine type, refer to Data loading options per compute engine type in Hadoop-based environments.

Supported database connections in Hadoop-based environments

The listed database connections all use the Flight service to communicate with the database connection or connected data asset (data accessible through a connection) when loading data into a data structure. If the connection supports using the legacy functionality, the option to load data is tagged as being deprecated.

All the listed connections are supported in environments based on both Python and R programming languages:

For applicable data loading options per compute engine type, refer to Data loading options per compute engine type in Hadoop-based environments.

Learn more

Parent topic: Loading and accessing data in a notebook