IBM Cloud Pak® for Data Version 4.7 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.
Examples of Flight service with R
You can use the Flight Service and the Apache Arrow Flight protocol to read and write data in a project or space. In R, you do this by calling the open source Python pyarrow library to invoke the Flight service by using the R reticulate
library, which provides an R interface to Python modules, classes, and functions.
To read or write data using R, you can:
-
Use code that is generated for you to load data from a selected project asset to a notebook. This generated code uses the
itc_utilslibrary, which wraps calls to the open source Pythonpyarrowlibrary, improving code readablity while reducing code size. -
Write your own code to read and write data in a project or space using the open source Python
pyarrowlibrary to invoke the Flight service. You need to write your own code in the following situations:- The generated code requires changes, for example to use in a production environment
- The functionality to add generated code is not available for the asset
- The tool doesn't support adding generated code
For details about calling Python from R see:
For details on the Flight client API from open-source Flight Arrow R, see Connecting to Flight RPC Servers
The following sections provide more details about how you can specify the data source and interactive properties for read or write requests using flight data requests.
For details on the data request syntax and properties, see Flight data requests.
Examples of flight data requests
You can use the following data request examples to learn how to specify the data source and interactive properties for data read or write requests in R. The properties vary depending on the data source. These examples use reticulate to create Python dictionaries.
Example of a schema and table request using a connection ID (only in projects without Git integration):
data_request = dict(
"asset_id"= "ASSET_ID",
"project_id|space_id|catalog_id" = "ID",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
"schema_name"= "SCHEMA",
"table_name"= "TABLE"
),
"fields"= list("FIELDS") # Optional
)
Example of a schema and table request using a connection name:
data_request = dict(
"connection_name"= "CONNECTION",
"batch_size"= NUM, # Optional
"interaction_properties"= dict(
"schema_name"= "SCHEMA",
"table_name"= "TABLE",
"row_limit"= NUM
),
"fields"= list("FIELDS") # Optional
)
Example of an SQL request using a connection ID (only in projects without Git integration):
data_request = dict(
"asset_id"= "ASSET_ID",
"project_id|space_id|catalog_id" = "ID",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
"select_statement"= "SQL" )
)
Example of an SQL request using a connection name:
data_request = dict(
"connection_name"= "CONNECTION",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
"select_statement"= "SELECT ... FROM <schema>.<table> WHERE ..."
)
)
Example of a data file request using the asset ID (only in projects without Git integration):
data_request = dict(
"asset_id"= "ASSET_ID",
"project_id|space_id|catalog_id" = "ID",
"num_partitions"= NUM, # Optional
)
Example of a data asset request using the asset name:
data_request = dict(
"data_name"= "DATA ASSET",
"interaction_properties"= dict(
"infer_schema"= "true",
"infer_as_varchar"= "false"
)
)
The "connection_properties" and "interaction_properties" properties differ depending on the connector. See IBM Watson Data API Datasource types for details.
Code samples for reading data
The following code examples show how the Flight service can be invoked within R notebooks. These examples use the R library reticulate and the Python libraries pyarrow and itc_utils. itc_utils is pre-installed in all notebook and RStudio runtime environments provided by IBM for use in these environments. For more details, see Using the Flight service in R notebooks.
-
Read data from a table in a schema from a connection that is accessed by using the connection name:
# Schema and Table example using connection name library("reticulate") pa <- import("pyarrow") library("arrow") itcfs <- import("itc_utils.flight_service") client <- itcfs$get_flight_client() data_request = dict( "connection_name"= "CONNECTION", "batch_size"= NUM, # Optional "interaction_properties"= dict( "schema_name"= "SCHEMA", "table_name"= "TABLE", "row_limit"= NUM ), "fields"= list("FIELDS") # Optional ) flightInfo <- itcfs$get_flight_info(client, data_request=data_request) df <- as.data.frame(itcfs$read_tables(client, flightInfo, timeout=240)) head(df) -
Read data from an SQL connection that is accessed using the connection name:
library("reticulate") pa <- import("pyarrow") library("arrow") itcfs <- import("itc_utils.flight_service") client <- itcfs$get_flight_client() data_request = dict( "connection_name"= "CONNECTION", "num_partitions"= NUM, # Optional "batch_size"= SIZE, # Optional "interaction_properties"= dict( "select_statement"= "SELECT * FROM SCHEMA.TABLE WHERE FIELD = 'field'" ) ) flightInfo <- itcfs$get_flight_info(client, nb_data_request=data_request) df <- as.data.frame(itcfs$read_tables(client, flightInfo, timeout=240)) head(df)
Code samples for writing data
The following code snippets show samples of how to use custom created data requests when writing data back to a data source:
-
Write data to an existing database table or create a new table if none exists:
library("reticulate") pa <- import("pyarrow") library("arrow") itcfs <- import("itc_utils.flight_service") client <- itcfs$get_flight_client() data_request = dict( "connection_name"= "CONNECTION", "interaction_properties"= dict( "schema_name"= "SCHEMA", "table_name"= "TABLE" ) ) itcfs$write_dataframe(DATA, nb_data_request = data_request) -
Replace all of the data in an existing table:
library("reticulate") pa <- import("pyarrow") library("arrow") itcfs <- import("itc_utils.flight_service") client <- itcfs$get_flight_client() data_request = dict( "connection_name"= "CONNECTION", "interaction_properties"= dict( "schema_name"= "SCHEMA", "table_name"= "TABLE", "table_action"= "truncate", "write_mode"= "insert" ) ) itcfs$write_dataframe(DATA, nb_data_request = data_request)
Learn more
- Flight data requests
- Connecting to Flight RPC Servers
- Calling Python from R
- Apache Arrow R Cookbook
- Using the Flight service in R notebooks
Parent topic: Using the Flight service in your applications