Examples of Flight service with R
You can use the Flight Service and the Apache Arrow Flight protocol to read and write data in a project or space. In R, you do this by calling the open source Python pyarrow
library to invoke the Flight service by using the R reticulate
library. The library provides an R interface to Python modules, classes, and functions.
To read or write data using R, you can:
-
Use code that is generated for you to load data from a selected project asset to a notebook. This generated code uses the
itc_utils
library, which wraps calls to the open source Pythonpyarrow
library, improving code readablity while reducing code size. -
Write your own code to read and write data in a project or space using the open source Python
pyarrow
library to invoke the Flight service. You need to write your own code in the following situations:- The generated code requires changes, for example to use in a production environment
- The functionality to add generated code is not available for the asset
- The tool doesn't support adding generated code
For details about calling Python from R see:
For details on the Flight client API from open-source Flight Arrow R, see Connecting to Flight RPC Servers.
The following sections provide more details about how you can specify the data source and interactive properties for read or write requests using flight data requests.
For details on the data request syntax and properties, see Flight data requests.
You can use the following data request examples to learn how to specify the data source and interactive properties for data read or write requests in R. The properties vary depending on the data source. These examples use reticulate
to create Python dictionaries.
Example of a schema and table request using a connection ID (only in projects without Git integration)
data_request = dict(
"asset_id"= "ASSET_ID",
"project_id|space_id|catalog_id" = "ID",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
"schema_name"= "SCHEMA",
"table_name"= "TABLE"
),
"fields"= list("FIELDS") # Optional
)
Example of a schema and table request using a connection name
data_request = dict(
"connection_name"= "CONNECTION",
"batch_size"= NUM, # Optional
"interaction_properties"= dict(
"schema_name"= "SCHEMA",
"table_name"= "TABLE",
"row_limit"= NUM
),
"fields"= list("FIELDS") # Optional
)
Example of an SQL request using a connection ID (only in projects without Git integration)
data_request = dict(
"asset_id"= "ASSET_ID",
"project_id|space_id|catalog_id" = "ID",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
"select_statement"= "SQL" )
)
Example of an SQL request using a connection name
data_request = dict(
"connection_name"= "CONNECTION",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
"select_statement"= "SELECT ... FROM <schema>.<table> WHERE ..."
)
)
Example of a data file request using the asset ID (only in projects without Git integration)
data_request = dict(
"asset_id"= "ASSET_ID",
"project_id|space_id|catalog_id" = "ID",
"num_partitions"= NUM, # Optional
)
Example of a data asset request using the asset name
data_request = dict(
"data_name"= "DATA ASSET",
"interaction_properties"= dict(
"infer_schema"= "true",
"infer_as_varchar"= "false"
)
)
The "connection_properties"
and "interaction_properties"
properties differ depending on the connector. See IBM Watson Data API Datasource types for details.
Code samples using flight data requests
The following code examples show how the Flight service can be invoked within R notebooks. These examples use the R library reticulate
and the Python libraries pyarrow
and itc_utils
. itc_utils
is pre-installed in all notebook and RStudio runtime environments provided by IBM for use in these environments. For more details, see Using the Flight service in R notebooks.
Example to read data from a table in a schema from a connection that is accessed by using the connection name
# Schema and Table example using connection name
library("reticulate")
pa <- import("pyarrow")
library("arrow")
itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()
data_request = dict(
"connection_name"= "CONNECTION",
"batch_size"= NUM, # Optional
"interaction_properties"= dict(
"schema_name"= "SCHEMA",
"table_name"= "TABLE",
"row_limit"= NUM
),
"fields"= list("FIELDS") # Optional
)
flightInfo <- itcfs$get_flight_info(client, data_request=data_request)
df <- as.data.frame(itcfs$read_tables(client, flightInfo, timeout=240))
head(df)
Example to read data from an SQL connection that is accessed using the connection name
library("reticulate")
pa <- import("pyarrow")
library("arrow")
itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()
data_request = dict(
"connection_name"= "CONNECTION",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
"select_statement"= "SELECT * FROM SCHEMA.TABLE WHERE FIELD = 'field'"
)
)
flightInfo <- itcfs$get_flight_info(client, nb_data_request=data_request)
df <- as.data.frame(itcfs$read_tables(client, flightInfo, timeout=240))
head(df)
The following code snippets show samples of how to use custom created data requests when writing data back to a data source:
Example to write data to an existing database table or create a new table if none exists:
library("reticulate")
pa <- import("pyarrow")
library("arrow")
itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()
data_request = dict(
"connection_name"= "CONNECTION",
"interaction_properties"= dict(
"schema_name"= "SCHEMA",
"table_name"= "TABLE"
)
)
itcfs$write_dataframe(DATA, nb_data_request = data_request)
Example to replace all of the data in an existing table:
library("reticulate")
pa <- import("pyarrow")
library("arrow")
itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()
data_request = dict(
"connection_name"= "CONNECTION",
"interaction_properties"= dict(
"schema_name"= "SCHEMA",
"table_name"= "TABLE",
"table_action"= "truncate",
"write_mode"= "insert"
)
)
itcfs$write_dataframe(DATA, nb_data_request = data_request)
Learn more
- Flight data requests
- Connecting to Flight RPC Servers
- Calling Python from R
- Apache Arrow R Cookbook
- Using the Flight service in R notebooks
Parent topic: Using the Flight service in your applications