Examples of Flight service with R

You can use the Flight Service and the Apache Arrow Flight protocol to read and write data in a project or space. In R, you do this by calling the open source Python pyarrow library to invoke the Flight service by using the R reticulate library. The library provides an R interface to Python modules, classes, and functions.

To read or write data using R, you can:

  • Use code that is generated for you to load data from a selected project asset to a notebook. This generated code uses the itc_utils library, which wraps calls to the open source Python pyarrow library, improving code readablity while reducing code size.

  • Write your own code to read and write data in a project or space using the open source Python pyarrow library to invoke the Flight service. You need to write your own code in the following situations:

    • The generated code requires changes, for example to use in a production environment
    • The functionality to add generated code is not available for the asset
    • The tool doesn't support adding generated code

For details about calling Python from R see:

For details on the Flight client API from open-source Flight Arrow R, see Connecting to Flight RPC Servers.

The following sections provide more details about how you can specify the data source and interactive properties for read or write requests using flight data requests.

For details on the data request syntax and properties, see Flight data requests.

You can use the following data request examples to learn how to specify the data source and interactive properties for data read or write requests in R. The properties vary depending on the data source. These examples use reticulate to create Python dictionaries.

Example of a schema and table request using a connection ID (only in projects without Git integration)

data_request = dict(
    "asset_id"= "ASSET_ID",
    "project_id|space_id|catalog_id" = "ID",
    "num_partitions"= NUM, # Optional
    "batch_size"= SIZE, # Optional
    "interaction_properties"= dict(
        "schema_name"= "SCHEMA",
        "table_name"= "TABLE"
        ),
    "fields"= list("FIELDS") # Optional
    )

Example of a schema and table request using a connection name

data_request = dict(
    "connection_name"= "CONNECTION",
    "batch_size"= NUM, # Optional
    "interaction_properties"= dict(
        "schema_name"= "SCHEMA",
        "table_name"= "TABLE",
        "row_limit"= NUM
        ),
    "fields"= list("FIELDS") # Optional
    )

Example of an SQL request using a connection ID (only in projects without Git integration)

data_request = dict(
    "asset_id"= "ASSET_ID",
    "project_id|space_id|catalog_id" = "ID",
    "num_partitions"= NUM, # Optional
    "batch_size"= SIZE, # Optional
    "interaction_properties"= dict(
        "select_statement"= "SQL" )
    )

Example of an SQL request using a connection name

data_request = dict(
    "connection_name"= "CONNECTION",
    "num_partitions"= NUM, # Optional
    "batch_size"= SIZE, # Optional
    "interaction_properties"= dict(
        "select_statement"= "SELECT ... FROM <schema>.<table> WHERE ..."
        )
    )

Example of a data file request using the asset ID (only in projects without Git integration)

data_request = dict(
    "asset_id"= "ASSET_ID",
    "project_id|space_id|catalog_id" = "ID",
    "num_partitions"= NUM, # Optional
    )

Example of a data asset request using the asset name

data_request = dict(
    "data_name"= "DATA ASSET",
    "interaction_properties"= dict(
        "infer_schema"= "true",
        "infer_as_varchar"= "false"
        )
    )

The "connection_properties" and "interaction_properties" properties differ depending on the connector. See IBM Watson Data API Datasource types for details.

Code samples using flight data requests

The following code examples show how the Flight service can be invoked within R notebooks. These examples use the R library reticulate and the Python libraries pyarrow and itc_utils. itc_utils is pre-installed in all notebook and RStudio runtime environments provided by IBM for use in these environments. For more details, see Using the Flight service in R notebooks.

Example to read data from a table in a schema from a connection that is accessed by using the connection name

# Schema and Table example using connection name
library("reticulate")
pa <- import("pyarrow")
library("arrow")

itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()

data_request = dict(
"connection_name"= "CONNECTION",
"batch_size"= NUM, # Optional
"interaction_properties"= dict(
    "schema_name"= "SCHEMA",
    "table_name"= "TABLE",
    "row_limit"= NUM
    ),
"fields"= list("FIELDS") # Optional
)

flightInfo <- itcfs$get_flight_info(client, data_request=data_request)
df <- as.data.frame(itcfs$read_tables(client, flightInfo, timeout=240))
head(df)

Example to read data from an SQL connection that is accessed using the connection name

library("reticulate")
pa <- import("pyarrow")
library("arrow")

itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()

data_request = dict(
"connection_name"= "CONNECTION",
"num_partitions"= NUM, # Optional
"batch_size"= SIZE, # Optional
"interaction_properties"= dict(
    "select_statement"= "SELECT * FROM SCHEMA.TABLE WHERE FIELD = 'field'"
    )
)

flightInfo <- itcfs$get_flight_info(client, nb_data_request=data_request)
df <- as.data.frame(itcfs$read_tables(client, flightInfo, timeout=240))
head(df)

The following code snippets show samples of how to use custom created data requests when writing data back to a data source:

Example to write data to an existing database table or create a new table if none exists:

library("reticulate")
pa <- import("pyarrow")
library("arrow")

itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()

data_request = dict(
"connection_name"= "CONNECTION",
"interaction_properties"= dict(
    "schema_name"= "SCHEMA",
    "table_name"= "TABLE"
    )
)

itcfs$write_dataframe(DATA, nb_data_request = data_request)

Example to replace all of the data in an existing table:

library("reticulate")
pa <- import("pyarrow")
library("arrow")

itcfs <- import("itc_utils.flight_service")
client <- itcfs$get_flight_client()

data_request = dict(
    "connection_name"= "CONNECTION",
    "interaction_properties"= dict(
        "schema_name"= "SCHEMA",
        "table_name"= "TABLE",
        "table_action"= "truncate",
        "write_mode"= "insert"
        )
)

itcfs$write_dataframe(DATA, nb_data_request = data_request)

Learn more

Parent topic: Using the Flight service in your applications