Connecting to Flight service manually in R

The open source R arrow library has no native R support for the Apache Arrow Flight protocol. The recommended method is to use the R reticulate library together with the open source pyarrow Python library to invoke Flight service from R environments. The R arrow library does provide functions for calling Flight servers, but they also rely on reticulate, are not well documented, and are less flexible with regard to parameters and authorization. To understand how to call Flight service in notebooks, see Using Flight service in Python notebooks.

Code that is generated for you from the Code snippets pane in a notebook to load data from a file or a connection also uses reticulate and pyarrow to invoke Flight service, along with the reticulated itc_utils Python library.

If you want to write your own code for Flight service, the best practice with notebooks is to start with the code that is generated for you, and then modify this code if necessary by leveraging reticulate, pyarrow and itc_utils. You need to write your own code in the following situations:

  • The generated code requires changes, for example to use in a production environment
  • The functionality to add generated code is not available for the asset
  • The tool doesn't support adding generated code, for example RStudio

Basic Flight service interaction

For the purpose of illustrating the basic steps and for simplicity, the code samples in this section leverage the Python itc_utils library provided by IBM, which provides functions to interact with Flight service.

Reading data

The steps to read data from a data source include:

  1. Creating a flight descriptor with the metadata to access a data source
  2. Creating an instance of a flight client
  3. Sending the flight descriptor to Flight service to obtain a flight info object
  4. Reading data from a data source. The code snippet shows reading to a pyarrow.Table which is then converted to a data frame.

Sample code snippet to read data:

# use reticulate library to invoke Python code
library(reticulate)

# use open source R arrow library
library(arrow)

# load Python utility library provided by IBM with reticulate
itcfs <- import("itc_utils.flight_service")

# create a flight descriptor from a data request
data_request = dict(
    "asset_id" = "<asset_id>",
    "project_id" = "<project_id>",
    "interaction_properties" = dict(
        "row_limit" = 5000,
        "schema_name" = "<schema>",
        "table_name" = "<table>"
    )
)
flightDescriptor = itcfs$get_flight_descriptor(data_request=data_request)

# create an instance of a flight client
flightClient <- itcfs$get_flight_client()

# obtain a FlightInfo object which provides information for reading data from one or more endpoints.
flightInfo <- itcfs$get_flight_info(flightClient, flight_descriptor=flightDescriptor)

# read data into a list of pyarrow.Table objects
tables <- itcfs$read_tables(flightClient, flightInfo, timeout=240)

# convert a pyarrow.Table to an R dataframe
df <- as.data.frame(tables[[1]])
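
The snippet above converts only the first element of tables. If read_tables returns more than one table, the pieces can be stacked into a single data frame. The following is a minimal sketch in base R, not part of itc_utils: combine_tables is a hypothetical helper name, it assumes that every table has the same columns, and plain data frames stand in for the tables so that the sketch runs without a Flight service connection.

```r
# Hypothetical helper (not part of itc_utils): convert every element of a
# list to a data frame and stack the results row-wise.
# Assumes all elements share the same column names and compatible types.
combine_tables <- function(tables) {
  frames <- lapply(tables, as.data.frame)
  do.call(rbind, frames)
}

# Stand-in input: plain data frames take the place of the tables
# that read_tables would return
parts <- list(
  data.frame(id = 1:2, name = c("a", "b")),
  data.frame(id = 3L, name = "c")
)

df_all <- combine_tables(parts)
nrow(df_all)
```

With the arrow library loaded, as in the sample above, the same call could be applied to the list returned by read_tables, for example combine_tables(tables).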

Writing data

The steps to write data to a data source include:

  1. Creating a flight descriptor with the metadata to access a data source
  2. Creating an instance of a flight client
  3. Authenticating with Flight service
  4. Obtaining a flight write stream
  5. Writing data to a data target

Sample code snippet to write data:

# use reticulate library to invoke Python code
library("reticulate")

# use open source R arrow library
library("arrow")

# load Python utility library provided by IBM with reticulate
itcfs <- import("itc_utils.flight_service")

# create a flight descriptor
data_request = dict (
    "asset_id" = "<asset_id>",
    "project_id" = "<project_id>",
    "interaction_properties" = dict (
        "schema_name" = "<schema>",
        "table_name" = "<table>",
        "existing_table_action" = "truncate"
    )
)
flightDescriptor = itcfs$get_flight_descriptor(data_request=data_request)

# create an instance of a flight client
flightClient <- itcfs$get_flight_client()

# write the data to the data target; data_df_2 is an existing R data frame
itcfs$write_table(as_arrow_table(data_df_2),flightDescriptor,flightClient)

Flight descriptor

A central piece of interaction with Flight service is the flight descriptor which specifies the access to the data source. It includes a data source specification in the form of connection properties, such as host, port, and so on, and interaction properties, such as a table name, or SQL statement. Instead of connection properties, you can also specify the IDs of the asset and of the project or deployment space.

Technically, Flight service expects a flight descriptor in the form of a JSON string. The best practice in R notebooks is to use reticulate to create a Python dictionary with the request properties and the itc_utils function get_flight_descriptor to create a flight descriptor:

data_request = dict(
    "asset_id" = "<asset_id>",
    "project_id" = "<project_id>",
    "interaction_properties" = dict(
        "row_limit" = 5000,
        "schema_name" = "<schema>",
        "table_name" = "<table>"
    )
)
flightDescriptor = itcfs$get_flight_descriptor(data_request=data_request)
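
To make the JSON form more concrete, the following sketch serializes the same request properties to a JSON string in plain R. It assumes that the jsonlite package is installed, and it only illustrates the rough shape of such a string. The exact string that get_flight_descriptor produces may differ, so the itc_utils function remains the recommended way to create the descriptor.

```r
# Assumption: the jsonlite package is installed; it is not used by the
# samples in this topic and serves here only to display the JSON shape.
library(jsonlite)

# Same request properties as above, expressed as a named R list
data_request <- list(
  asset_id = "<asset_id>",
  project_id = "<project_id>",
  interaction_properties = list(
    row_limit = 5000L,
    schema_name = "<schema>",
    table_name = "<table>"
  )
)

# auto_unbox = TRUE writes length-one vectors as JSON scalars, not arrays
request_json <- toJSON(data_request, auto_unbox = TRUE)
cat(request_json, "\n")
```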

Flight descriptors for reading and for writing data differ only slightly, namely in their interaction_properties. Read requests typically have interaction properties like 'sql_statement', 'file_name', or 'table_name', while write requests have additional interaction properties, such as 'existing_table_action' or 'file_format'.

For details about the data request syntax, see Flight data requests.
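
As an illustration of how a read request and a write request for the same table differ, the following sketch builds both as named R lists and compares their interaction properties. The names base_request, read_request, and write_request are chosen for this example only. The sketch does not call Flight service. reticulate converts named R lists to Python dictionaries, so lists like these could in principle be passed in place of dict(...), but that usage is an assumption and is not shown in the samples in this topic.

```r
# Properties shared by both requests (placeholders as used in this topic)
base_request <- list(
  asset_id = "<asset_id>",
  project_id = "<project_id>"
)

# Read request: limits the number of rows that are returned
read_request <- c(base_request, list(
  interaction_properties = list(
    schema_name = "<schema>",
    table_name = "<table>",
    row_limit = 5000L
  )
))

# Write request: states what to do if the target table already exists
write_request <- c(base_request, list(
  interaction_properties = list(
    schema_name = "<schema>",
    table_name = "<table>",
    existing_table_action = "truncate"
  )
))

# Property names that appear only in the write request
write_only <- setdiff(
  names(write_request$interaction_properties),
  names(read_request$interaction_properties)
)
write_only
```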

Authenticating with the Flight Server

You need to authenticate with Flight service with a valid bearer token. For that purpose, you can use the following code snippet to write a custom authentication handler class and create an instance of this class.

Because Flight service is accessed in R notebooks through reticulated pyarrow functions written in Python, the code sample shows one possible working approach to creating a custom authentication handler. A best practice in R notebooks is to use the reticulated itc_utils function TokenClientAuthHandler() or, better, get_flight_client().

library("reticulate")
flight <- import("pyarrow.flight")

TokenClientAuthHandler <- PyClass(
    "TokenClientAuthHandler",
    list(
        bearer_token = NULL,
        outgoing = NULL,
        incoming = NULL,

        `__init__` = function(self, bearer_token){
            super()$`__init__`()
            self$token <- charToRaw(bearer_token)
            NULL
        },

        authenticate = function(self, outgoing, incoming){
            outgoing$write(self$token)
            self$token <- incoming$read()
        },

        get_token = function(self){
            return(self$token)
        }
    ),
    inherit = flight$ClientAuthHandler
)

library(ibmWatsonStudioLib)
wslib <- access_project_or_space()
token <- paste("Bearer ", wslib$auth$get_current_token(), sep = "")

authHandler <- TokenClientAuthHandler(token)
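
The sample above prepends "Bearer " to the token and stores it as raw bytes. The following base R sketch isolates those two steps so that they can be checked without a Flight service connection. The function as_bearer_token is a hypothetical helper, not part of itc_utils or ibmWatsonStudioLib, and "abc123" is a made-up placeholder token.

```r
# Hypothetical helper: make sure the token carries the "Bearer " prefix
# exactly once, whether or not the caller already added it.
as_bearer_token <- function(token) {
  stopifnot(is.character(token), length(token) == 1, nzchar(token))
  if (startsWith(token, "Bearer ")) token else paste("Bearer", token)
}

# "abc123" is a placeholder, not a real token
bearer <- as_bearer_token("abc123")

# Convert to raw bytes, as the handler's __init__ method does with charToRaw
raw_token <- charToRaw(bearer)

# Converting back recovers the original string
rawToChar(raw_token)
```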
