Using itc_utils with your own code
You can use the functions provided by itc_utils to extend code that uses the pyarrow library. Because itc_utils is based on pyarrow, you can pick specific functions of itc_utils and combine them with your pyarrow code.
The itc_utils library provides helper functions that you can use to make your programs easier to read. IBM can remove this library at any time, if deemed necessary, and can change its functions without prior notice.
The following functions can simplify code development.
- Functions for Flight descriptor creation:
  - get_data_request(): transforms a Python dictionary with named assets into the request required by Flight service
  - get_flight_cmd(): serializes a Python dictionary into a JSON string
  - get_flight_descriptor(): creates a ready-to-use pyarrow.flight.FlightDescriptor from a Python dictionary
- Functions for client instantiation and authentication:
  - get_flight_service_url(): gets the default Flight service URL
  - get_flight_client(): creates a flight client and authenticates with Flight service using the current user token of the environment
  - TokenClientAuthHandler(): authenticates to Flight service
- Functions for reading data:
  - get_flight_info(): gets a pyarrow.flight.FlightInfo object for a Python dictionary
  - read_pandas_and_concat(): reads data from all endpoints and loads the data into a single pandas.DataFrame
get_data_request()
The function get_data_request() transforms a Python dictionary with items connection_name, connected_data_name or data_name into a Python dictionary representing a Flight service request with items asset_id, and project_id or space_id.
The result can be used as input for other itc_utils library functions, for example get_flight_descriptor().
Examples:
- With item connection_name:

    connection_data_request = {
        'connection_name': """MyConnection""",
        'interaction_properties': {
            'row_limit': 5000,
            'schema_name': '<schema>',
            'table_name': '<table>'
        }
    }
    flightRequest = itcfs.get_data_request(nb_data_request=connection_data_request)

- With item connected_data_name:

    connected_asset_data_request = {
        'connected_data_name': """NameOfConnectedAsset""",
        'interaction_properties': {
            #'row_limit': 5000
        }
    }
    flightRequest = itcfs.get_data_request(nb_data_request=connected_asset_data_request)

- With item data_name:

    CSV_data_request = {
        'data_name': """little.csv""",
        'interaction_properties': {
            #'row_limit': 500,
            'infer_schema': 'true',
            'infer_as_varchar': 'false'
        }
    }
    flightRequest = itcfs.get_data_request(nb_data_request=CSV_data_request)
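To make the transformation more concrete, the following minimal sketch mimics the shape of the input and output dictionaries. It is not the itc_utils implementation: the in-memory CATALOG, the function resolve_request(), and the IDs are hypothetical stand-ins for the lookup that the library performs in the project or deployment space. Requiring exactly one name item and passing all other items through unchanged are assumptions.

```python
# Hypothetical stand-in for the asset lookup in a project; not part of itc_utils.
CATALOG = {
    "project_id": "proj-123",
    "assets": {"MyConnection": "asset-001", "little.csv": "asset-002"},
}

NAME_KEYS = ("connection_name", "connected_data_name", "data_name")


def resolve_request(nb_data_request, catalog=CATALOG):
    """Replace the name item with asset_id and project_id (illustrative only)."""
    names = [key for key in NAME_KEYS if key in nb_data_request]
    if len(names) != 1:
        # Assumption: exactly one of the name items is expected.
        raise ValueError("expected exactly one of: " + ", ".join(NAME_KEYS))
    asset_name = nb_data_request[names[0]]
    if asset_name not in catalog["assets"]:
        raise KeyError(f"no asset named {asset_name!r}")
    # Assumption: every other item is passed through unchanged.
    request = {k: v for k, v in nb_data_request.items() if k not in NAME_KEYS}
    request["asset_id"] = catalog["assets"][asset_name]
    request["project_id"] = catalog["project_id"]
    return request


flight_request = resolve_request({
    "data_name": "little.csv",
    "interaction_properties": {"infer_schema": "true"},
})
print(flight_request)
```

The resulting dictionary has the same shape as the dictionaries that the data_request parameter of the other functions accepts.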
get_flight_cmd()
The function get_flight_cmd() transforms a Python dictionary that represents a Flight service request into a JSON string. The function is helpful in Spark notebooks because the Spark Flight Connector expects a cmd. You can also use this function in your own pyarrow code to get a valid cmd for Flight service.
The result of the get_flight_cmd() function can serve as input to create a pyarrow.flight.FlightDescriptor.
You can use this function with the named parameter nb_data_request or data_request.
If you use nb_data_request, the dictionary items connection_name, connected_data_name or data_name are resolved to an asset_id, and project_id or space_id before the dictionary is transformed into a JSON string.
If you use data_request, the dictionary is transformed to a JSON string as is.
Examples:
- Named parameter nb_data_request with dictionary item connection_name:

    MyConnection_data_request = {
        'connection_name': """MyConnection""",
        'interaction_properties': {
            'row_limit': 5000,
            'schema_name': '<schema>',
            'table_name': '<table>'
        }
    }
    cmd = itcfs.get_flight_cmd(nb_data_request=MyConnection_data_request)

- Named parameter data_request:

    My_flight_request = {
        'asset_id': '<asset_id>',
        'project_id': '<project_id>',
        'interaction_properties': {
            'row_limit': 5000,
            'schema_name': '<schema>',
            'table_name': '<table>'
        }
    }
    cmd = itcfs.get_flight_cmd(data_request=My_flight_request)
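For the data_request case, where the dictionary is serialized as is, the result can be approximated with the Python standard library. This is a sketch only: it assumes that plain json.dumps() produces an equivalent string, and the exact formatting that get_flight_cmd() emits is not specified here.

```python
import json

# Placeholders copied from the example above; replace them with real IDs.
my_flight_request = {
    "asset_id": "<asset_id>",
    "project_id": "<project_id>",
    "interaction_properties": {
        "row_limit": 5000,
        "schema_name": "<schema>",
        "table_name": "<table>",
    },
}

# Assumption: json.dumps() approximates what the data_request path produces.
cmd = json.dumps(my_flight_request)

# The JSON string round-trips to the original dictionary.
assert json.loads(cmd) == my_flight_request
print(cmd)
```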
get_flight_descriptor()
The function get_flight_descriptor() transforms a Python dictionary that represents a Flight service request into a pyarrow.flight.FlightDescriptor.
As described in get_flight_cmd(), you can use this function with the named parameter nb_data_request or data_request.
If you use nb_data_request, the dictionary items connection_name, connected_data_name or data_name are resolved to an asset_id, and project_id or space_id before the dictionary is transformed into a flight descriptor, provided the specified names are unique in the project or deployment space.
If you use data_request, the dictionary is transformed to a flight descriptor as is.
Example:
MyConnection_data_request = {
    'connection_name': """MyConnection""",
    'interaction_properties': {
        'row_limit': 5000,
        'schema_name': '<schema>',
        'table_name': '<table>'
    }
}
flightDescriptor = itcfs.get_flight_descriptor(nb_data_request=MyConnection_data_request, wslib=None)
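If you work with pyarrow directly, a comparable descriptor can be built from an already resolved request. The following is a minimal sketch, not the itc_utils implementation: the helper names to_command() and to_descriptor() are hypothetical, and it assumes that the descriptor is a command descriptor that carries the JSON-serialized request.

```python
import json


def to_command(data_request):
    """Serialize a resolved Flight request dictionary into command bytes."""
    return json.dumps(data_request).encode("utf-8")


def to_descriptor(data_request):
    """Wrap the command in a pyarrow.flight.FlightDescriptor."""
    import pyarrow.flight as flight  # imported lazily; requires pyarrow

    return flight.FlightDescriptor.for_command(to_command(data_request))


# Placeholders; replace them with real IDs.
my_flight_request = {
    "asset_id": "<asset_id>",
    "project_id": "<project_id>",
    "interaction_properties": {"row_limit": 5000},
}
command = to_command(my_flight_request)
```

With pyarrow installed, to_descriptor(my_flight_request) returns a descriptor whose command attribute holds these bytes.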
get_flight_service_url()
The function get_flight_service_url() returns the configured endpoint of Flight service.
Example:
itcfs.get_flight_service_url()
get_flight_client()
The function get_flight_client() returns a ready-to-use flight client pointing to the configured or provided Flight service URL. This function also authenticates with Flight service by using the default or a provided user token.
Example:
flightClient = itcfs.get_flight_client()
You can use this function with the named parameter url to specify an alternative Flight service endpoint, and with the named parameter bearer_token to use an existing bearer token.
Example:
flightClient = itcfs.get_flight_client(url='<flight_service_url>', bearer_token='<bearer_token>')
TokenClientAuthHandler()
You can use an instance of the TokenClientAuthHandler class to authenticate with a pyarrow.flight.FlightClient instance. Flight service expects a bearer token for authentication. To create an instance of TokenClientAuthHandler, you can use an existing user token or a bearer token.
- Example of a default TokenClientAuthHandler without any parameters that uses the authentication token from the environment:

    authHandler = itcfs.TokenClientAuthHandler()

- Example of a TokenClientAuthHandler that uses a token created by the authentication service:

    userToken = '<your_user_token>'
    authHandler = itcfs.TokenClientAuthHandler(token=userToken)

- Example of a TokenClientAuthHandler that uses an existing bearer token:

    bearerToken = 'Bearer <your_bearer_token>'
    authHandler = itcfs.TokenClientAuthHandler(bearer_token=bearerToken)
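The examples suggest that a bearer token is a user token prefixed with "Bearer ". Under that assumption, which is inferred from the examples and not confirmed for the library, a small helper can normalize either form before you pass it on. The function as_bearer_token() is hypothetical and not part of itc_utils.

```python
def as_bearer_token(token):
    """Return the token in 'Bearer <token>' form, adding the prefix if missing."""
    token = token.strip()
    if token.lower().startswith("bearer "):
        return token
    # Assumption: a raw user token only lacks the "Bearer " prefix.
    return "Bearer " + token


print(as_bearer_token("<your_user_token>"))
print(as_bearer_token("Bearer <your_bearer_token>"))
```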
get_flight_info()
The function get_flight_info() calls Flight service to receive a pyarrow.flight.FlightInfo object that contains the information required to read data.
get_flight_info() expects a positional parameter, an instance of a pyarrow.flight.FlightClient, and one of the named parameters nb_data_request or data_request. The difference between these parameters and the expected input is described in the sections about get_flight_cmd() and get_flight_descriptor().
Example:
My_flight_request = {
    'asset_id': '<asset_id>',
    'project_id': '<project_id>',
    'interaction_properties': {
        'row_limit': 5000,
        'schema_name': '<schema>',
        'table_name': '<table>'
    }
}
flightClient = itcfs.get_flight_client()
flightInfo = itcfs.get_flight_info(flightClient, data_request=My_flight_request)
read_pandas_and_concat()
The function read_pandas_and_concat() reads data from all endpoints in the flight info object and concatenates the data partitions into a single pandas.DataFrame.
read_pandas_and_concat() expects two positional parameters, namely an instance of a pyarrow.flight.FlightClient and a pyarrow.flight.FlightInfo object.
Example:
df = itcfs.read_pandas_and_concat(flightClient, flightInfo)
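The read-and-concatenate pattern can be sketched with plain pyarrow and pandas calls, which may help if you want to process partitions individually. This is an illustration, not the itc_utils implementation: the function names read_partitions() and read_and_concat() are hypothetical, and resetting the index with ignore_index=True is an assumption.

```python
def read_partitions(client, info):
    """Yield one pandas.DataFrame per endpoint listed in the FlightInfo."""
    for endpoint in info.endpoints:
        # do_get() returns a stream reader for the endpoint's ticket.
        reader = client.do_get(endpoint.ticket)
        yield reader.read_pandas()


def read_and_concat(client, info):
    """Concatenate all partitions into a single pandas.DataFrame."""
    import pandas as pd  # imported lazily; requires pandas

    # Assumption: the partitions share a schema and the index is reset.
    return pd.concat(list(read_partitions(client, info)), ignore_index=True)
```

Given a client and a flight info object as in the earlier examples, read_and_concat(flightClient, flightInfo) would return one DataFrame, while read_partitions() lets you handle each partition separately.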