Flight client example for accessing a data product
After you have subscribed to a data product using the Flight service as the delivery method for one or more items, you can programmatically access the data using an Arrow client. Arrow libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R and Ruby. See Apache Arrow for instructions on installing the libraries for each language. This topic provides a Python example for accessing the data in a data product from an Arrow client.
When you subscribe to a data product and select delivery by the Flight service, you receive a Flight URL and Descriptor that are used to access the data product. You can download the Flight URL and Descriptor from the subscription tile that is located under My subscriptions.
The Flight URL points to the external route of the Flight service for your environment. The Descriptor contains an asset ID and a catalog ID used to connect to a data source to deliver the items in a data product.
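As a rough illustration of how the Descriptor is used, the following sketch checks that a descriptor dictionary carries both IDs and serializes it to the JSON string that the Python examples in this topic pass to the Flight service as a command. The key names asset_id and catalog_id follow the watsonx.ai example later in this topic. The helper name descriptor_to_command and the placeholder IDs are hypothetical, and a real Descriptor might carry additional fields, which this check leaves untouched.

```python
import json

# Key names follow the descriptor shown in the watsonx.ai example in this topic.
REQUIRED_KEYS = ("asset_id", "catalog_id")


def descriptor_to_command(descriptor):
    """Hypothetical helper: validate a descriptor dict and return it as a JSON string."""
    missing = [key for key in REQUIRED_KEYS if not descriptor.get(key)]
    if missing:
        raise ValueError("Descriptor is missing: " + ", ".join(missing))
    return json.dumps(descriptor)


# Placeholder IDs for illustration only; use the values from your subscription tile.
example_descriptor = {
    "asset_id": "my-asset-id",
    "catalog_id": "my-catalog-id",
}
command = descriptor_to_command(example_descriptor)
```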
- Accessing a data product with a Python Flight client
- Accessing a data product with a Python Flight client and watsonx.ai
Example for accessing a data product with a Python Flight client
Follow these steps to access a data product with a Flight client in Python:
- Import the required libraries.
Import the Flight Python libraries together with the requests and json libraries, which are used to make REST API requests.
import pyarrow
from pyarrow import flight
import requests
import json
- Define an authentication handler.
class TokenClientAuthHandler(flight.ClientAuthHandler):
    def __init__(self, token):
        super().__init__()
        strToken = str(token)
        self.token = strToken.encode('utf-8')

    def authenticate(self, outgoing, incoming):
        outgoing.write(self.token)
        self.token = incoming.read()

    def get_token(self):
        return self.token
- Authenticate with Data Product Hub by using the REST API.
The following example authenticates with Data Product Hub using the authentication API. See Authentication.
readClient = flight.FlightClient(
    FLIGHT_URL,
    override_hostname=FLIGHT_HOSTNAME,
    disable_server_verification=True)
response = requests.post(
    'https://' + CPD_CLUSTER_HOSTNAME + '/icp4d-api/v1/authorize',
    json={"username": USERNAME, "password": PASSWORD},
    verify=False).json()
token = 'Bearer ' + response['token']
readClient.authenticate(
    TokenClientAuthHandler(token),
    options=flight.FlightCallOptions(timeout=5.0))
The variables are defined as follows:
- FLIGHT_URL - The Flight URL that is shown in the subscription tile for the data product
- FLIGHT_HOSTNAME - The hostname from the Flight URL
- CPD_CLUSTER_HOSTNAME - The hostname for your Cloud Pak for Data cluster
- USERNAME and PASSWORD - Your login credentials on Data Product Hub
Following is a Python example showing sample values for the variables:
FLIGHT_URL='grpc+tls://fc-route-wkc.apps.mycluster.myorg.com:443'
FLIGHT_HOSTNAME='fc-route-wkc.apps.mycluster.myorg.com'
CPD_CLUSTER_HOSTNAME='cpd-wkc.apps.mycluster.myorg.com'
USERNAME='my_username'
PASSWORD='my_password'
NOTE: If your data product subscription does not contain the Flight URL, refer to this possible solution: Flight service URL fails to generate for a subscription.
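Because FLIGHT_HOSTNAME is the hostname portion of the Flight URL, you might derive it instead of typing it twice. The following is a minimal sketch that uses the standard library and the sample Flight URL from this step. The name FLIGHT_PORT is hypothetical and is not used elsewhere in this topic.

```python
from urllib.parse import urlsplit

# Sample value from this topic; replace it with the Flight URL from your subscription tile.
FLIGHT_URL = 'grpc+tls://fc-route-wkc.apps.mycluster.myorg.com:443'

parts = urlsplit(FLIGHT_URL)

# Hostname without the scheme or the port, suitable for override_hostname.
FLIGHT_HOSTNAME = parts.hostname

# Hypothetical extra variable: the port as an integer, or None if the URL has no port.
FLIGHT_PORT = parts.port
```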
- Initialize the Flight client.
flightDescriptor = flight.FlightDescriptor.for_command(json.dumps(DESCRIPTOR))
flightInfo = readClient.get_flight_info(flightDescriptor)
The variable is defined as follows:
- DESCRIPTOR - The Flight descriptor, in its Python form, copied from the subscription tile for the data product
- Read the data from the table and load it into Pandas.
for endpoint in flightInfo.endpoints:
    reader = readClient.do_get(endpoint.ticket)
    table = reader.read_all()
Your data is stored in the table variable and you can now work with the data in your application. For example, you can load the table into a Pandas dataframe to work with the data in Pandas:
table.to_pandas()
Example for accessing a data product with a Python Flight client and watsonx.ai
If your environment includes watsonx.ai Studio, which provides tools for working with Jupyter notebooks, you can use a Flight delivery result directly in a notebook without exposing an external Flight route.
You can use this example when your notebook is in the same Cloud Pak for Data cluster as the Data Product Hub instance.
- Import the required libraries.
Import the Flight Python libraries together with the requests and json libraries, which are used to make REST API requests.
import pyarrow
from pyarrow import flight
import requests
import json
- Import the itc_utils helper library to improve code readability.
import itc_utils.flight_service as itcfs
flight_client = itcfs.get_flight_client()
- Initialize the Flight client.
descriptor = {
    "asset_id": "b571ebf2-XXXX-40c0-afc8-9385b099cf3f",
    "catalog_id": "4d9148dc-XXXX-46cf-9db0-e62946d1413e"
}
flightDescriptor = flight.FlightDescriptor.for_command(json.dumps(descriptor))
flightInfo = flight_client.get_flight_info(flightDescriptor)
- Read the data from the table.
for endpoint in flightInfo.endpoints:
    reader = flight_client.do_get(endpoint.ticket)
    table = reader.read_all()
- Load the data into Pandas.
Your data is stored in the table variable and you can now work with the data in your application. For example, you can load the table into a Pandas dataframe.
table.to_pandas()