Examples of Flight service data requests with Python
You can use Flight service and the Apache Arrow Flight protocol to read and write data using Python in a project or space.
To read or write data using Python, you can:
-
Use code that is generated for you to load data from a selected project asset in a Python notebook. This generated code uses the
itc_utils
library, which wraps calls to the open source Pythonpyarrow
library, improving code readablity while reducing code size. -
Write your own code to read and write data in a project or space using the open source Python
pyarrow
library to invoke Flight service. You need to write your own code in the following situations:- The generated code requires changes, for example to use in a production environment
- The functionality to add generated code is not available for the asset
- The tool doesn't support adding generated code
The following sections provide more details about how you can specify the data source and interactive properties for read requests using Flight data requests.
For details on the data request syntax and properties, see Flight data requests.
To access data, you need to create a data request object using a JSON document containing the description of the asset you want to read or write. In Python, you can use dictionaries to compose these JSON documents.
You can use the following data request examples to learn how to specify the data source and interactive properties for data read or write requests in Python. The properties vary depending on the data source.
Example of a schema and table request
data_request = {
"asset_id": "ASSET_ID", # asset_id must point to a connection asset
"project_id|space_id|catalog_id": "ID",
"num_partitions": NUM, # Optional
"batch_size": SIZE, # Optional
"interaction_properties": {
"schema_name": "schema",
"table_name": "table"
},
"fields": "fields" # Optional
}
Example of an SQL request
data_request = {
"asset_id": "ASSET_ID", # asset_id must point to a connection asset
"project_id|space_id|catalog_id": "ID",
"num_partitions": NUM, # Optional
"batch_size": SIZE, # Optional
"interaction_properties": {
"select_statement": "SELECT ... FROM <schema>.<table> WHERE ..."
}
}
Example of a request to read from a data asset file like a Parquet file
data_request = {
"asset_id": "ASSET_ID", # asset_id must point to a connected data asset
"project_id|space_id|catalog_id": "ID",
"num_partitions": NUM, # Optional
"fields": "fields" # Optional
}
Example of read request from a Parquet file
data_request = {
"asset_id": "ASSET_ID",
"project_id|space_id|catalog_id": "ID",
"num_partitions": NUM, # Optional
"batch_size": SIZE, # Optional
"interaction_properties": {
"folder": "FOLDER",
"file": "FILE",
"bucket": "BUCKET", # If using COS specify bucket in addition to folder/file
"file_format": "parquet"
},
"fields": "fields" # Optional
}
Example of a request to a connection passing connection properties
data_request = {
"datasource_type": "dashdb",
"connection_properties": {
"database": "DBNAME",
"password": "PASSWORD",
"port": NUM,
"host": "HOSTNAME",
"ssl": true,
"username": "CONNUSER"
},
"num_partitions": NUM, # Optional
"batch_size": SIZE, # Optional
"interaction_properties": {
"schema_name": "schema",
"table_name": "table",
}
"fields": "fields" # Optional
}
The "connection_properties"
and "interaction_properties"
properties differ depending on the connector. See Data and AI Common Core API Datasource types for details.
Code samples using Flight data requests
The following code samples show how Flight service can be invoked within Python notebooks. The samples illustrate the differences between data requests depending on the data source connection.
The samples that show how to invoke Flight service use a Python library called itc_utils
, which is provided by IBM for use in notebooks. For more details, see Flight service in Python notebooks.
Example of the data request returned in the code that is generated to load data for an HTTP connection
import itc_utils.flight_service as itcfs
readClient = itcfs.get_flight_client()
HTTP_data_request = {
'connection_name': """HTTP""",
'interaction_properties': {
'infer_schema': 'true'
}
}
flightInfo = itcfs.get_flight_info(readClient, nb_data_request=HTTP_data_request)
data_df_1 = itcfs.read_pandas_and_concat(readClient, flightInfo)
data_df_1.head(10)
Example of a data request using connection IDs and no properties
import itc_utils.flight_service as itcfs
readClient = itcfs.get_flight_client()
data_request = {
'project_id': 'c01a0212-47cc-4cd4-8cc8-1dd92dbb4bde',
'asset_id': 'a6c0117c-e6da-4d71-9da4-659edcec7ba5',
'interaction_properties': {
'infer_schema': 'true'
}
}
flightInfo = itcfs.get_flight_info(readClient, data_request=data_request)
data_df_1 = itcfs.read_pandas_and_concat(readClient, flightInfo)
data_df_1.head(10)
Example of a data request for a Db2 connection using properties
import itc_utils.flight_service as itcfs
readClient = itcfs.get_flight_client()
data_request = {'datasource_type': {'entity': {'name': 'db2'}},
'connection_properties': {'database': 'BLUDB',
'password': '****',
'username_password_security': 'clear_text',
'port': '50001',
'host': 'dashdb-123.services.dal.bluemix.net',
'inherit_access_token': 'false',
'username_password_encryption': 'default',
'ssl': 'true',
'username': 'myuser'},
'interaction_properties': {'schema_name': 'MYSCHEMA', 'table_name': 'A_TABLE'}
}
flightInfo = itcfs.get_flight_info(readClient, data_request=data_request)
data_df_1 = itcfs.read_pandas_and_concat(readClient, flightInfo)
data_df_1.head(10)
Learn more
Parent topic: Flight service in Python notebooks