Examples of Flight service data requests with Python

You can use Flight service and the Apache Arrow Flight protocol to read and write data using Python in a project or space.

To read or write data using Python, you can:

  • Use code that is generated for you to load data from a selected project asset in a Python notebook. This generated code uses the itc_utils library, which wraps calls to the open source Python pyarrow library, improving code readablity while reducing code size.

  • Write your own code to read and write data in a project or space using the open source Python pyarrow library to invoke Flight service. You need to write your own code in the following situations:

    • The generated code requires changes, for example to use in a production environment
    • The functionality to add generated code is not available for the asset
    • The tool doesn't support adding generated code

The following sections provide more details about how you can specify the data source and interactive properties for read requests using Flight data requests.

For details on the data request syntax and properties, see Flight data requests.

To access data, you need to create a data request object using a JSON document containing the description of the asset you want to read or write. In Python, you can use dictionaries to compose these JSON documents.

You can use the following data request examples to learn how to specify the data source and interactive properties for data read or write requests in Python. The properties vary depending on the data source.

Example of a schema and table request

data_request = {
    "asset_id": "ASSET_ID", # asset_id must point to a connection asset
    "project_id|space_id|catalog_id": "ID",
    "num_partitions": NUM, # Optional
    "batch_size": SIZE, # Optional
    "interaction_properties": {
        "schema_name": "schema",
        "table_name": "table"
    },
    "fields": "fields" # Optional
}

Example of an SQL request

data_request = {
    "asset_id": "ASSET_ID", # asset_id must point to a connection asset
    "project_id|space_id|catalog_id": "ID",
    "num_partitions": NUM, # Optional
    "batch_size": SIZE, # Optional
    "interaction_properties": {
        "select_statement": "SELECT ... FROM <schema>.<table> WHERE ..."
    }
}

Example of a request to read from a data asset file like a Parquet file

data_request = {
    "asset_id": "ASSET_ID", # asset_id must point to a connected data asset
    "project_id|space_id|catalog_id": "ID",
    "num_partitions": NUM, # Optional
    "fields": "fields" # Optional
}

Example of read request from a Parquet file

data_request = {
    "asset_id": "ASSET_ID",
    "project_id|space_id|catalog_id": "ID",
    "num_partitions": NUM, # Optional
    "batch_size": SIZE, # Optional
    "interaction_properties": {
        "folder": "FOLDER",
        "file": "FILE",
        "bucket": "BUCKET", # If using COS specify bucket in addition to folder/file
        "file_format": "parquet"
    },
    "fields": "fields" # Optional
}

Example of a request to a connection passing connection properties

data_request = {
    "datasource_type": "dashdb",
    "connection_properties": {
        "database": "DBNAME",
        "password": "PASSWORD",
        "port": NUM,
        "host": "HOSTNAME",
        "ssl": true,
        "username": "CONNUSER"
    },
    "num_partitions": NUM, # Optional
    "batch_size": SIZE, # Optional
    "interaction_properties": {
        "schema_name": "schema",
        "table_name": "table",
    }
    "fields": "fields" # Optional
}

The "connection_properties" and "interaction_properties"properties differ depending on the connector. See Data and AI Common Core API Datasource types for details.

Code samples using Flight data requests

The following code samples show how Flight service can be invoked within Python notebooks. The samples illustrate the differences between data requests depending on the data source connection.

The samples that show how to invoke Flight service use a Python library called itc_utils, which is provided by IBM for use in notebooks. For more details, see Flight service in Python notebooks.

Example of the data request returned in the code that is generated to load data for an HTTP connection

import itc_utils.flight_service as itcfs
readClient = itcfs.get_flight_client()
HTTP_data_request = {
    'connection_name': """HTTP""",
    'interaction_properties': {
        'infer_schema': 'true'
    }
}
flightInfo = itcfs.get_flight_info(readClient, nb_data_request=HTTP_data_request)
data_df_1 = itcfs.read_pandas_and_concat(readClient, flightInfo)
data_df_1.head(10)

Example of a data request using connection IDs and no properties

import itc_utils.flight_service as itcfs
readClient = itcfs.get_flight_client()
data_request = {
    'project_id': 'c01a0212-47cc-4cd4-8cc8-1dd92dbb4bde',
    'asset_id': 'a6c0117c-e6da-4d71-9da4-659edcec7ba5',
    'interaction_properties': {
        'infer_schema': 'true'
    }
}
flightInfo = itcfs.get_flight_info(readClient, data_request=data_request)
data_df_1 = itcfs.read_pandas_and_concat(readClient, flightInfo)
data_df_1.head(10)

Example of a data request for a Db2 connection using properties

import itc_utils.flight_service as itcfs
readClient = itcfs.get_flight_client()
data_request = {'datasource_type': {'entity': {'name': 'db2'}},
    'connection_properties': {'database': 'BLUDB',
    'password': '****',
    'username_password_security': 'clear_text',
    'port': '50001',
    'host': 'dashdb-123.services.dal.bluemix.net',
    'inherit_access_token': 'false',
    'username_password_encryption': 'default',
    'ssl': 'true',
    'username': 'myuser'},
    'interaction_properties': {'schema_name': 'MYSCHEMA', 'table_name': 'A_TABLE'}
}
flightInfo = itcfs.get_flight_info(readClient, data_request=data_request)
data_df_1 = itcfs.read_pandas_and_concat(readClient, flightInfo)
data_df_1.head(10)

Learn more

Parent topic: Flight service in Python notebooks