Using itc_utils with your own code

You can use the functions provided by itc_utils to extend code that uses the pyarrow library. Because itc_utils is based on pyarrow, you can pick specific functions of itc_utils and combine them with your pyarrow code.

Note:

The itc_utils library provides helper functions that you can use to make your programs easier to read. IBM can remove this library at any time, if deemed necessary, and can change its functions without prior notice.

The following functions can simplify code development.

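  • Functions for Flight descriptor creation:

    • get_data_request(): transforms a Python dictionary with named assets into the request required by Flight service
    • get_flight_cmd(): serializes a Python dictionary into a JSON string
    • get_flight_descriptor(): creates a ready-to-use pyarrow.flight.FlightDescriptor from a Python dictionary
  • Functions for client instantiation and authentication:

    • get_flight_service_url(): returns the configured endpoint of Flight service
    • get_flight_client(): returns an authenticated pyarrow.flight.FlightClient
    • TokenClientAuthHandler: a class that authenticates a pyarrow.flight.FlightClient with a user token or a bearer token
  • Functions for reading data:

    • get_flight_info(): calls Flight service to get a pyarrow.flight.FlightInfo object
    • read_pandas_and_concat(): reads data from all endpoints and concatenates it into a single pandas.DataFrame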

get_data_request()

The function get_data_request() transforms a Python dictionary that contains one of the items connection_name, connected_data_name, or data_name into a Python dictionary that represents a Flight service request with the items asset_id, and project_id or space_id.

The result can be used as input for other itc_utils library functions, for example get_flight_descriptor().

Examples:

  • With item connection_name:

    import itc_utils.flight_service as itcfs
    
    connection_data_request = {
        'connection_name': """MyConnection""",
        'interaction_properties': {
            'row_limit': 5000,
            'schema_name': '<schema>',
            'table_name': '<table>'
        }
    }
    flightRequest = itcfs.get_data_request(nb_data_request=connection_data_request)
    
  • With item connected_data_name:

    connected_asset_data_request = {
        'connected_data_name': """NameOfConnectedAsset""",
        'interaction_properties': {
            #'row_limit': 5000
        }
    }
    flightRequest = itcfs.get_data_request(nb_data_request=connected_asset_data_request)
    
  • With item data_name:

    CSV_data_request = {
        'data_name': """little.csv""",
        'interaction_properties': {
            #'row_limit': 500,
            'infer_schema': 'true',
            'infer_as_varchar': 'false'
        }
    }
    flightRequest = itcfs.get_data_request(nb_data_request=CSV_data_request)
    

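The documentation does not show how the names are resolved. As a rough mental model only, the following sketch shows the kind of transformation described above: exactly one name item is replaced by an asset_id and a project_id, and the remaining items, such as interaction_properties, are assumed to be carried over unchanged. This is not the itc_utils implementation. The function resolve_data_request() and the in-memory catalog that maps names to asset IDs are hypothetical; in a real project, itc_utils looks up the assets for you. The uniqueness check mirrors the requirement, stated for get_flight_descriptor(), that names must be unique in the project or deployment space.

```python
from typing import Any, Dict, List

# Hypothetical catalog: asset name -> IDs of all assets that carry that name.
# itc_utils performs the real lookup; this data is made up for illustration.
AssetCatalog = Dict[str, List[str]]

NAME_KEYS = ("connection_name", "connected_data_name", "data_name")


def resolve_data_request(
    nb_request: Dict[str, Any], catalog: AssetCatalog, project_id: str
) -> Dict[str, Any]:
    """Hypothetical sketch: replace the name item with asset_id and project_id."""
    names = [key for key in NAME_KEYS if key in nb_request]
    if len(names) != 1:
        raise ValueError(f"expected exactly one of {NAME_KEYS}, found {names}")
    name = nb_request[names[0]]
    matches = catalog.get(name, [])
    if len(matches) != 1:
        # A name can only be resolved if it identifies exactly one asset.
        raise ValueError(f"{name!r} matches {len(matches)} assets")
    # Assumption: every other item (for example interaction_properties) is kept as is.
    request = {key: value for key, value in nb_request.items() if key != names[0]}
    request["asset_id"] = matches[0]
    request["project_id"] = project_id
    return request
```

For example, resolve_data_request({'connection_name': 'MyConnection', 'interaction_properties': {'row_limit': 5000}}, {'MyConnection': ['<asset_id>']}, '<project_id>') yields a dictionary with asset_id, project_id, and the original interaction_properties.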
get_flight_cmd()

The function get_flight_cmd() transforms a Python dictionary that represents a Flight service request into a JSON string. The function is helpful in Spark notebooks because the Spark Flight Connector expects the request as a cmd (a JSON command string). You can also use this function in your own pyarrow code to get a valid cmd for Flight service.

The result of the get_flight_cmd() function can serve as input for creating a pyarrow.flight.FlightDescriptor.

You can call this function with the named parameter nb_data_request or data_request.

If you use nb_data_request, the dictionary items connection_name, connected_data_name, or data_name are resolved to an asset_id, and a project_id or space_id, before the dictionary is transformed into a JSON string.

If you use data_request, the dictionary is transformed into a JSON string as is, without any name resolution.

Examples:

  • Named parameter nb_data_request with dictionary item connection_name:

    MyConnection_data_request = {
        'connection_name': """MyConnection""",
        'interaction_properties': {
            'row_limit': 5000,
            'schema_name': '<schema>',
            'table_name': '<table>'
        }
    }
    cmd = itcfs.get_flight_cmd(nb_data_request=MyConnection_data_request)
    
  • Named parameter data_request:

    My_flight_request = {
        'asset_id': '<asset_id>',
        'project_id': '<project_id>',
        'interaction_properties': {
            'row_limit': 5000,
            'schema_name': '<schema>',
            'table_name': '<table>'
        }
    }
    cmd = itcfs.get_flight_cmd(data_request=My_flight_request)
    

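With data_request, no names are resolved, so the step reduces to plain JSON serialization. A minimal sketch using only the standard json module, assuming that the cmd is simply the JSON encoding of the dictionary; the exact key order and whitespace that itc_utils produces may differ. The function to_flight_cmd() is a hypothetical name, not part of itc_utils.

```python
import json
from typing import Any, Dict


def to_flight_cmd(data_request: Dict[str, Any]) -> str:
    """Hypothetical sketch: serialize an already-resolved request into a JSON cmd."""
    # No name resolution happens here: the dictionary is serialized as is,
    # which mirrors the data_request behavior described above.
    return json.dumps(data_request)


my_flight_request = {
    "asset_id": "<asset_id>",
    "project_id": "<project_id>",
    "interaction_properties": {
        "row_limit": 5000,
        "schema_name": "<schema>",
        "table_name": "<table>",
    },
}
cmd = to_flight_cmd(my_flight_request)
```

Because the cmd is JSON, json.loads(cmd) gives back the original dictionary, which is a quick way to check what is sent to Flight service.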
get_flight_descriptor()

The function get_flight_descriptor() transforms a Python dictionary that represents a Flight service request into a pyarrow.flight.FlightDescriptor.

As with get_flight_cmd(), you can call this function with the named parameter nb_data_request or data_request.

If you use nb_data_request, the dictionary items connection_name, connected_data_name, or data_name are resolved to an asset_id, and a project_id or space_id, before the dictionary is transformed into a Flight descriptor. Resolution works only if the specified names are unique in the project or deployment space.

If you use data_request, the dictionary is transformed into a Flight descriptor as is.

Example:

MyConnection_data_request = {
    'connection_name': """MyConnection""",
    'interaction_properties': {
        'row_limit': 5000,
        'schema_name': '<schema>',
        'table_name': '<table>'
    }
}

flightDescriptor = itcfs.get_flight_descriptor(nb_data_request=MyConnection_data_request, wslib=None)

get_flight_service_url()

The function get_flight_service_url() returns the configured endpoint of Flight service.

Example:

itcfs.get_flight_service_url()

get_flight_client()

The function get_flight_client() returns a ready-to-use Flight client that points to the configured or provided Flight service URL. The function also authenticates with Flight service by using the default or a provided user token.

Example:

flightClient = itcfs.get_flight_client()

You can call this function with the named parameter url to specify an alternative Flight service endpoint, and with the named parameter bearer_token to use an existing bearer token.

Example:

flightClient = itcfs.get_flight_client(url='<flight_service_url>', bearer_token='<bearer_token>')

TokenClientAuthHandler()

You can use an instance of the TokenClientAuthHandler class to authenticate a pyarrow.flight.FlightClient instance with Flight service. Flight service expects a bearer token for authentication. To create an instance of TokenClientAuthHandler, you can use an existing user token or a bearer token.

  • Example of a default TokenClientAuthHandler without any parameters that uses the authentication token from the environment:

    authHandler = itcfs.TokenClientAuthHandler()
    
  • Example of a TokenClientAuthHandler that uses a token created by the authentication service:

    userToken = '<your_user_token>'
    authHandler = itcfs.TokenClientAuthHandler(token=userToken)
    
  • Example of a TokenClientAuthHandler that uses an existing bearer token:

    bearerToken = 'Bearer <your_bearer_token>'
    authHandler = itcfs.TokenClientAuthHandler(bearer_token=bearerToken)
    

get_flight_info()

The function get_flight_info() calls Flight service to receive a pyarrow.flight.FlightInfo object that contains the information required to read data.

get_flight_info() expects one positional parameter, an instance of pyarrow.flight.FlightClient, and one of the named parameters nb_data_request or data_request. The difference between these two parameters and their expected input is described in the sections about get_flight_cmd() and get_flight_descriptor().

Example:

My_flight_request = {
    'asset_id': '<asset_id>',
    'project_id': '<project_id>',
    'interaction_properties': {
        'row_limit': 5000,
        'schema_name': '<schema>',
        'table_name': '<table>'
    }
}
flightClient = itcfs.get_flight_client()
flightInfo = itcfs.get_flight_info(flightClient, data_request=My_flight_request)

read_pandas_and_concat()

The function read_pandas_and_concat() reads data from all endpoints in the FlightInfo object and concatenates the data partitions into a single pandas.DataFrame.

read_pandas_and_concat() expects two positional parameters, namely an instance of a pyarrow.flight.FlightClient and a pyarrow.flight.FlightInfo object.

Example:

df = itcfs.read_pandas_and_concat(flightClient, flightInfo)
