Direct data transfer

Direct data transfer enables an IBM® Spectrum Symphony client to maximize utilization of the network bandwidth between itself and the service. This feature essentially removes the session manager from the data flow, which allows applications to optimize their use of different network topologies. This feature is supported on the operating systems supported by IBM Spectrum Symphony.

Limitations

  • This feature cannot be used by offline clients since clients using the direct data transfer feature must always be available to provide data to the service instance while there is outstanding workload.
  • This feature cannot be used by recoverable clients. When the direct data transfer feature is enabled, data is actually cached in the running client instance instead of being sent to the session manager. There is no client-side recovery capability for this cached data.
  • This feature cannot be used with the Service Replay Debugger feature.
  • This feature cannot be used when the following session type attributes are set to the specified values; doing so causes an exception to be thrown.
    • abortSessionIfClientDisconnect="false"
    • discardResultsOnDelivery="false"
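
The attribute restriction above can be checked before a session is created. The following is a minimal sketch only: the DirectDataTransferPreflight class and its findConflicts method are hypothetical helpers, not part of the IBM Spectrum Symphony API, and the sketch assumes the session type attributes are already available to the caller as a map of names to string values. It inspects only values that are explicitly present in the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check; not part of the IBM Spectrum Symphony API.
final class DirectDataTransferPreflight {

    // Attribute values that the Limitations list names as incompatible
    // with direct data transfer.
    private static final Map<String, String> CONFLICTING = new LinkedHashMap<>();
    static {
        CONFLICTING.put("abortSessionIfClientDisconnect", "false");
        CONFLICTING.put("discardResultsOnDelivery", "false");
    }

    // Returns the conflicting "name=value" pairs found in the given attributes.
    // Attributes that are absent from the map are not reported.
    static List<String> findConflicts(Map<String, String> sessionTypeAttributes) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, String> rule : CONFLICTING.entrySet()) {
            String actual = sessionTypeAttributes.get(rule.getKey());
            if (actual != null && rule.getValue().equalsIgnoreCase(actual.trim())) {
                conflicts.add(rule.getKey() + "=" + actual.trim());
            }
        }
        return conflicts;
    }
}
```

An empty result means that none of the listed values was found, so the exception described above should not be triggered by these two attributes.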

About direct data transfer

This section describes how data is submitted to a service using direct data transfer. However, it is helpful to first understand how data is submitted in IBM Spectrum Symphony's default model.

Default behavior without direct data transfer

Here is the sequence of events when a client wants to send task data to the service:

  1. The client sends a task input message containing the data to the API layer, which serializes the message and submits it to the underlying communication layer.
  2. The message is transferred by the communication layer to the session manager on the management host. The session manager replies to the client with an acknowledgement upon successful receipt of the message.
  3. The session manager routes the message to the service instance manager and service instance on the compute host.
  4. The service performs calculations on the input data within the message and returns the result to the client via the service instance manager and session manager.
The following diagram shows the data flow between the client and the service instance in IBM Spectrum Symphony's default model:
[Figure: Data flow between the client and the service instance in the product default model]

Behavior with direct data transfer

When direct data transfer is enabled for task input messages, the messages are sent to the service in the same manner as in IBM Spectrum Symphony's default model. The difference is that, when direct data transfer is enabled, the application data is not included in the task input message itself; only metadata is sent with the message. The following sequence of events shows the data flow between the client and the service when direct data transfer is used for input and output messages:
  1. The client formulates a task input message encoded with the URL of the client (metadata). This is the URL the client will listen on.
  2. The message is propagated to the service in the same manner as in IBM Spectrum Symphony's default model.
  3. The service driver extracts the client URL and uses it to retrieve the data from the client.
  4. The service performs calculations on the data and sends the resulting data directly to the client.
  5. The client waits for acknowledgment from the session manager about the success of the task before accessing the output data locally.
The following diagram shows the data flow between the client and the service instance with direct data transfer enabled for input and output messages:
[Figure: Data flow between the client and the service instance with direct data transfer enabled for input and output messages]
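
The difference between the two flows can be illustrated with a toy in-process model. This is a sketch for intuition only: the DataFlowModel class, its method names, and the example URL are invented for illustration and do not correspond to any product API. The model only tracks how many bytes of application data pass through a simulated session manager in each case.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two data flows; none of these names exist in the product API.
final class DataFlowModel {

    // Stands in for the client's listening endpoint: client URL -> cached input data.
    private final Map<String, byte[]> clientCache = new HashMap<>();

    // Bytes of application data that passed through the simulated session manager.
    long bytesThroughSessionManager = 0;

    // Default model: the payload travels inside the task input message,
    // so the session manager handles every byte of it.
    byte[] submitDefault(byte[] input) {
        bytesThroughSessionManager += input.length;
        return input; // what the service instance receives
    }

    // Direct data transfer: only the client URL (metadata) is routed through
    // the session manager; the service retrieves the data from the client.
    byte[] submitDirect(String clientUrl, byte[] input) {
        clientCache.put(clientUrl, input);       // the client caches the data locally
        String messageMetadata = clientUrl;      // the message carries only the URL
        return clientCache.get(messageMetadata); // the service pulls the data from the client
    }
}
```

In this model, a call to submitDirect leaves bytesThroughSessionManager unchanged, which mirrors the point that only metadata is routed through the session manager.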

When to use direct data transfer

The direct transfer of application data should be considered in either of the following situations:
  • You have many client connections being routed through a session manager. The session manager's routing and scheduling overhead can potentially impede data flow to the service.
  • The client and service reside on the same subnet but the session manager does not.
Note: Direct data transfer can be used in conjunction with data compression or other features such as common data updates. For example, if direct data transfer and data compression are both enabled, the compressed data will be sent directly to the service.

Client API

The direct data transfer feature can only be enabled through the client API at the session level.

Enabling direct data transfer for sessions

You can enable direct data transfer for all tasks associated with a session, and optionally, for common data and common data updates. To enable direct data transfer, the client application must do the following:
  1. Create a session using the appropriate session attribute to inform the API of the client's intention to send data directly to the service. The session attribute is a member of the SessionCreationAttributes and SessionOpenAttributes classes.
  2. Send the task input messages to IBM Spectrum Symphony.
The following code sample shows how direct data transfer is enabled using the SessionCreationAttributes class in each supported language.
C++
SessionCreationAttributes attributes;
attributes.enableDirectDataTransfer(true);
Java™
SessionCreationAttributes attributes = new SessionCreationAttributes();
attributes.enableDirectDataTransfer(true);
C# (.NET)
SessionCreationAttributes attributes = new SessionCreationAttributes();
attributes.EnableDirectDataTransfer = true;

Setting direct data transfer flags

The direct data transfer flags allow greater control over IBM Spectrum Symphony behavior when the direct data transfer feature is enabled. By default, when direct data transfer is enabled, only the task data is sent directly between the client and service. This means that common data and common data updates are still sent to the service via the session manager. To override this behavior, it is necessary to set the appropriate direct data transfer flag.

For example, to set the direct data transfer flag for all tasks including common data and common data updates in a new session, the client application must do the following:
  1. Create a session using the appropriate session attribute to inform the API to include common data and common data updates in the direct data transfer. The session attribute is a member of the SessionCreationAttributes class.
  2. Send the task input messages to IBM Spectrum Symphony.
The following code sample shows how a direct data transfer flag is set with a SessionCreationAttributes object in each supported language.
C++
SessionCreationAttributes attributes;
attributes.setDirectDataTransferFlags(Session::IncludeCommonDataAndUpdates);
Java
SessionCreationAttributes attributes = new SessionCreationAttributes();
attributes.setDirectDataTransferFlags(DirectDataTransferFlags.INCLUDE_COMMON_DATA_AND_UPDATES);
C# (.NET)
SessionCreationAttributes attributes = new SessionCreationAttributes();
attributes.DirectDataTransferFlags = DirectDataTransferFlags.IncludeCommonDataAndUpdates;

Port configuration

You can define a port or port range for the client to listen for connections from the service. You may want to do this if your client is running behind a firewall. The SOAM_DIRECT_DATA_PORT environment variable is used to define the port or port range; for example, SOAM_DIRECT_DATA_PORT="25000" or SOAM_DIRECT_DATA_PORT="25000-25100".
Note: If SOAM_DIRECT_DATA_PORT is not defined, IBM Spectrum Symphony uses the value defined in EGO_CLIENT_ADDR. If neither variable is defined, IBM Spectrum Symphony randomly selects a client port to listen on.
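
When the variable is set from a launcher or wrapper script, it can help to validate the value before the client starts. The following is a minimal sketch that accepts the two forms shown above (a single port or a first-last range). The DirectDataPortRange class is a hypothetical helper, not part of the IBM Spectrum Symphony API, and the bounds check (1 to 65535, first not greater than last) is an assumption about what a sensible value looks like, not documented product behavior.

```java
// Hypothetical validation helper; not part of the IBM Spectrum Symphony API.
final class DirectDataPortRange {
    final int first;
    final int last;

    private DirectDataPortRange(int first, int last) {
        this.first = first;
        this.last = last;
    }

    // Parses "25000" or "25000-25100"; throws IllegalArgumentException otherwise.
    static DirectDataPortRange parse(String value) {
        String[] parts = value.trim().split("-", -1);
        if (parts.length > 2) {
            throw new IllegalArgumentException("Not a port or port range: " + value);
        }
        try {
            int first = Integer.parseInt(parts[0].trim());
            int last = parts.length == 2 ? Integer.parseInt(parts[1].trim()) : first;
            // Assumed sanity bounds: valid TCP port numbers, in ascending order.
            if (first < 1 || last > 65535 || first > last) {
                throw new IllegalArgumentException("Port range out of bounds: " + value);
            }
            return new DirectDataPortRange(first, last);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a port or port range: " + value, e);
        }
    }
}
```

A caller would typically pass System.getenv("SOAM_DIRECT_DATA_PORT") after checking that it is not null, because an undefined variable is a valid configuration as described in the note above.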

Client memory management

Since direct data transfer will most likely be used in situations where an application needs to transfer large amounts of data, the client's memory usage can become an issue. To conserve memory, IBM Spectrum Symphony can write the cached input and output data to disk and restore the data to memory only when it is required. Once the session is completed, IBM Spectrum Symphony removes all the input and output data from the file system.

Clients that have access to a larger address space, such as 64-bit clients, can be optimized by keeping all the data in memory and relying on operating system paging instead of file caching.

The SOAM_DIRECT_DATA_STORAGE environment variable with possible values of StoreInMemory or StoreOnDisk is used to define whether the data resides in client memory or is written to disk. The default behavior when this variable is not defined is StoreOnDisk.
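
The default described above can be expressed as a small lookup. This is a sketch under stated assumptions: the DirectDataStorage class is hypothetical and not part of the IBM Spectrum Symphony API; treating an empty value as undefined, matching values case-sensitively, and rejecting unknown values are assumptions made for this example, since the product's handling of such input is not described here.

```java
// Hypothetical helper; not part of the IBM Spectrum Symphony API.
final class DirectDataStorage {

    // Constant names deliberately match the documented variable values.
    enum Mode { StoreInMemory, StoreOnDisk }

    // Maps the raw variable value to a mode; an undefined value means StoreOnDisk.
    static Mode resolve(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return Mode.StoreOnDisk; // documented default when the variable is not defined
        }
        for (Mode mode : Mode.values()) {
            if (mode.name().equals(rawValue.trim())) {
                return mode;
            }
        }
        // Assumption: unknown values are treated as a configuration error.
        throw new IllegalArgumentException("Unknown SOAM_DIRECT_DATA_STORAGE value: " + rawValue);
    }
}
```

For example, DirectDataStorage.resolve(System.getenv("SOAM_DIRECT_DATA_STORAGE")) yields StoreOnDisk when the variable is not set.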

Client work data location management

On IBM Spectrum Symphony client or IBM Spectrum Symphony Developer Edition hosts, the default directory for storing direct data transfer work data is $SOAM_HOME/work/datamanager (for Linux®) or %SOAM_HOME%\work\datamanager (for Windows). You can change this default location by setting the SOAM_DDT_WORK_DIR environment variable to the first-level directory under which the work data is to be stored. For example, if you specify SOAM_DDT_WORK_DIR=/tmp, IBM Spectrum Symphony uses the work/datamanager directory under /tmp, so direct data transfer work data is stored in the /tmp/work/datamanager directory.
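
The directory resolution described above can be sketched as follows, for example to locate the work data when monitoring disk usage. The DirectDataWorkDir class is a hypothetical helper, not part of the IBM Spectrum Symphony API; treating an empty SOAM_DDT_WORK_DIR as unset and failing when neither value is available are assumptions made for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the IBM Spectrum Symphony API.
final class DirectDataWorkDir {

    // Returns <SOAM_DDT_WORK_DIR>/work/datamanager when the override is set,
    // otherwise <SOAM_HOME>/work/datamanager.
    static Path resolve(String soamDdtWorkDir, String soamHome) {
        boolean overrideSet = soamDdtWorkDir != null && !soamDdtWorkDir.trim().isEmpty();
        String base = overrideSet ? soamDdtWorkDir.trim() : soamHome;
        if (base == null) {
            // Assumption: with neither value available there is nothing to resolve.
            throw new IllegalStateException("Neither SOAM_DDT_WORK_DIR nor SOAM_HOME is set");
        }
        return Paths.get(base, "work", "datamanager");
    }
}
```

A typical call is DirectDataWorkDir.resolve(System.getenv("SOAM_DDT_WORK_DIR"), System.getenv("SOAM_HOME")).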

Multiple network interfaces

IBM Spectrum Symphony allows a non-default interface on the client host to be specified for communication with the service instance. The SOAM_DIRECT_DATA_ADDRESS environment variable can be defined with a valid IP address (or host name alias) that represents the non-default interface.
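
Before setting the variable, it may be worth confirming that the chosen address actually belongs to an interface on the client host. The following sketch uses only standard java.net classes; the DirectDataAddressCheck class is a hypothetical helper, not part of the IBM Spectrum Symphony API, and this check is a suggestion rather than something the product is documented to perform.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper; not part of the IBM Spectrum Symphony API.
final class DirectDataAddressCheck {

    // Returns true when the IP address or host name resolves to an address
    // that is bound to a network interface on this host.
    static boolean isLocalInterfaceAddress(String addressOrHost) {
        try {
            InetAddress address = InetAddress.getByName(addressOrHost);
            return NetworkInterface.getByInetAddress(address) != null;
        } catch (UnknownHostException | SocketException e) {
            // Unresolvable names and interface lookup failures count as "not local".
            return false;
        }
    }
}
```

If the check returns false for the intended value of SOAM_DIRECT_DATA_ADDRESS, the address is probably mistyped or belongs to another host.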