Concurrently running tasks in a single service to share data

Running multiple tasks concurrently in one service instance lets you share data easily among tasks in the same session or application, and saves memory. An application that is configured to run multiple tasks concurrently on a service instance is called a multiple task service (MTS) application. This feature is supported on all operating systems supported by IBM® Spectrum Symphony Advanced Edition. It is not supported for MapReduce jobs.

Why use MTS?

If you run one task per service instance, it is not convenient for tasks to share data. When tasks run in an MTS application, they share the same process space, which also conserves memory. This differs from an application that uses common data where each task runs on a service instance that still needs its own copy of the common data.

Example:
  • Common data is 4 GB
  • Compute host has 8 GB and eight cores

The client creates a session with a default service-to-slot-ratio and submits eight long-running tasks. Each task has a 250 MB data size.

Without MTS, if you run the eight tasks on the host (one task per core), the memory usage is: (8 * common data size (4 GB)) + (8 * task data size (250 MB)) = 32 GB + 2 GB = 34 GB

The memory requirements are more than four times the amount of physical memory available on the host, which can cause performance issues as the OS constantly swaps memory. Run fewer tasks on the host to avoid this situation.

With MTS, if you run the eight tasks on the host (one task per core), the memory usage is: (1 * common data size (4 GB)) + (8 * task data size (250 MB)) = 4 GB + 2 GB = 6 GB

All eight tasks can run on this host without any performance impact.
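
The arithmetic in this example can be checked with a short sketch. The function name and parameters below are illustrative only and are not part of any Symphony API; the sketch assumes 1 GB = 1000 MB, as the figures above imply.

```python
def host_memory_gb(tasks, common_data_gb, task_data_mb, shared_common_data):
    """Estimate memory used on one host, in GB.

    shared_common_data=True models MTS (one copy of the common data per
    process); False models one service instance, and one copy, per task.
    """
    copies_of_common_data = 1 if shared_common_data else tasks
    task_data_gb = tasks * task_data_mb / 1000  # assumes 1 GB = 1000 MB
    return copies_of_common_data * common_data_gb + task_data_gb


without_mts = host_memory_gb(8, 4, 250, shared_common_data=False)
with_mts = host_memory_gb(8, 4, 250, shared_common_data=True)
print(without_mts, with_mts)
```

With the values from the example, the first result exceeds the 8 GB of physical memory on the host and the second fits within it.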

When tasks are not submitted continuously, configuring MTS does not provide any significant performance benefit. In this scenario, the MTS is restarted multiple times because there are often too few pending tasks and the SIM must be released from the current session. To prevent the MTS from being restarted when tasks are not submitted continuously, configure minimum services scheduling in the application profile.

MTS behavior

With MTS, there is only one service instance handling requests for each session or application on each host, depending on the MTS configuration. For example, if you configure an MTS to be associated with a single session, there will be only one MTS dedicated to each session on each host. If you configure an MTS to be associated with a single application, there will be only one MTS to handle workload for the different sessions of the application.

Based on the number of slots allocated to the service, multiple tasks can run concurrently in the MTS instance. Threads are created in the service instance to handle the tasks from the SIM, one thread per task.
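
The one-thread-per-task model can be pictured with the following sketch. It is a simplified stand-in for an MTS process written with the Python standard library; the class and method names are hypothetical and do not correspond to the Symphony service API.

```python
import threading


class MultiTaskService:
    """Hypothetical stand-in for an MTS process (not the Symphony API)."""

    def __init__(self, common_data):
        # Loaded once per process and shared by every task thread.
        self.common_data = common_data
        self.results = {}
        self._results_lock = threading.Lock()

    def _run_task(self, task_id, task_input):
        # Tasks only read the shared data here, so the read needs no lock.
        value = self.common_data["scale"] * task_input
        # Writes to shared state are guarded by the application itself.
        with self._results_lock:
            self.results[task_id] = value

    def run(self, task_inputs):
        # One thread per concurrent task.
        threads = [
            threading.Thread(target=self._run_task, args=(task_id, task_input))
            for task_id, task_input in enumerate(task_inputs)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return self.results


service = MultiTaskService({"scale": 10})
results = service.run([1, 2, 3, 4])
```

All four threads read the same `common_data` object, which is the memory saving that MTS is intended to provide.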

One MTS per session

If you configure one MTS per session, initially, only one MTS is started on a host and the SIM on this host connects to the MTS. When workload for the first session comes in, the MTS is associated with that session. When workload for another session comes in, the SIM creates a new MTS. On each host, there is only one service instance handling requests for each session.

Note that on each host, there might be multiple MTS instances for an application. Each one handles the workload for a different session and can process multiple concurrent tasks, depending on the number of slots allocated to it.

One MTS per application

If you configure one MTS per application, there is only one MTS handling requests for a particular application and service type on each host. There may be multiple MTS instances on each host, but each one will handle workload for a different application and service type:

[Figure: MTS session architecture]

The MTS allows one SIM on the host for the same application and service type to connect to it. Tasks from the same session or different sessions execute concurrently in the MTS:

[Figure: Tasks from the same session or different sessions execute concurrently in the MTS]

Resource sharing scenario

Here is a scenario to help illustrate MTS behavior when a single MTS is configured for each session and two sessions share the same resource.
  • Client creates one session and submits 10 tasks at T0
  • Client creates another session and submits 10 tasks at T1
  1. When the tasks are submitted at T0, the SIM creates an MTS.
  2. When the tasks are submitted from the second session at T1, the second session will be entitled to half of the resources (assuming a proportional policy with equal proportions).
  3. After the next task completes for the first session, the SSM will re-assign the resource to the second session.
  4. A new MTS, "MTS 2", will be created and the SIM will connect to "MTS 2". Meanwhile, the previous connections will be closed.
  5. Steps 3 and 4 are repeated for the next task that finishes for session 1.
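
The gradual handover in steps 3 to 5 can be modeled as follows. This is only an illustration: the scenario does not state how many slots are involved, so the slot count of 4 is an assumption, and the function is hypothetical rather than part of the product.

```python
def rebalance(total_slots, second_session_share=0.5):
    """Return the (session 1, session 2) slot counts after each reassignment.

    Each loop iteration models one task of session 1 completing, after
    which one slot is reassigned to session 2 (steps 3 and 4 above).
    """
    first, second = total_slots, 0
    target = int(total_slots * second_session_share)
    history = [(first, second)]
    while second < target:
        first -= 1
        second += 1
        history.append((first, second))
    return history


history = rebalance(4)
print(history)
```

Slots move one at a time as tasks of the first session finish, until the second session holds its entitled share.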

Common data updates

In the compute host, the SIM can send common data updates for the session to the MTS. Since common data updates may happen while some tasks are running, it is up to the application to ensure proper synchronization.
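
One possible way for an application to synchronize updates with running tasks is to replace the common data as a whole under a lock, so that each task works from a consistent snapshot. The sketch below uses only the Python standard library; the class and method names are hypothetical and only loosely mirror the onSessionUpdate() and onInvoke() handlers.

```python
import threading


class SessionState:
    """Hypothetical holder for a session's common data inside one process."""

    def __init__(self, common_data):
        self._lock = threading.Lock()
        self._common = dict(common_data)

    def on_session_update(self, update):
        # Build a new dictionary and swap it in, rather than mutating the
        # one that running tasks may still be reading.
        with self._lock:
            merged = dict(self._common)
            merged.update(update)
            self._common = merged

    def _snapshot(self):
        with self._lock:
            return self._common

    def on_invoke(self, x):
        # Take one snapshot so both fields come from the same version.
        data = self._snapshot()
        return data["rate"] * x + data["offset"]


state = SessionState({"rate": 2, "offset": 1})
before = state.on_invoke(3)
state.on_session_update({"rate": 3, "offset": 0})
after = state.on_invoke(3)
```

Because an update never modifies a dictionary that has already been handed out, a task that started before the update keeps using the old values until it finishes.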

Terminating and suspending tasks

If a task is terminated or suspended, the SIM waits for the taskGracePeriod or suspendGracePeriod, respectively, or the effective reclaim grace period (for reclaim) to expire. When a task is terminated or suspended, or the resource associated with the task is reclaimed, the onServiceInterrupt(InterruptEventPtr& event) method is executed on the service side to inform the service about how much time it has to clean up. The InterruptEvent contains the information about the interrupted session or task; refer to the API reference documentation for more information.

If the interrupted task does not return from the onInvoke() method before the grace period expires, the MTS is restarted and the other tasks in the MTS are re-run without incrementing the retry count.

For suspend/reclaim, the interrupted task is re-queued and later re-run without incrementing the retry count.

For task/session termination, the interrupted task is terminated.
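
A task can only return within the grace period if it checks for interruption while it runs. The following sketch shows one cooperative pattern using a `threading.Event`; the class is hypothetical, and the handler names only mirror onInvoke() and onServiceInterrupt() rather than calling the real API.

```python
import threading
import time


class InterruptibleTask:
    """Hypothetical task that polls a stop flag between units of work."""

    def __init__(self):
        self._stop = threading.Event()
        self.completed_steps = 0
        self.cleaned_up = False

    def on_invoke(self, steps):
        for _ in range(steps):
            if self._stop.is_set():
                break  # stop early so there is time left to clean up
            time.sleep(0.01)  # stands in for one unit of real work
            self.completed_steps += 1
        self.cleaned_up = True  # release resources before returning

    def on_service_interrupt(self):
        # Called from another thread when the task is interrupted.
        self._stop.set()


def run_with_interrupt(grace_period_seconds):
    task = InterruptibleTask()
    worker = threading.Thread(target=task.on_invoke, args=(10_000,), daemon=True)
    worker.start()
    time.sleep(0.05)
    task.on_service_interrupt()
    worker.join(timeout=grace_period_seconds)
    returned_in_time = not worker.is_alive()
    return returned_in_time, task


returned_in_time, task = run_with_interrupt(2.0)
```

If `returned_in_time` were false, the situation would correspond to the case described above in which the grace period expires and the MTS is restarted.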

Service error handling in MTS mode

Actions that can be taken on an MTS

In MTS mode, multiple tasks run in the same process concurrently. Any interruption to one task may affect other tasks running in the same process. Service error handling enables you to configure timeouts for all methods within the service and actions to take when a timeout occurs. The timeouts and actions are configured using the duration and actionOnSI parameters in the application profile for the methods within the service. The following paragraphs describe service error handling behavior when actions are taken on an MTS instance.

restartService

An MTS may need to be restarted while it is still alive. For example, one of the threads in the MTS may request that the service instance be restarted via customized application error handling: onInvoke() throws a FailureException and the configured behavior in the application profile is restartService. In this case, the MTS will be restarted and the other running tasks will be re-run on the new instance without incrementing the task retry counter.

If an MTS exits, the connected SIM will detect the exit almost simultaneously. The SIM enforces the configured error handling behavior for the stage in the service lifecycle that it is executing. For example, if the SIM is executing an onInvoke() and the MTS exits at this time, the SIM will enforce the error handling behavior for Invoke.

If the SIM does not get the response of a command (such as Invoke or sessionEnter) from the MTS within a timeout period and the actionOnSI for the timeout error handling is set to restartService, the SIM follows the same behavior as when the MTS is alive and the SIM needs to restart, as described previously in this section.

blockHost

This action blocks the compute host for the application. Once a SIM informs the SSM to block the compute host, the other SIMs will keep running tasks in the MTS until the other slots on the host are released. This action immediately terminates the MTS process and blocks all slots for that MTS. The host is added to the block list for the SSM allocation. Workload that is impacted is retried without penalty. All resource units that the MTS was using are released.

keepAlive

This action keeps the MTS process alive.

Default application error handling for MTS

The default error handling configuration for MTS is the same as non-MTS with one exception for the setting of actionOnSI of the SessionUpdate method:
<Method name="SessionUpdate">
    <Exception type="failure" actionOnSI="restartService" actionOnWorkload="retry"/>
    <Exception type="fatal" actionOnSI="restartService" actionOnWorkload="fail"/>
</Method>
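
To review such settings programmatically, the fragment above can be read with a standard XML parser. The sketch below is an illustration using the Python standard library; the helper name is hypothetical, and a real application profile contains many more elements than this fragment.

```python
import xml.etree.ElementTree as ET

# The SessionUpdate fragment shown above, copied as a standalone snippet.
PROFILE_FRAGMENT = """
<Method name="SessionUpdate">
    <Exception type="failure" actionOnSI="restartService" actionOnWorkload="retry"/>
    <Exception type="fatal" actionOnSI="restartService" actionOnWorkload="fail"/>
</Method>
"""


def read_exception_actions(xml_text):
    """Map each exception type to its (actionOnSI, actionOnWorkload) pair."""
    method = ET.fromstring(xml_text.strip())
    return {
        exc.get("type"): (exc.get("actionOnSI"), exc.get("actionOnWorkload"))
        for exc in method.findall("Exception")
    }


actions = read_exception_actions(PROFILE_FRAGMENT)
print(actions)
```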

Handling behavior of an MTS when a SIM exits

If a SIM exits while its task is still running, the MTS will be restarted and the other running tasks will be re-run on the new instance. Regardless of whether the task of a SIM that has exited is running or not, the task retry counter of affected tasks is not increased.

Supported error handling configurations

MTS only supports a subset of the error handling configurations that are available in non-MTS mode. The following table shows the supported configurations:
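Table 1. Supported MTS configurations

Method         | actionOnWorkload | actionOnSI     | Failure exception | Fatal exception | Timeout | Exit | Return
---------------+------------------+----------------+-------------------+-----------------+---------+------+-------
Register       | Not applicable   | blockHost      | No                | No              | Yes     | Yes  | No
Register       | Not applicable   | restartService | No                | No              | Yes     | Yes  | No
CreateService  | Not applicable   | keepAlive      | No                | No              | No      | No   | Yes
CreateService  | Not applicable   | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
CreateService  | Not applicable   | restartService | Yes               | Yes             | Yes     | Yes  | Yes
SessionEnter   | succeed          | keepAlive      | No                | No              | No      | No   | Yes
SessionEnter   | succeed          | blockHost      | No                | No              | No      | No   | Yes
SessionEnter   | succeed          | restartService | No                | No              | No      | No   | Yes
SessionEnter   | retry            | keepAlive      | Yes               | Yes             | No      | No   | Yes
SessionEnter   | retry            | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
SessionEnter   | retry            | restartService | Yes               | Yes             | Yes     | Yes  | Yes
SessionEnter   | fail             | keepAlive      | Yes               | Yes             | No      | No   | Yes
SessionEnter   | fail             | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
SessionEnter   | fail             | restartService | Yes               | Yes             | Yes     | Yes  | Yes
SessionUpdate  | succeed          | keepAlive      | No                | No              | No      | No   | Yes
SessionUpdate  | succeed          | blockHost      | No                | No              | No      | No   | Yes
SessionUpdate  | succeed          | restartService | No                | No              | No      | No   | Yes
SessionUpdate  | retry            | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
SessionUpdate  | retry            | restartService | Yes               | Yes             | Yes     | Yes  | Yes
SessionUpdate  | fail             | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
SessionUpdate  | fail             | restartService | Yes               | Yes             | Yes     | Yes  | Yes
Invoke         | succeed          | keepAlive      | No                | No              | No      | No   | Yes
Invoke         | succeed          | blockHost      | No                | No              | No      | No   | Yes
Invoke         | succeed          | restartService | No                | No              | No      | No   | Yes
Invoke         | retry            | keepAlive      | Yes               | Yes             | No      | No   | Yes
Invoke         | retry            | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
Invoke         | retry            | restartService | Yes               | Yes             | Yes     | Yes  | Yes
Invoke         | fail             | keepAlive      | Yes               | Yes             | No      | No   | Yes
Invoke         | fail             | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
Invoke         | fail             | restartService | Yes               | Yes             | Yes     | Yes  | Yes
SessionLeave   | Not applicable   | keepAlive      | Yes               | Yes             | No      | No   | Yes
SessionLeave   | Not applicable   | blockHost      | Yes               | Yes             | Yes     | Yes  | Yes
SessionLeave   | Not applicable   | restartService | Yes               | Yes             | Yes     | Yes  | Yes
DestroyService | Not applicable   | Not applicable | No                | No              | Yes     | No   | No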
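
As an illustration of how the table can be used to check a planned configuration, the sketch below encodes only the Invoke rows of Table 1 as a lookup. The data structure and function are hypothetical helpers, not a product feature; the other methods would need their own rows.

```python
# Column order of Table 1 after the two action columns.
EVENTS = ("failure", "fatal", "timeout", "exit", "return")

# Invoke rows of Table 1: (actionOnWorkload, actionOnSI) -> Y/N per event.
INVOKE_SUPPORT = {
    ("succeed", "keepAlive"): "NNNNY",
    ("succeed", "blockHost"): "NNNNY",
    ("succeed", "restartService"): "NNNNY",
    ("retry", "keepAlive"): "YYNNY",
    ("retry", "blockHost"): "YYYYY",
    ("retry", "restartService"): "YYYYY",
    ("fail", "keepAlive"): "YYNNY",
    ("fail", "blockHost"): "YYYYY",
    ("fail", "restartService"): "YYYYY",
}


def invoke_config_supported(action_on_workload, action_on_si, event):
    """Return True if Table 1 lists the combination as supported for Invoke."""
    row = INVOKE_SUPPORT[(action_on_workload, action_on_si)]
    return row[EVENTS.index(event)] == "Y"


print(invoke_config_supported("retry", "keepAlive", "timeout"))
```

For example, keepAlive is listed as unsupported for Invoke timeouts, while restartService with retry is supported.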

Service API

Concurrent execution and application synchronization

Concurrent execution inside the service instance adheres to the following principles. It is up to the application to ensure proper synchronization.

  1. onCreateService() will only be called at the beginning of the process lifetime. The middleware will not execute any other handler while onCreateService() is executing.
  2. onDestroyService() will only be called at the end of the process lifetime. The middleware will not execute any other handler while onDestroyService() is executing.
  3. For sessions without common data, there are no handlers called to indicate when the session is assigned and unassigned from the service. For a session without common data, onInvoke() invocations may execute concurrently any time after the onCreateService() completes and before onDestroyService() is called.
  4. For sessions with common data, onSessionEnter() and onSessionLeave() scope the period of time that the service instance is assigned to a particular session.

    When an MTS belongs to a single application, it may be assigned and unassigned from a session more than once. An MTS may be assigned to multiple sessions at once. The remaining handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) may execute concurrently within the process under the following rules:

    1. onSessionEnter() invocations for a session will not execute concurrently with other handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) for that session.
    2. onSessionLeave() invocations for a session will not execute concurrently with other handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) for that session.
    3. onInvoke() invocations for a session may execute concurrently.
    4. onSessionUpdate() invocations will occur serially for a session.
    5. onInvoke() invocations and onSessionUpdate() invocations may execute concurrently.
  5. When an MTS belongs to a single application, if the invocations are for different sessions, any of these handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) may execute concurrently with each other. For example, onSessionEnter() for two different sessions may execute concurrently.
  6. onServiceInterrupt() may occur at any time except during onCreateService() or onDestroyService(). Multiple occurrences of onServiceInterrupt() may also execute concurrently.
The following table summarizes which handlers can be executed concurrently within the same session:
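
Handlers        | onSessionEnter | onSessionUpdate | onSessionLeave | onInvoke
----------------+----------------+-----------------+----------------+---------
onSessionEnter  | No             | No              | No             | No
onSessionUpdate | Not applicable | No              | No             | Yes
onSessionLeave  | Not applicable | Not applicable  | No             | No
onInvoke        | Not applicable | Not applicable  | No             | Yes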

The following table summarizes which handlers can be executed concurrently for different sessions of the same application.

Handlers        | onSessionEnter | onSessionUpdate | onSessionLeave | onInvoke
----------------+----------------+-----------------+----------------+---------
onSessionEnter  | Yes            | Yes             | Yes            | Yes
onSessionUpdate | Not applicable | Yes             | Yes            | Yes
onSessionLeave  | Not applicable | Not applicable  | Yes            | Yes
onInvoke        | Not applicable | Not applicable  | Not applicable | Yes
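
The two tables can be condensed into a small lookup, which may help when reasoning about which handler pairs need application-level locking. The function below is a hypothetical helper that encodes only what the tables state; it does not call any Symphony API.

```python
SESSION_HANDLERS = {
    "onSessionEnter",
    "onSessionUpdate",
    "onSessionLeave",
    "onInvoke",
}

# Within one session, only these pairs may overlap:
# onInvoke with onInvoke, and onInvoke with onSessionUpdate.
SAME_SESSION_CONCURRENT = {
    frozenset({"onInvoke"}),
    frozenset({"onInvoke", "onSessionUpdate"}),
}


def may_run_concurrently(handler_a, handler_b, same_session):
    """Return True if the two handlers may execute at the same time."""
    if handler_a not in SESSION_HANDLERS or handler_b not in SESSION_HANDLERS:
        raise ValueError("unknown handler")
    if not same_session:
        # Different sessions of the same application: every pair may overlap.
        return True
    return frozenset({handler_a, handler_b}) in SAME_SESSION_CONCURRENT


print(may_run_concurrently("onInvoke", "onSessionUpdate", same_session=True))
```

The pairs that return True for the same session are the ones where the application must protect any state shared between the two handlers.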

Feature interactions

Reclaim and preemption

The application-level MTS supports session preemption where only the running services of the preempted sessions are interrupted. The preemption grace period allows the currently running service instance to complete and clean up when the resource on which the service instance is running is reclaimed. If the service method and cleanup do not complete within the set time, Symphony terminates the instance. If the timeout has not expired, Symphony initiates cleanup after the currently running service method completes.

Global standby service

An MTS process becomes a linger service instance if there is only one service driver (last service driver) in the MTS. After that, the MTS is actually a normal linger service instance occupying one resource unit, as it has only one service driver. It can be transferred over to another SIM normally.

Delay slot release

When a SIM becomes idle, it stays connected to the MTS while it is waiting for delaySlotRelease to expire. When delaySlotRelease expires, the MTS thread runs to completion, the SIM disconnects, and the slot is released.

Service to slot ratio

With the service-to-slot-ratio feature, the workload consumes slots according to its own slot usage requirement. In MTS mode, the only difference is that there is one thread per concurrent task, so the thread consumes N slots or 1/N of a slot.
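
As a rough model of this relationship, the number of task threads that fit in an allocation is the number of allocated slots divided by the slots each thread consumes. The function below is a hypothetical illustration under that assumption and does not reflect how the scheduler itself computes allocations.

```python
from fractions import Fraction


def concurrent_threads(allocated_slots, slots_per_thread):
    """Number of task threads that fit in the allocated slots.

    slots_per_thread may be an integer N (each thread consumes N slots)
    or a Fraction such as Fraction(1, N) (each thread consumes 1/N of a slot).
    """
    return int(Fraction(allocated_slots) / Fraction(slots_per_thread))


print(concurrent_threads(8, 1))               # one slot per thread
print(concurrent_threads(8, 2))               # each thread consumes 2 slots
print(concurrent_threads(8, Fraction(1, 2)))  # each thread consumes half a slot
```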

Resource preference

The resource preference feature is not supported with MTS.

Minimum services and maximum services

In MTS mode, minServices and maxServices control the number of threads rather than service instances that are created to run tasks for sessions.

Best practices

Standby services

If a standby service is configured in the cluster, it continues to run on the compute host even though no slots are consumed. Since it is possible for the MTS to hold a lot of memory on the host, it is not recommended to use standby services with MTS.

Exclusive allocation

Exclusive allocation maximizes the benefit of using MTSs on a host.

Without exclusive allocation, multiple MTSs (each one belonging to a different application) may run on the same host. The memory on the host must be shared across multiple applications, decreasing the effective cache size for each application that uses that host.

With exclusive allocation, each host will be used by one application exclusively so the application benefits from a larger in-memory cache.