Concurrently running tasks in a single service to share data
Running multiple tasks concurrently in one service instance lets tasks in the same session or application share data easily, and it saves memory. An application that is configured to run multiple tasks concurrently on a service instance is called a multiple task service (MTS) application. This feature is supported on all operating systems supported by IBM® Spectrum Symphony Advanced Edition. It is not supported for MapReduce jobs.
Why use MTS?
If you run one task per service instance, tasks cannot easily share data. When tasks run in an MTS application, they share the same process space, which also conserves memory. This differs from a non-MTS application that uses common data, where each task runs on a service instance that still needs its own copy of the common data. For example, consider the following scenario:
- Common data is 4 GB
- Compute host has 8 GB and eight cores
The client creates a session with a default service-to-slot-ratio and submits eight long-running tasks. Each task has a 250 MB data size.
Without MTS, if you run the eight tasks on the host (one task per core), the memory usage is: (8 * common data size (4 GB)) + (8 * task data size (250 MB)) = 34 GB
The memory requirement is more than four times the physical memory available on the host, which can cause performance problems because the OS constantly swaps memory. To avoid this situation, you must run fewer tasks on the host.
With MTS, if you run the eight tasks on the host (one task per core), the memory usage is: (1 * common data size (4 GB)) + (8 * task data size (250 MB)) = 6 GB
All eight tasks can run on this host without any performance impact.
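The arithmetic in this example can be reproduced with a short sketch. The function name `host_memory_gb` and its parameters are hypothetical and exist only for illustration; the sketch assumes, as the example does, that 250 MB is counted as 0.25 GB.

```python
def host_memory_gb(tasks, common_data_gb, task_data_gb, use_mts):
    """Estimate the memory needed to run `tasks` concurrently on one host.

    Without MTS, every service instance holds its own copy of the common
    data. With MTS, all tasks share one copy inside a single process.
    """
    common_copies = 1 if use_mts else tasks
    return common_copies * common_data_gb + tasks * task_data_gb


# Figures from the example: 8 tasks, 4 GB common data, 0.25 GB per task.
without_mts = host_memory_gb(8, 4, 0.25, use_mts=False)
with_mts = host_memory_gb(8, 4, 0.25, use_mts=True)
```

Comparing either result with the 8 GB of physical memory on the host shows whether all eight tasks fit without swapping.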
When tasks are not submitted continuously, configuring MTS does not provide any significant performance benefit. In this scenario, the MTS is restarted repeatedly, because there are often too few tasks to keep it busy and the SIM must be released from the current session. To prevent the MTS from being restarted when tasks are not submitted continuously, configure minimum services scheduling in the application profile.
MTS behavior
With MTS, there is only one service instance handling requests for each session or application on each host, depending on the MTS configuration. For example, if you configure an MTS to be associated with a single session, there will be only one MTS dedicated to each session on each host. If you configure an MTS to be associated with a single application, there will be only one MTS to handle workload for the different sessions of the application.
Based on the number of slots allocated to the service, multiple tasks can run concurrently in the MTS instance. Threads are created in the service instance to handle the tasks from the SIM, one thread per task.
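As a rough illustration of this model, the following sketch runs several tasks concurrently inside one process, with the level of concurrency capped by the number of allocated slots. It does not use the Symphony API: `run_tasks`, `allocated_slots`, and `handler` are hypothetical names, and a standard thread pool (which reuses threads) stands in for the one-thread-per-task behavior of the middleware.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(task_inputs, allocated_slots, handler):
    """Run tasks in one process, at most `allocated_slots` at a time.

    Results are returned in the same order as `task_inputs`.
    """
    with ThreadPoolExecutor(max_workers=allocated_slots) as pool:
        return list(pool.map(handler, task_inputs))
```

Because every handler call runs in the same process, the handlers can read the same in-memory data without copying it per task.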
One MTS per session
If you configure one MTS per session, initially, only one MTS is started on a host and the SIM on this host connects to the MTS. When workload for the first session comes in, the MTS is associated with that session. When workload for another session comes in, the SIM creates a new MTS. On each host, there is only one service instance handling requests for each session.
Note that on each host, there might be multiple MTS instances for an application. Each one handles the workload for a different session and can process multiple concurrent tasks, depending on the number of slots allocated to it.
One MTS per application
Resource sharing scenario
- Client creates one session and submits 10 tasks at T0
- Client creates another session and submits 10 tasks at T1
1. When the tasks are submitted at T0, the SIM creates an MTS.
2. When the tasks are submitted from the second session at T1, the second session will be entitled to half of the resources (assuming a proportional policy with equal proportions).
3. After the next task completes for the first session, the SSM will re-assign the resource to the second session.
4. A new MTS, "MTS 2", will be created and the SIM will connect to "MTS 2". Meanwhile, the previous connections will be closed.
5. Steps 3 and 4 are repeated for the next task that finishes for session 1.
Common data updates
In the compute host, the SIM can send common data updates for the session to the MTS. Since common data updates may happen while some tasks are running, it is up to the application to ensure proper synchronization.
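One possible way to provide that synchronization is sketched below. It is an assumption about application design, not part of the Symphony API: the class `SessionCommonData` and its methods are hypothetical. Updates replace the shared dictionary with a new copy under a lock, so a task that already holds a snapshot keeps seeing consistent data.

```python
import threading


class SessionCommonData:
    """Common data shared by all task threads of one session (hypothetical)."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._data = dict(initial)
        self._version = 0

    def apply_update(self, changes):
        """Called on the update path: swap in a new snapshot atomically."""
        with self._lock:
            new_data = dict(self._data)  # copy, so old snapshots stay intact
            new_data.update(changes)
            self._data = new_data
            self._version += 1

    def snapshot(self):
        """Called by task threads: return a consistent (version, data) pair.

        Callers must treat the returned dictionary as read-only.
        """
        with self._lock:
            return self._version, self._data
```

A running task would take one snapshot at the start of its work and use it throughout, so an update that arrives mid-task does not change the data under it.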
Terminating and suspending tasks
If a task is terminated or suspended, the SIM waits for the taskGracePeriod or suspendGracePeriod, respectively, or the effective reclaim grace period (for reclaim) to expire. When a task is terminated or suspended, or the resource associated with the task is reclaimed, the onServiceInterrupt(InterruptEventPtr& event) method is executed on the service side to inform the service about how much time it has to clean up. The InterruptEvent contains the information about the interrupted session or task; refer to the API reference documentation for more information.
If the interrupted task does not return from the onInvoke() method before the grace period expires, the MTS is restarted and the other tasks in the MTS are re-run without incrementing the retry count.
For suspend/reclaim, the interrupted task is re-queued and later re-run without incrementing the retry count.
For task/session termination, the interrupted task is terminated.
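The following sketch shows one way a task could cooperate with an interrupt so that it returns before the grace period expires. It is not the Symphony API: `InterruptibleTask`, `on_interrupt`, and `invoke` are hypothetical stand-ins for the service-side interrupt and invoke handlers, and the sketch assumes the work can be split into small items.

```python
import threading


class InterruptibleTask:
    """A task that checks for an interrupt between units of work."""

    def __init__(self):
        self._interrupted = threading.Event()

    def on_interrupt(self):
        """Stand-in for the interrupt notification: ask the task to stop."""
        self._interrupted.set()

    def invoke(self, work_items):
        """Process items one at a time and stop early if interrupted."""
        done = []
        for item in work_items:
            if self._interrupted.is_set():
                break  # return promptly instead of using up the grace period
            done.append(item * 2)  # placeholder for the real computation
        return done
```

Returning early in this way avoids the case in which the MTS is restarted and the other tasks in the same process have to be re-run.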
Service error handling in MTS mode
Actions that can be taken on an MTS
In MTS mode, multiple tasks run in the same process concurrently. Any interruption to one task may affect other tasks running in the same process. Service error handling enables you to configure timeouts for all methods within the service and actions to take when a timeout occurs. The timeouts and actions are configured using the duration and actionOnSI parameters in the application profile for the methods within the service. The following paragraphs describe service error handling behavior when actions are taken on an MTS instance.
restartService
If an MTS is alive when the SIM needs to restart it, one of the threads in the MTS may request that the service instance be restarted via customized application error handling. For example, onInvoke() throws a FailureException and the configured behavior in the application profile is restartService. In this case, the MTS will be restarted and the other running tasks will be re-run on the new instance without incrementing the task retry counter.
If an MTS exits, the connected SIM will detect the exit almost simultaneously. The SIM enforces the configured error handling behavior for the stage in the service lifecycle that it is executing. For example, if the SIM is executing an onInvoke() and the MTS exits at this time, the SIM will enforce the error handling behavior for Invoke.
If the SIM does not get the response of a command (such as Invoke or sessionEnter) from the MTS within a timeout period and the actionOnSI for the timeout error handling is set to restartService, the SIM follows the same behavior as when the MTS is alive and the SIM needs to restart, as described previously in this section.
blockHost
This action blocks the compute host for the application. Once a SIM informs the SSM to block the compute host, the other SIMs will keep running tasks in the MTS until the other slots on the host are released. This action immediately terminates the MTS process and blocks all slots for that MTS. The host is added to the block list for the SSM allocation. Workload that is impacted is retried without penalty. All resource units that the MTS was using are released.
keepAlive
This action keeps the MTS process alive.
Default application error handling for MTS
<Method name="SessionUpdate">
<Exception type="failure" actionOnSI="restartService" actionOnWorkload="retry"/>
<Exception type="fatal" actionOnSI="restartService" actionOnWorkload="fail"/>
</Method>
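To check which actions a fragment like the one above assigns to an exception type, it can be read with a standard XML parser. This sketch only parses the fragment shown in this section; the helper name `actions_for` is hypothetical, and the sketch makes no assumption about the rest of the application profile.

```python
import xml.etree.ElementTree as ET

PROFILE_FRAGMENT = """
<Method name="SessionUpdate">
  <Exception type="failure" actionOnSI="restartService" actionOnWorkload="retry"/>
  <Exception type="fatal" actionOnSI="restartService" actionOnWorkload="fail"/>
</Method>
"""


def actions_for(fragment, exception_type):
    """Return (actionOnSI, actionOnWorkload) for an exception type, or None."""
    method = ET.fromstring(fragment)
    for exc in method.findall("Exception"):
        if exc.get("type") == exception_type:
            return exc.get("actionOnSI"), exc.get("actionOnWorkload")
    return None
```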
Handling behavior of an MTS when a SIM exits
If a SIM exits while its task is still running, the MTS will be restarted and the other running tasks will be re-run on the new instance. Regardless of whether the task of a SIM that has exited is running or not, the task retry counter of affected tasks is not increased.
Supported error handling configurations
MTS only supports a subset of the error handling configurations that are available in non-MTS mode. The following table shows the supported configurations:
Method | actionOnWorkload | actionOnSI | Failure Exception | Fatal Exception | Timeout | Exit | Return |
---|---|---|---|---|---|---|---|
Register | Not applicable | blockHost | No | No | Yes | Yes | No |
Register | Not applicable | restartService | No | No | Yes | Yes | No |
CreateService | Not applicable | keepAlive | No | No | No | No | Yes |
CreateService | Not applicable | blockHost | Yes | Yes | Yes | Yes | Yes |
CreateService | Not applicable | restartService | Yes | Yes | Yes | Yes | Yes |
SessionEnter | succeed | keepAlive | No | No | No | No | Yes |
SessionEnter | succeed | blockHost | No | No | No | No | Yes |
SessionEnter | succeed | restartService | No | No | No | No | Yes |
SessionEnter | retry | keepAlive | Yes | Yes | No | No | Yes |
SessionEnter | retry | blockHost | Yes | Yes | Yes | Yes | Yes |
SessionEnter | retry | restartService | Yes | Yes | Yes | Yes | Yes |
SessionEnter | fail | keepAlive | Yes | Yes | No | No | Yes |
SessionEnter | fail | blockHost | Yes | Yes | Yes | Yes | Yes |
SessionEnter | fail | restartService | Yes | Yes | Yes | Yes | Yes |
SessionUpdate | succeed | keepAlive | No | No | No | No | Yes |
SessionUpdate | succeed | blockHost | No | No | No | No | Yes |
SessionUpdate | succeed | restartService | No | No | No | No | Yes |
SessionUpdate | retry | blockHost | Yes | Yes | Yes | Yes | Yes |
SessionUpdate | retry | restartService | Yes | Yes | Yes | Yes | Yes |
SessionUpdate | fail | blockHost | Yes | Yes | Yes | Yes | Yes |
SessionUpdate | fail | restartService | Yes | Yes | Yes | Yes | Yes |
Invoke | succeed | keepAlive | No | No | No | No | Yes |
Invoke | succeed | blockHost | No | No | No | No | Yes |
Invoke | succeed | restartService | No | No | No | No | Yes |
Invoke | retry | keepAlive | Yes | Yes | No | No | Yes |
Invoke | retry | blockHost | Yes | Yes | Yes | Yes | Yes |
Invoke | retry | restartService | Yes | Yes | Yes | Yes | Yes |
Invoke | fail | keepAlive | Yes | Yes | No | No | Yes |
Invoke | fail | blockHost | Yes | Yes | Yes | Yes | Yes |
Invoke | fail | restartService | Yes | Yes | Yes | Yes | Yes |
SessionLeave | Not applicable | keepAlive | Yes | Yes | No | No | Yes |
SessionLeave | Not applicable | blockHost | Yes | Yes | Yes | Yes | Yes |
SessionLeave | Not applicable | restartService | Yes | Yes | Yes | Yes | Yes |
DestroyService | Not applicable | Not applicable | No | No | Yes | No | No |
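A profile could be checked against this table before it is deployed. The sketch below encodes only the Invoke rows as a lookup; the names `INVOKE_SUPPORTED` and `is_supported_for_invoke` are hypothetical, and the lowercase event names are an illustrative encoding of the table columns rather than values taken from the product.

```python
ALL_EVENTS = frozenset({"failure", "fatal", "timeout", "exit", "return"})
RETURN_ONLY = frozenset({"return"})
NO_TIMEOUT_OR_EXIT = frozenset({"failure", "fatal", "return"})

# (actionOnWorkload, actionOnSI) -> events for which the pair is supported,
# transcribed from the Invoke rows of the table.
INVOKE_SUPPORTED = {
    ("succeed", "keepAlive"): RETURN_ONLY,
    ("succeed", "blockHost"): RETURN_ONLY,
    ("succeed", "restartService"): RETURN_ONLY,
    ("retry", "keepAlive"): NO_TIMEOUT_OR_EXIT,
    ("retry", "blockHost"): ALL_EVENTS,
    ("retry", "restartService"): ALL_EVENTS,
    ("fail", "keepAlive"): NO_TIMEOUT_OR_EXIT,
    ("fail", "blockHost"): ALL_EVENTS,
    ("fail", "restartService"): ALL_EVENTS,
}


def is_supported_for_invoke(action_on_workload, action_on_si, event):
    """Return True if the table lists the combination as supported in MTS mode."""
    supported = INVOKE_SUPPORTED.get((action_on_workload, action_on_si), frozenset())
    return event in supported
```

The same pattern extends to the other methods by adding one lookup per method.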
Service API
Concurrent execution and application synchronization
Concurrent execution inside the service instance adheres to the following principles. It is up to the application to ensure proper synchronization.
- onCreateService() will only be called at the beginning of the process lifetime. The middleware will not execute any other handler while onCreateService() is executing.
- onDestroyService() will only be called at the end of the process lifetime. The middleware will not execute any other handler while onDestroyService() is executing.
- For sessions without common data, there are no handlers called to indicate when the session is assigned and unassigned from the service. For a session without common data, onInvoke() invocations may execute concurrently any time after the onCreateService() completes and before onDestroyService() is called.
- For sessions with common data, onSessionEnter() and onSessionLeave() scope the period of time that the service instance is assigned to a particular session.
When an MTS belongs to a single application, it may be assigned to and unassigned from a session more than once, and it may be assigned to multiple sessions at once. The remaining handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) may execute concurrently within the process under the following rules:
- onSessionEnter() invocations for a session will not execute concurrently with other handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) for that session.
- onSessionLeave() invocations for a session will not execute concurrently with other handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) for that session.
- onInvoke() invocations for a session may execute concurrently.
- onSessionUpdate() invocations will occur serially for a session.
- onInvoke() invocations and onSessionUpdate() invocations may execute concurrently.
- When an MTS belongs to a single application, if the invocations are for different sessions, any of these handlers (onSessionEnter(), onSessionUpdate(), onSessionLeave(), and onInvoke()) may execute concurrently with each other. For example, onSessionEnter() for two different sessions may execute concurrently.
- onServiceInterrupt() may occur at any time except during onCreateService() or onDestroyService(). Multiple occurrences of onServiceInterrupt() may also execute concurrently.
The following table summarizes which handlers can be executed concurrently for the same session.
Handlers | onSessionEnter | onSessionUpdate | onSessionLeave | onInvoke |
---|---|---|---|---|
onSessionEnter | No | No | No | No |
onSessionUpdate | Not applicable | No | No | Yes |
onSessionLeave | Not applicable | Not applicable | No | No |
onInvoke | Not applicable | Not applicable | No | Yes |
The following table summarizes which handlers can be executed concurrently for different sessions of the same application.
Handlers | onSessionEnter | onSessionUpdate | onSessionLeave | onInvoke |
---|---|---|---|---|
onSessionEnter | Yes | Yes | Yes | Yes |
onSessionUpdate | Not applicable | Yes | Yes | Yes |
onSessionLeave | Not applicable | Not applicable | Yes | Yes |
onInvoke | Not applicable | Not applicable | Not applicable | Yes |
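The rules and the two tables can be condensed into a single predicate, which may help when reasoning about which locks a service needs. This is a sketch of the rules as stated in this section, not middleware code; the function name `may_run_concurrently` is hypothetical, and the different-session case applies only when the MTS belongs to a single application.

```python
HANDLERS = {"onSessionEnter", "onSessionUpdate", "onSessionLeave", "onInvoke"}
EXCLUSIVE = {"onSessionEnter", "onSessionLeave"}


def may_run_concurrently(first, second, same_session):
    """Return True if the two handlers may execute at the same time."""
    if first not in HANDLERS or second not in HANDLERS:
        raise ValueError("unknown handler name")
    if not same_session:
        return True  # handlers for different sessions never exclude each other
    if first in EXCLUSIVE or second in EXCLUSIVE:
        return False  # enter and leave run alone within their session
    if first == second == "onSessionUpdate":
        return False  # updates for one session are applied serially
    return True  # onInvoke with onInvoke, or onInvoke with onSessionUpdate
```

In practice this means that state touched by both onInvoke() and onSessionUpdate() for one session needs protection, and so does state shared across sessions.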
Feature interactions
Reclaim and preemption
The application-level MTS supports session preemption where only the running services of the preempted sessions are interrupted. The preemption grace period allows the currently running service instance to complete and clean up when the resource on which the service instance is running is reclaimed. If the service method and cleanup do not complete within the set time, Symphony terminates the instance. If the timeout has not expired, Symphony initiates cleanup after the currently running service method completes.
Global standby service
An MTS process becomes a linger service instance if there is only one service driver (last service driver) in the MTS. After that, the MTS is actually a normal linger service instance occupying one resource unit, as it has only one service driver. It can be transferred over to another SIM normally.
Delay slot release
When a SIM becomes idle, it stays connected to the MTS while it waits for delaySlotRelease to expire. If the delay expires, the MTS thread runs to completion, the SIM disconnects, and the slot is released.
Service to slot ratio
With the service-to-slot-ratio feature, the workload consumes slots according to its own slot usage requirement. In MTS mode, the only difference is that there is one thread per concurrent task, so each thread, rather than each service instance, consumes N slots or 1/N of a slot.
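The slot consumption per thread can be worked out from the ratio with a few lines. The sketch assumes the ratio is written as "services:slots" (for example "1:N" or "N:1"); that notation and the function name `slots_per_thread` are assumptions made for illustration.

```python
from fractions import Fraction


def slots_per_thread(ratio):
    """Return the number of slots one MTS thread consumes.

    `ratio` is assumed to be a "services:slots" string such as "1:4" or "4:1".
    """
    services, slots = (int(part) for part in ratio.split(":"))
    if services <= 0 or slots <= 0:
        raise ValueError("both sides of the ratio must be positive")
    return Fraction(slots, services)
```

Under this reading, a ratio of "1:4" gives four slots per thread, and "4:1" gives a quarter of a slot per thread.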
Resource preference
The resource preference feature is not supported with MTS.
Minimum services and maximum services
In MTS mode, minServices and maxServices control the number of threads rather than service instances that are created to run tasks for sessions.
Best practices
- Standby services: If a standby service is configured in the cluster, it continues to run on the compute host even though no slots are consumed. Since it is possible for the MTS to hold a lot of memory on the host, it is not recommended to use standby services with MTS.
- Exclusive allocation: Exclusive allocation maximizes the benefit of using MTSs on a host. Without exclusive allocation, multiple MTSs (each one belonging to a different application) may run on the same host. The memory on the host must be shared across multiple applications, decreasing the effective cache size for each application that uses that host. With exclusive allocation, each host will be used by one application exclusively so the application benefits from a larger in-memory cache.