Powerful IBM i cleanup mechanisms allow application deadlock (cancel_handler and C++ automatic destructors)

The IBM® i operating system provides a set of powerful cleanup mechanisms. One of them is the cancel handler, which an application can register for a stack frame. An application written in C or C++ can enable a cancel handler by using the #pragma cancel_handler directive; alternatively, an application can use the CEERTX() API.

A cancel handler is similar to a Pthread cancellation cleanup handler. However, a cancel handler runs whenever the stack frame or function for which it was registered ends in any way other than a normal return. Pthread cancellation cleanup handlers run only when the thread is terminated with pthread_exit() or pthread_cancel() or when the thread returns from the thread's start routine.
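For comparison, the Pthread side of this contrast can be sketched in portable C++. The example is only an illustration: the names Outcome, note_cleanup(), worker(), and handler_runs() are hypothetical, and the code uses only the standard pthread_cleanup_push(), pthread_cleanup_pop(), and pthread_exit() calls, not the IBM i cancel handler itself. It shows that the cleanup handler runs when the thread ends through pthread_exit(), but not when the handler is popped with a zero argument on the normal path.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical record shared between the thread and its creator.
struct Outcome {
    bool exit_early;   // input: should the thread call pthread_exit()?
    bool handler_ran;  // output: set by the cleanup handler
};

// Cleanup handler: only records that it was called.
extern "C" void note_cleanup(void* arg) {
    static_cast<Outcome*>(arg)->handler_ran = true;
}

// Thread start routine. The push and pop calls must be paired
// lexically in the same function.
extern "C" void* worker(void* arg) {
    Outcome* out = static_cast<Outcome*>(arg);
    pthread_cleanup_push(note_cleanup, out);
    if (out->exit_early) {
        pthread_exit(nullptr);  // thread ends here; the handler runs
    }
    pthread_cleanup_pop(0);     // normal path: remove the handler without running it
    return nullptr;
}

// Runs one thread and reports whether its cleanup handler was called.
bool handler_runs(bool exit_early) {
    Outcome out = {exit_early, false};
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, worker, &out);
    assert(rc == 0);
    rc = pthread_join(tid, nullptr);
    assert(rc == 0);
    (void)rc;
    return out.handler_ran;
}
```

Under these assumptions, handler_runs(true) reports that the handler ran and handler_runs(false) reports that it did not. A cancel handler registered with #pragma cancel_handler covers more cases than this, as the following paragraph describes.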

The cancel handler is guaranteed to run for all conditions that cause the stack frame to end (other than return), such as thread termination, job termination, calls to exit(), abort(), exceptions that percolate up the stack, and cancel stack frames. Similarly, destructors for automatic C++ objects are guaranteed to run when the stack frame (function) or scope in which the objects were declared ends.
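The destructor behavior can be illustrated with a minimal, portable sketch. The class ScopeLogger and the functions foo() and run_demo() are hypothetical names chosen for this example. The sketch covers only the two triggers that standard C++ can demonstrate, a normal return and an exception that percolates up the stack; the IBM i specific triggers, such as job termination, are not reproduced here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical automatic object that records when its scope ends.
class ScopeLogger {
public:
    ScopeLogger(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name)) {}
    ~ScopeLogger() { log_.push_back("cleanup " + name_); }
    ScopeLogger(const ScopeLogger&) = delete;
    ScopeLogger& operator=(const ScopeLogger&) = delete;

private:
    std::vector<std::string>& log_;
    std::string name_;
};

// Ends either by a normal return or by throwing, depending on 'fail'.
void foo(std::vector<std::string>& log, bool fail) {
    ScopeLogger guard(log, fail ? "after throw" : "after return");
    if (fail) {
        throw std::runtime_error("simulated failure");
    }
}

// Calls foo() both ways and returns the order of events.
std::vector<std::string> run_demo() {
    std::vector<std::string> log;
    foo(log, false);
    assert(log.size() == 1);  // the destructor already ran on the normal return
    try {
        foo(log, true);
    } catch (const std::runtime_error&) {
        log.push_back("caught");  // reached only after the destructor has run
    }
    return log;
}
```

In this sketch the destructor runs in both cases, and in the exception case it runs before the catch block is entered.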

These mechanisms ensure that your application can always clean up its resources. However, the added power of these mechanisms also makes it easy for an application to cause a deadlock.

The following is an example of such a problem:

An application has a function foo() that registers a cancel handler called cleanup(). The function foo() is called by multiple threads in the application. The application is ended abnormally, either with a call to abort() or by system operator intervention (with the ENDJOB *IMMED CL command).

When this job is ended, every thread is immediately terminated. The system terminates a thread by terminating each call stack entry in the thread, so it eventually reaches the function foo() in that thread. When function foo() is reached, the system recognizes that it must not remove that function from the call stack without running the function cleanup(), and so the system runs cleanup().

Because your application is multithreaded, all of the job ending and cleanup processing proceeds in parallel in each thread. Also, because abort() or ENDJOB *IMMED was used, the current state and location of each thread in your application cannot be determined. When the cleanup() function runs, it is very difficult for the application to correctly assume that any specific cleanup can be done.

Any resources that the cleanup() function attempts to acquire may be held by other threads in the process, other jobs in the system, or possibly by the same thread running the cleanup() function. The application variables or resources that your application manipulates may be in an inconsistent state because the call to abort() or ENDJOB *IMMED asynchronously interrupted every thread in the process at the same time. The application can easily reach a deadlock when running the cancel handlers or C++ destructors.

Do not attempt to acquire locks or resources in cancel handlers or C++ automatic object destructors without preparing for the possibility that the resources cannot be acquired.
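One way to prepare for that possibility is to use a non-blocking acquire in the cleanup path. The following is a minimal sketch under stated assumptions, not a definitive implementation: the names g_lock, g_cleanups_done, Cleanup, NonBlockingGuard, and end_scope() are hypothetical, and the only real API relied on is the standard pthread_mutex_trylock() function, which returns EBUSY instead of waiting when a default (non-recursive) mutex is already locked. The sketch simulates the case in which the lock is already held by the same thread that runs the destructor.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

// Hypothetical shared state for this illustration.
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_cleanups_done = 0;  // protected by g_lock

enum class Cleanup { Done, SkippedBusy, Failed };

// Hypothetical guard whose destructor never waits for the lock.
class NonBlockingGuard {
public:
    explicit NonBlockingGuard(Cleanup& result) : result_(result) {}
    NonBlockingGuard(const NonBlockingGuard&) = delete;
    NonBlockingGuard& operator=(const NonBlockingGuard&) = delete;

    ~NonBlockingGuard() {
        int rc = pthread_mutex_trylock(&g_lock);
        if (rc == 0) {
            ++g_cleanups_done;              // the cleanup that needs the lock
            pthread_mutex_unlock(&g_lock);
            result_ = Cleanup::Done;
        } else if (rc == EBUSY) {
            result_ = Cleanup::SkippedBusy; // lock is held: give up, do not wait
        } else {
            result_ = Cleanup::Failed;
        }
    }

private:
    Cleanup& result_;
};

// Ends a scope containing the guard, optionally while this thread already
// holds the lock, as it might if it were interrupted in the middle of an update.
Cleanup end_scope(bool lock_already_held) {
    Cleanup result = Cleanup::Failed;
    if (lock_already_held) {
        int rc = pthread_mutex_lock(&g_lock);
        assert(rc == 0);
        (void)rc;
    }
    {
        NonBlockingGuard guard(result);
    }  // destructor runs here
    if (lock_already_held) {
        pthread_mutex_unlock(&g_lock);
    }
    return result;
}
```

With a blocking pthread_mutex_lock() call in the destructor, the second case would never return. The cost of the non-blocking approach is that the cleanup is skipped when the lock is busy, so the application must tolerate that outcome.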


Important

Neither a cancel handler nor a destructor for a C++ object can prevent the call stack entry from being terminated, but the termination of the call stack entry (and therefore the job or thread) is delayed until the cancel handler or destructor completes.

If the cancel handler or destructor does not complete, the system does not continue terminating the call stack entry (and possibly the job or thread). The only alternative at this point is to use the WRKJOB CL command (option 20) to end the thread, or the ENDJOB *IMMED CL command. If it was the ENDJOB *IMMED command that caused the cancel handler to run in the first place, the only option left is the ENDJOBABN CL command, because any remaining cancel handlers are still guaranteed to run.

The ENDJOBABN CL command is not recommended. The ENDJOBABN command causes the job to be terminated with no further cleanup allowed (application or operating system). If the application is suspended while trying to access certain operating system resources, those resources may be damaged. If operating system resources are damaged, you may need to take various reclaim, deletion, or recovery steps and, in extreme conditions, restart the system.


Recommendations

If you want to clean up your job or application, you can use one of the following mechanisms:

