Passive Nature of Transaction Timeouts
Hello everyone! I am Lalitha Chandran, a member of the WebSphere Process Server technical support team. In this blog I will share some of the interesting (in my opinion :-)) observations I have made.
To begin with, here's a classic problem with a transaction timeout. I was given a project interchange (PI) file in which a microflow invokes a web service that takes an hour to respond, yet the microflow does not abort after the transaction timeout. A PI file is all it takes to get me started. I first did my usual PI screening:
Step 01: Identify the Components involved:
- Microflow->WS Import
- Microflow->POJO1 (with a Thread.sleep 30 minutes)
- WS export->POJO2 (with a Thread.sleep 1 hour)
Step 02: Understand how the components are being invoked:
- Microflow -synchronous->POJO1
- Microflow -synchronous->WS Import (which reaches POJO2 through the WS export)
Step 03: Review the transaction qualifier settings: the microflow runs in a global transaction; POJO1 and POJO2 each run in their own local transaction.
Step 04: Prerequisites: I made sure the transaction timeout was set to the default of 2 minutes (120 seconds).
With this I was all set to run my first test on the microflow. I started my test component and fired a request. After two minutes, I saw a message written to the SystemOut.log file. So far so good:
0000000f TimeoutManage I WTRN0006W: Transaction xxx has timed out after 120 seconds.
The microflow kept waiting for the web service to return. After an hour, when the web service returned, I expected the microflow to abort, right? Nope, no exceptions: it went ahead and called POJO1 as if nothing had happened, which was surprising. But once the microflow received the response from POJO1 and was about to end, I saw an error message saying the transaction had ended due to the timeout and would be rolled back. Yeah, right! Why didn't that happen 88 minutes earlier, when it actually timed out? See Figure 1 below:
Note: In my actual test I reduced the sleep times so as not to waste too much time testing :-).
Reading the Bible:
I started googling this topic, and the results were quite interesting. All of the above behavior is by design (no defects). This is the summary of what I found:
When the "total transaction timeout" time expires, the timeout mechanism simply records the occurrence of the timeout in the transaction itself - it does not cause the thread running the transaction to stop processing. When the processing path passes through the next transactional boundary (for example, when entering or leaving an EJB method), a check is performed to see if the transaction has been marked as rollback and, if so, a rollback is issued.
Remember that a microflow always runs in one single global transaction, so a transaction boundary is encountered only when the microflow ends. In our case it makes sense that the microflow did not roll back until it ended. Bottom line: the transaction won't abort right after the timeout. That is not what the word "timeout" intuitively suggests, and I can imagine problems if this behavior is not understood and tested well.
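To make the mechanism concrete, here is a minimal sketch in plain Java. It is only a simulation under my own assumptions, not WebSphere code: `SimulatedTransaction`, `runScenario` and the millisecond values are hypothetical names and numbers I made up for illustration. The timer merely sets a flag, and the flag is looked at only when the "transaction" ends.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PassiveTimeoutDemo {

    /** Hypothetical stand-in for a global transaction; not a WebSphere API. */
    static final class SimulatedTransaction {
        private final AtomicBoolean rollbackOnly = new AtomicBoolean(false);
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "timeout-timer");
                    t.setDaemon(true);
                    return t;
                });

        SimulatedTransaction(long timeoutMillis) {
            // The timer only records the timeout; it never stops the worker thread.
            timer.schedule(() -> rollbackOnly.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
        }

        /** The transaction boundary: the only place where the flag is checked. */
        String end() {
            timer.shutdownNow();
            return rollbackOnly.get() ? "ROLLED_BACK" : "COMMITTED";
        }
    }

    /** Does "work" for workMillis inside a simulated transaction with the given timeout. */
    static String runScenario(long timeoutMillis, long workMillis) {
        SimulatedTransaction tx = new SimulatedTransaction(timeoutMillis);
        try {
            Thread.sleep(workMillis); // stands in for the slow web service call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return tx.end(); // the outcome is decided only here, however long ago the timer fired
    }

    public static void main(String[] args) {
        System.out.println(runScenario(100, 400)); // work outlasts the timeout
        System.out.println(runScenario(400, 100)); // work finishes in time
    }
}
```

In the first call the work runs to completion even though the timer fired long before, and the rollback shows up only at `end()`. That is the same shape as my microflow test, just with milliseconds instead of minutes.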
Quick example (The business driver):
What would be the impact of such behavior? Let's take an example where we have a microflow that processes phone purchase orders. A customer places an order; the microflow does the processing, interacts with the database and sends an order number back to the customer. Typically this takes a few seconds. But let's say that, for reasons like a slow database response or high request volume, it now takes 15 minutes instead.
In this case the transaction would time out at 2 minutes (the default setting), but the microflow would continue its processing. At the end of the 15 minutes, when the microflow is ready to send the order number to the client, it tries to commit but can't, because the transaction has been marked for rollback due to the timeout.
As a result the customer, after waiting for 15 minutes, gets a failure message instead of the order number! Not expected by some, not desired by others. See Figure 2 below:
Figure 2
If you have faced this before, I am sure you are wondering what the solution is. Well, it is not really a "problem", so the best you can do is avoid the situation. What I mean is: do not let your code depend on the transaction timeout, especially in time-sensitive scenarios. Use the other, active timeouts that are available. See the following links:
Timeout technote for WebSphere Application Server
Information center link
BPEL timeouts
SCA Timeouts
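To show what I mean by an "active" timeout, here is a small plain-Java sketch, again under my own assumptions: `ActiveTimeoutDemo`, `callWithTimeout` and the order number are hypothetical, and this is not how the product timeouts linked above are configured. The caller waits only for a bounded time and then gives up, instead of finding out at commit time. Inside an application server you would normally rely on the timeouts the product provides rather than creating your own threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ActiveTimeoutDemo {

    /** Runs slowCall on a worker thread and waits at most timeoutMillis for its result. */
    static String callWithTimeout(Callable<String> slowCall, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(slowCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // actively interrupt the worker instead of waiting it out
            return "TIMED_OUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical slow order service: takes 500 ms, but the caller allows only 100 ms.
        System.out.println(callWithTimeout(() -> {
            Thread.sleep(500);
            return "ORDER-123";
        }, 100));
        // Fast service: answers well within the allowed time.
        System.out.println(callWithTimeout(() -> "ORDER-456", 1000));
    }
}
```

With this shape the customer in the phone order example would get the failure after the allowed wait, not after the full 15 minutes.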
I have attached the project interchange zip file that I used below; feel free to download it and try it. The best way to learn a product is to use it! I would love to hear whether you found this piece of information interesting as well.