Topic
  • 9 replies
  • Latest Post - ‏2013-09-12T19:30:03Z by AndrewPaier
Castiel
19 Posts

Pinned topic UCA fails after several repetitions of a business process

‏2013-09-04T20:18:59Z |

My business process uses an Undercover Agent (UCA, the activity with the envelope icon) to accept the response from my testing client (a Java application that automatically sends a JSON response to the BPM process using the BPM REST API).

For the purpose of measuring time, I want to run this process repeatedly, i.e., at least 150 times. About 15-30 consecutive iterations proceed normally; however, the next repetition "gets stuck" on this UCA. The Inspector in Process Designer reports that the problematic instance is in the active state and waits "eternally" for the incoming response.

This line of SystemOut.log may or may not be relevant:

[9/4/13 22:07:37:443 CEST] 00000a0f wle           W   CWLLG0297W: The intermediate event with ID BpdEvent.1204214e-35e7-4b73-bf71-85f9d798ff5a can never receive a message from UCA UCA.7d0d96f1-9691-43af-81f5-b4fe1bc025cc because it is correlated on an invalid output parameter.

To sum up, many instances are perfectly okay, yet there is always one that halts the whole time measurement. What causes this behavior?

  • dogren@gmail.com
    424 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-04T20:52:04Z  

    I think you might need to provide some more details on your use case.

    First, let's clear up some terminology. UCA's are what you send. IMEs/SMEs are what receive the messages and affect the processes. Attached IME's are connected to an activity and unattached IME's "float" on the diagram. (SME's can't be attached, for fairly obvious reasons.)

    My interpretation of your post is that you have one process instance that is receiving all of these UCA notifications with a single unattached IME. After receiving the UCA, the IME continues and loops back and listens again.

    First, are my guesses about your BPD correct? If not (or even if so), can you post an image of your BPD?

    What is your goal with these IMEs? If you are just recording that the event was received (which is how I'm interpreting "measuring time"), there are better ways to do that.

    Also, what are you using for your correlation ID? Obviously the log message leads me to be a little suspicious of this.

    Are your IMEs marked as durable?

    Finally, how fast are these UCAs being received?

    My first gut suspicion is that you have some sort of race condition happening. Remember that these are being processed asynchronously. There are no guarantees that the UCA messages will be processed in a given amount of time, or even in a particular order. (At least not with the default ASYNC queue.) But more details might help.

    It might also help to add some logging. I often add a log statement in my UCA message handler to log the output variable(s). There have been lots of times I've wondered why something wasn't correlating, only to find that the payload in the UCA wasn't what I thought it was. (Sometimes just an extra space, so watch your whitespace.)

    Similarly, it might be good to log the value of your correlation variable in the IME in a Pre just before the activity. (Or to check it in the Inspector after you've noticed a problem.) The same applies here: I've often found something "stuck", only to find that the correlation variable wasn't what I thought it was.
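
    To make the whitespace point concrete, here is a minimal sketch in plain Java (for example, for the test client side). The class and method names are made up for illustration and are not part of any BPM API. It renders a correlation value with whitespace escaped and the length appended, so that a stray trailing space shows up in the log.

```java
public class CorrelationDebug {
    /** Returns the value in brackets with whitespace escaped, followed by its length. */
    public static String visible(String value) {
        if (value == null) {
            return "<null>";
        }
        StringBuilder sb = new StringBuilder("[");
        for (char c : value.toCharArray()) {
            switch (c) {
                case ' ':  sb.append("\\s"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.append("] len=").append(value.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(visible("1234"));  // [1234] len=4
        System.out.println(visible("1234 ")); // [1234\s] len=5
    }
}
```

    Logging both sides of the correlation this way (the value sent with the UCA and the value the IME is waiting on) makes a mismatch much easier to spot than comparing two strings that look identical.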

    David

    Updated on 2013-09-04T20:52:36Z at 2013-09-04T20:52:36Z by dogren@gmail.com
  • Castiel
    19 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-05T12:31:24Z  

    >I think you might need to provide some more details on your use case.

    Okay, with pleasure :-)

    >First, let's clear up some terminology. UCA's are what you send. IMEs/SMEs are what receive the messages and affect the processes. Attached IME's are connected to an activity and unattached IME's "float" on the diagram. (SME's can't be attached, for fairly obvious reasons.)

    Oh, I see! I don't think I have ever used an SME, just an IME for accepting input data. I will often call it simply "the envelope".

    >My interpretation of your post is that you have one process instance that is receiving all of these UCA notifications with a single unattached IME. After receiving the UCA, the IME continues and loops back and listens again.

    >First, are my guesses about your BPD correct? If not (or even if so), can you post an image of your BPD?

    See the attached screenshot of the BPD - the trouble is in the "envelope" Wait for input (non-too...) 

    We use BPM as a "conductor"/"operator" that controls the whole logic/infrastructure of our application. Now we want to optimize BPM's performance. To achieve this, time measurements have to be taken, and that is the purpose of my business process. It tests how the number of levels (Linked Process), the use of a general system service, and the loading of components from external toolkits affect the duration of the whole program. Namely, the command

    log.info("name of component");

    is used as a time log at interesting points in the process. There is also a Java application called TestClient: it sends automatic input (in the form of one identical JSON message) to BPM (to the envelope, or the IME to be more precise). This input contains an instanceId which, in the envelope, is checked against the instanceId of the instance it belongs to. This allows multiple instances with different inputs to run at the same time.

    To sum up, there's one attached IME which is passed through every time. Yet sometimes it just "freezes" and waits for the mentioned input.

    >What is your goal with these IMEs? If you are just recording that the event was received (which is how I'm interpreting "measuring time"), there are better ways to do that.

    The IME is an obligatory part of the process, since the production version of the application contains and heavily uses IMEs for accepting input data.

    >Also, what are you using for your correlation ID? Obviously the log message leads me to be a little suspicious of this.

    The instanceId of the corresponding instance, as mentioned before.

    >Are your IMEs marked as durable?

    How does one check this?

    >Finally, how fast are these UCAs being received?

    I don't understand this question. What do you mean? Or how can you find this information out?

    >My first gut suspicion is that you have some sort of race condition happening. Remember that these are being processed asynchronously. There are no guarantees that the UCA messages will be processed in a given amount of time, or even in a particular order. (At least not with the default ASYNC queue.) But more details might help.

    I had the same suspicion. Then I realized there's only one instance running at any time. At the end (the activity "Log before End"), the BPD sends an ending message to TestClient. Only after that does TestClient exit, and only then is TestClient re-launched (via a loop in a bash script) to start a new instance of the BPD.

    Yet, if the queue isn't automatically emptied, this still might be the issue.

    >It might also help to add some logging. I often add a log statement in my UCA message handler to log the output variable(s). There have been lots of times I've wondered why something wasn't correlating, only to find that the payload in the UCA wasn't what I thought it was. (Sometimes just an extra space, so watch your whitespace.)

    Yeah, there already are lots of logs. But I might improve them to make whitespace visible. Thanks, David!

    >Similarly, it might be good to log the value of your correlation variable in the IME in a Pre just before the activity. (Or to check it in the Inspector after you've noticed a problem.) The same applies here: I've often found something "stuck", only to find that the correlation variable wasn't what I thought it was.

    Again, already there - the activity "Log instanceId".

    I might add an (ir)relevant curiosity in BPM's log SystemOut.log: from time to time (not always), the UCA's log precedes the log of this "Log instanceId" activity, although the order in BPD is the other way round.

     
  • dogren@gmail.com
    424 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-05T17:58:14Z  

    Let me start with what I think will be most helpful in answering your original question, then follow up with some answers to your questions, and then some additional suggestions.

    When I asked whether the IMEs were durable, you didn't know what that meant. And when I asked about the logs you noted that sometimes the UCA log precedes the "log instance id" log statement. (Which is essentially the point where the BPD starts waiting.) This very well might be the key point. I'm not entirely sure how your test client is generating these instances, but it sounds like it might be doing something along the lines of starting the BPD via API, followed by sending the UCA message. (Perhaps JMS, HTTP or some other way. Doesn't matter.)

    As I mentioned before, the BPMN engine is inherently asynchronous, and it doesn't guarantee order except in cases where you specifically configure it to (and when you do configure it that way, there is a performance cost). So what is sometimes happening is that your test client creates a new instance, and a bpdNotification event gets put on the queue for the engine to process. (I'm oversimplifying a bit here, but the gist is that the BPMN engine has some work to do before it gets to the "wait for input" step.) Shortly thereafter the test client sends the UCA message, and the BPMN engine has some work to do for that as well (notably the message handler and the correlation).

    These will be processed off of two different work queues and likely by two different threads. If the UCA gets processed first, it will say "hey, are there any IME/SMEs that are correlated to my UCA and instance ID 1234?" And the system will say no, there is nothing correlated to that. (Because the BPMN engine hasn't gotten the instance to the wait point yet.) So it will ignore that UCA, since there was nothing waiting for it.

    This is correct behavior, because sometimes you don't want to pay attention to any events that might have happened in the past. But sometimes you do care. Sometimes when you have an IME, you mean: "correlate on any event that happens in the future for this key, but also correlate on any event that has already happened." This is called being durable, coming from the fact that the event history is kept in perpetuity (side note: you probably want to clear this from time to time). It is a simple checkbox on the property sheet for the IME. (Don't even get me started on what consumable means, it is not what people think and I've never really seen a use case for it.)

    So, it sounds like a reasonable theory is that the root of your problem might just be that you want durable messages, but haven't turned that on. The reason IMEs were waiting forever for a UCA message was because the UCA had already been received and ignored.
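
    If it helps to see the ordering problem in isolation, below is a toy model in plain Java. It is only a sketch of the behavior described above, not the BPM engine or its API; the class `DurableSketch` and its methods are invented for illustration. The message is sent before the instance reaches its wait point, once with a non-durable listener and once with a durable one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; this is not the IBM BPM API. */
public class DurableSketch {
    private final boolean durable;
    // Correlation keys for which a listener is currently waiting.
    private final Set<String> waiting = new HashSet<>();
    // Messages kept for listeners that arrive later (used only when durable).
    private final Map<String, String> stored = new HashMap<>();

    public DurableSketch(boolean durable) {
        this.durable = durable;
    }

    /** A message arrives. Returns true if a waiting listener consumed it. */
    public boolean send(String key, String payload) {
        if (waiting.remove(key)) {
            return true;
        }
        if (durable) {
            stored.put(key, payload); // keep it for a listener that shows up later
        }
        return false; // non-durable: nobody was waiting, so the message is dropped
    }

    /** The instance reaches the wait point. Returns true if it can continue at once. */
    public boolean listen(String key) {
        if (stored.remove(key) != null) {
            return true;
        }
        waiting.add(key);
        return false; // keeps waiting until a matching send() arrives
    }

    public static void main(String[] args) {
        DurableSketch nonDurable = new DurableSketch(false);
        nonDurable.send("1234", "input"); // arrives too early
        System.out.println("non-durable continues: " + nonDurable.listen("1234")); // false

        DurableSketch durableModel = new DurableSketch(true);
        durableModel.send("1234", "input"); // arrives too early, but is stored
        System.out.println("durable continues: " + durableModel.listen("1234")); // true
    }
}
```

    In the non-durable run the listener waits forever, which matches the "stuck" instance in the Inspector; in the durable run the early message is picked up as soon as the wait point is reached.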

    In answer to some of your other questions:

    >What do you mean [by how fast are the UCAs received]? Or how can you find this information out?

    Your BPD is different than I thought. I thought you had one IME that was receiving a message, then looping back and waiting for another. My concern was that you were sending many messages very quickly to the same IME. This gets a little too complicated to get into if it isn't germane to your problem, but there are some weird conditions that can happen when "duplicate" events are being received quickly especially in old versions. I'm not doing a good job explaining this, but I don't want to get too far into the detail when this post is already too long.

    >Yet, if the queue isn't automatically emptied, this still might be the issue.

    I'm not sure what you mean by this, just for the record.

     

    Getting back to general advice: you need to look into autotracking. Because you know all of those log statements you have in your process? The system is already taking log measurements in those places. Ones that are automatically correlated and calculated for you. In fact, when it comes to the idea of monitoring and tracking process performance, there is just an enormous wealth of tools available to you out of the box. It seems like you might be re-inventing a wheel here.

    Also, and I know I'm going up on a soapbox here when it is probably too late, but I'm not sure if what you are doing is actually a good approach. Anytime anybody says "We use BPM as a "conductor"/"operator" that controls the whole logic/infrastructure of our application" to me, I immediately get chills. I don't really know what your use case is, but when people try to build "applications" with BPM, usually it doesn't end up working out too well. BPM is for implementing processes. And, yes, the line between "process" and "application" can be arbitrary and fuzzy, but the fact that you have one big system swimlane with sixteen activities in it also triggers big alarm bells in my mind. If this is just because it is your test harness, then this is perfectly acceptable. 

    Also, I worry a little bit that you are trying to "optimize the performance" of what look like a bunch of system tasks. (Which they very well might not be, since we can't see into the subprocesses.) Remember: the engine is entirely asynchronous. Measuring wall clock time won't be valuable at all.

    David

  • AndrewPaier
    842 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-11T14:08:34Z  

    Some more thoughts to add to David's -

    Are there any errors in the logs besides what you are seeing in the Inspector? I'm wondering if you have checked whether the DB configuration will support the number of events you are trying to get through the pipe. OOTB, some of the datasources are misconfigured with a maximum of 10 connections. This does not align properly with the thread allocation for the event processing engine, which under a stress test can cause a failure, since processing of BPD / IME / UCA items all requires DB connections.

    Like David, I'm concerned about what I'm seeing in your diagram.  It does feel like we may be using the wrong tool here for what you are doing, but since we are only looking at one tiny portion (I hope) of what you are doing that may not be the case.

    Andrew Paier | Director | BP3 Global, Inc.

  • mfasbinder
    3 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-11T16:16:14Z  

    David,

    >there are some weird conditions that can happen when "duplicate" events are being received quickly especially in old versions

    Can you expand on this or point to any documentation?

     

    Thanks!

  • dogren@gmail.com
    424 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-11T17:26:53Z  

    Hey Marc,

    So, the thing that I was directly referring to was the behavior of durable UCAs when multiple messages with duplicate keys are received. (It's a little weird: even though it is the IME that is marked "durable", it is really the UCA behavior that is modified.)

    The key point is actually mentioned in the documentation: "When a message arrives before a process has run to a point where the event can accept the message, the durable subscription causes the message to be stored until the message event is reached. Only the most recently received message is stored." The tricky bit is "only the most recently received message is stored". 

    Let me use a simple example to start with:

    We have a BPD that is managing a "procure to pay" process. As part of that, an integration to the ERP system was built, the ERP system is monitored, and corresponding UCA events are generated whenever invoices are processed in the ERP system.

    Our process is therefore listening for UCA events from an ERP system that is processing invoices. We have a single IME that is listening with a correlation ID of the invoice number. Once it receives the UCA, it opens up the payload and follows a decision gateway for the type of invoice event ("invoice received", "invoice approved", "invoice paid", for example) and then does the correct thing based on that event. Once the BPD processes that payload, it loops back to the IME and starts listening again. (Presumably until we get the "invoice paid" event or reach some other end condition.)

    You might think that if you have this type of architecture and mark the IME as durable, you will be OK: that you will always receive all three events, "received", "approved", and "paid". But that is not the case. If, for example, all three messages are received from the ERP system before we start listening, the three messages will overwrite each other and only the last message received will ever be seen by the process.

    So this BPD ends up with a subtle race condition bug: there will be times we never process the "invoice approved" logic if the "invoice paid" message is received too quickly. There are several ways to work around this, but I'll follow up on that if necessary.

    Another, simpler case that is easy to test is to create a single activity with an IME attached to it that has the "do not close" checkbox set, and then fire 1000 messages that correlate to that IME as fast as you can from a test client. You will likely not get 1000 executions of the logic triggered by the IME, because some of those 1000 UCA messages will overwrite each other before the IME has a chance to fire.
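
    The overwrite behavior can be sketched with a toy model in plain Java. Again, this is only an illustration of "only the most recently received message is stored" as quoted above; `LastMessageWinsSketch` and its methods are hypothetical and have nothing to do with the real engine internals. It keeps one stored message per correlation key, so three invoice events sent before the process starts listening collapse into one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; this is not the IBM BPM API. */
public class LastMessageWinsSketch {
    // One slot per correlation key: a newer message overwrites an older one.
    private final Map<String, String> slot = new HashMap<>();

    /** A message arrives while nothing is listening for this key. */
    public void send(String invoiceNumber, String eventType) {
        slot.put(invoiceNumber, eventType);
    }

    /** Simulates the BPD looping back to the IME until nothing is stored anymore. */
    public List<String> drain(String invoiceNumber) {
        List<String> seen = new ArrayList<>();
        String event;
        while ((event = slot.remove(invoiceNumber)) != null) {
            seen.add(event);
        }
        return seen;
    }

    public static void main(String[] args) {
        LastMessageWinsSketch model = new LastMessageWinsSketch();
        // All three events arrive before the process starts listening.
        model.send("INV-1", "invoice received");
        model.send("INV-1", "invoice approved");
        model.send("INV-1", "invoice paid");
        System.out.println(model.drain("INV-1")); // [invoice paid]
    }
}
```

    Replacing the single slot with a queue per key would let the loop see all three events in this model, which is the behavior most people expect from "durable".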

    This behavior was essentially considered "works as designed" for a long time. However, I have a strong recollection of this behavior being changed at some point in the 8.x line, at which point the behavior became customizable. But, now, I can't seem to find any reference to that changed behavior. Maybe someone else can either find the option to change this behavior or tell me that I'm crazy.

    David

    Updated on 2013-09-11T17:30:36Z at 2013-09-11T17:30:36Z by dogren@gmail.com
  • mfasbinder
    3 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-11T17:59:09Z  

    Hey Marc,

    So, the thing that I was directly referring to was the behavior of durable UCAs when multiple messages with duplicate keys are received. (It's a little weird: even though it is the IME that is marked "durable", it is really the UCA behavior that is modified.)

    The key point is actually mentioned in the documentation : "When a message arrives before a process has run to a point where the event can accept the message, the durable subscription causes the message to be stored until the message event is reached. Only the most recently received message is stored." The tricky bit is "only the most recently received message is stored". 

    Let me use an simple example to start with:

    We have a  BPD that is managing a "procure to pay" process. As part of that, an integration to the ERP system was built and the ERP system is monitored and corresponding UCA events are generated whenever invoices are processed in the ERP system.

    Our process is therefore listening for UCA events from a ERP system that is processing invoices. We have an single IME that is listening with a correlation ID of the invoice number. Once it receives the UCA it opens up the payload and follows a decision gateway for the type of invoice event: "invoice received", "invoice approved", "invoice paid", for example, and then does the correct thing based on that event. Once the BPD processes that payload the BPD loops back to the IME and starts listening again. (Assumably until we get the "invoice paid" or we get to some other end condition.)

    You might think that if you have this type of architecture, and mark the IME as durable that you will be OK: that you will always receive all three events: "received", "approved", and "paid". But that is not the case. If, for example, all three messages are received from the ERP system before we start listening the three messages will overwrite each other and only the last message received will ever be seen by the process.

    So this BPD ends up with a subtle race condition bug: there will be times we never process the "invoice approved" logic if the "invoice paid" message is received too quickly. There are several ways to work around this, but I'll follow up on that if necessary.

    Another, simpler case that is easy to test is to create a single activity with an IME attached to it that has the "do not close" checkbox set, and then fire 1000 messages that correlate to that IME as fast as you can from a test client. You will likely not get 1000 executions of the logic triggered by the IME, because some of those 1000 UCA messages will overwrite each other before the IME has a chance to fire.
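
The overwrite behavior described above can be illustrated with a toy model. This is only a sketch, not IBM BPM code: `DurableSlot`, `drain`, and `burst` are hypothetical names, and the single-slot-per-key storage is an assumption based on the documentation sentence "Only the most recently received message is stored."

```python
from typing import Dict, List, Optional


class DurableSlot:
    """Toy model of a durable subscription: one stored message per key."""

    def __init__(self) -> None:
        self._stored: Dict[str, str] = {}

    def publish(self, key: str, payload: str) -> None:
        # A newer message with the same correlation key replaces the old one.
        self._stored[key] = payload

    def receive(self, key: str) -> Optional[str]:
        # The listener consumes the stored message, if there is one.
        return self._stored.pop(key, None)


def drain(slot: DurableSlot, key: str) -> List[str]:
    """Loop back to the listener until nothing is left for this key."""
    seen: List[str] = []
    while (payload := slot.receive(key)) is not None:
        seen.append(payload)
    return seen


def burst(slot: DurableSlot, key: str, total: int, listen_every: int) -> int:
    """Publish `total` messages; the listener only gets a turn after every
    `listen_every` publishes. Returns how many times the listener fired."""
    fired = 0
    for i in range(total):
        slot.publish(key, f"msg-{i}")
        if (i + 1) % listen_every == 0 and slot.receive(key) is not None:
            fired += 1
    return fired


slot = DurableSlot()
for event in ("invoice received", "invoice approved", "invoice paid"):
    slot.publish("INV-1", event)  # all three arrive before we start listening
print(drain(slot, "INV-1"))       # only the last event is ever seen
print(burst(DurableSlot(), "INV-2", total=1000, listen_every=10))
```

In this model the listener sees only "invoice paid" for INV-1, and the burst of 1000 messages fires the listener far fewer than 1000 times. The real ratio depends on timing in the Event Manager, which the sketch replaces with a fixed `listen_every` interval.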

    This behavior was essentially considered "works as designed" for a long time. However, I have a strong recollection of this behavior being changed at some point in the 8.x line, at which point the behavior became customizable. But, now, I can't seem to find any reference to that changed behavior. Maybe someone else can either find the option to change this behavior or tell me that I'm crazy.

    David

    David,

    Thank you for your reply.  I believe I may have exactly the situation you outline in your Procure to Pay process.  The IME is indeed durable, and as you say, one would expect it to store all messages so they can all be processed.  So we will need to look at possible workarounds.

    Given that the BPD has to be at a point where it can accept the message, one possible solution might be to put a split right after the IME, so that another token is immediately created for the IME and it can accept the next message.  Instead of looping back, the other path would have to end.  However, this type of workaround obscures the business logic of the BPD.

    Any other suggestions?

     

    Thanks!

  • dogren@gmail.com
    dogren@gmail.com
    424 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-11T21:07:34Z  

    So, I'm not sure your approach would work, because there are no guarantees for how fast the bpdNotification would get processed. Plus, what would happen if all of the events were received before you even got to the active waiting point?

    So my first recommended "workaround" would be to figure out if I'm right about this getting changed. Adam (since I know you know who I mean) always considered this a bug and I'm pretty sure he got it "fixed". (Let me look again.) Anyway, one of the reasons I remember this is that it screwed up a toolkit Adam built when it was run under high load.

    I "production hardened" his toolkit by writing to a database in the message handler. The message handler for the UCA always runs, even with duplicates (it pretty much has to, since the correlation key doesn't even exist until after the output of the service). So my service handler would essentially write out all of the interesting bits of the payload into a database table. Then, when the IME was triggered, the BPD would read them back from the database. 99.9% of the time there would be only one record and it could just do the one thing it needed. But, just in case, the BPD would iterate over all of the rows it got back (filtering the result set by the correlation ID, obviously) and do the right thing for each.

    I don't like that approach, since it kind of defeats the whole purpose of letting the UCA and IME be abstracted from each other, not to mention all of the downsides of serializing that data back and forth from my custom database table. But if the product hasn't changed, it should work.
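
The database workaround described above could look roughly like the following sketch. It is not the original toolkit: the table `uca_payload`, its columns, and the functions `on_uca_message` and `on_ime_triggered` are hypothetical, and an in-memory SQLite database stands in for the custom table. The idea is only that every message is persisted by the handler, and the process reads back all unprocessed rows for its correlation ID when the IME fires.

```python
import sqlite3
from typing import List


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE uca_payload (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               correlation_id TEXT NOT NULL,
               event_type TEXT NOT NULL,
               processed INTEGER NOT NULL DEFAULT 0)"""
    )
    return conn


def on_uca_message(conn: sqlite3.Connection, correlation_id: str,
                   event_type: str) -> None:
    # Stands in for the UCA message handler: it runs for every message,
    # so every payload is persisted even if the UCA itself is overwritten.
    with conn:
        conn.execute(
            "INSERT INTO uca_payload (correlation_id, event_type) VALUES (?, ?)",
            (correlation_id, event_type),
        )


def on_ime_triggered(conn: sqlite3.Connection, correlation_id: str) -> List[str]:
    # Stands in for the step after the IME: fetch every unprocessed row for
    # this correlation ID in arrival order, then mark those rows as processed.
    with conn:
        rows = conn.execute(
            "SELECT id, event_type FROM uca_payload "
            "WHERE correlation_id = ? AND processed = 0 ORDER BY id",
            (correlation_id,),
        ).fetchall()
        conn.executemany(
            "UPDATE uca_payload SET processed = 1 WHERE id = ?",
            [(row_id,) for row_id, _ in rows],
        )
    return [event_type for _, event_type in rows]


conn = open_store()
for event in ("invoice received", "invoice approved", "invoice paid"):
    on_uca_message(conn, "INV-1", event)
on_uca_message(conn, "INV-2", "invoice received")

# Even if the IME fires only once for INV-1, all three events are recovered.
for event in on_ime_triggered(conn, "INV-1"):
    print("handling", event)
```

Marking rows as processed in the same transaction as the read keeps a second IME firing from handling the same events again, at the cost of the coupling between UCA and IME that the post objects to.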

    David

  • AndrewPaier
    AndrewPaier
    842 Posts

    Re: UCA fails after several repetitions of a business process

    ‏2013-09-12T19:30:03Z  

    David -

    You are correct: that was the behavior of the product.  For a long time I thought it was WAD (working as designed), and I had a really complex series of explanations, having to do with emails, lunch breaks, etc., to tell people why I thought so.

    Then a developer decided it really was a bug, even though the product had behaved that way for years while numerous customers complained and I kept trotting out my totally made-up reasoning.  Go figure.  At any rate, it was supposed to be solved, but I have no idea how to prove whether it was or was not solved without an extensive test.

    That being said, when one ran into the above case in the past, the actual behavior was that there would be N items listed in the underlying table waiting to be processed, and when the Event Manager picked them up, it would process the most recent one but mark all N as having been processed.  This means there was a mechanism for telling that this had occurred.
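
The detection idea above can be sketched with a toy model. Nothing here reflects the real Event Manager tables: the row layout and the names `pick_up` and `superseded` are hypothetical. The sketch only shows that if all N queued rows are marked processed while one payload is delivered, the difference between the two counts reveals how many messages were skipped.

```python
from collections import Counter
from typing import Dict, List


def pick_up(queue: List[dict]) -> Dict[str, str]:
    """Deliver only the newest unprocessed payload per key, mark all rows."""
    delivered: Dict[str, str] = {}
    for row in queue:  # rows are in arrival order
        if not row["processed"]:
            delivered[row["key"]] = row["payload"]  # later rows win
            row["processed"] = True
    return delivered


def superseded(queue: List[dict], delivered: Dict[str, str]) -> Dict[str, int]:
    """Per key: rows marked processed minus payloads actually delivered."""
    marked = Counter(row["key"] for row in queue if row["processed"])
    return {
        key: count - (1 if key in delivered else 0)
        for key, count in marked.items()
        if count > (1 if key in delivered else 0)
    }


queue = [
    {"key": "INV-1", "payload": "invoice received", "processed": False},
    {"key": "INV-1", "payload": "invoice approved", "processed": False},
    {"key": "INV-1", "payload": "invoice paid", "processed": False},
    {"key": "INV-2", "payload": "invoice received", "processed": False},
]
delivered = pick_up(queue)
print(delivered)                     # newest payload per key
print(superseded(queue, delivered))  # keys whose older messages were skipped
```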

    Marc -

    Your split solution will not be reliable, as the processing of tokens in a BPD is single-threaded and non-deterministic.  If the split happened to move the token back to the IME first, then yes, you would greatly decrease the odds of "missing" the events.  However, if it chose to follow the other path first, the window would be about the same and you would be no better off.

    Andrew Paier | Director | BP3 Global, Inc.