7 replies Latest Post - ‏2012-03-25T04:18:01Z by fjb_saper
NickLaqua

pub sub vs. queueing

2012-03-20T08:10:30Z
Hi there,

We are in the process of laying down the foundations for our overall corporate integration architecture. At the highest level, the only two patterns are Service ("PULL": someone requests an activity to be performed) and Event ("PUSH": someone notifies others that something significant has happened, normally a data/info change). The assumption is that with these patterns alone, all integration scenarios can be modelled and further detailed (number of messages, message size, frequency, reliability, internet vs. enterprise, trigger, realtime etc.).

The sub pattern "PUSH - Event - high frequency - small message size - single records - reliability" would lead us down the MQ path as the protocol, with the two alternatives of traditional queueing (one-to-one) versus publish subscribe (one-to-many). My understanding is that pub sub is the more decoupled and flexible approach which still can cater for a one-to-one scenario, so we are inclined to specify pub sub as the default solution pattern for the aforementioned requirements.

So the question actually is whether this makes sense and whether there are significant disadvantages with pub sub which might make queueing more appropriate in certain situations.

btw, we are on MQ7

thx Nick
Updated on 2012-03-25T04:18:01Z at 2012-03-25T04:18:01Z by fjb_saper
  • SystemAdmin

    Re: pub sub vs. queueing

    ‏2012-03-20T22:39:23Z  in response to NickLaqua
    MQ is also a very good path for the Service pattern, using a sync or non-sync request/reply messaging design.

    MQ is also suitable for the Push pattern. If there will ever be more than one recipient of a Push event message, you should start with a pub/sub design. It's harder to add additional recipients later in a queued design. Use MQ v7 or later for pub/sub, as the older pub/sub broker in MQ v6 is not as flexible or robust, and MQ v6 goes out of support later this year.

    HTH, G.
    • NickLaqua

      Re: pub sub vs. queueing

      ‏2012-03-21T14:26:50Z  in response to SystemAdmin
      thx for the response.

      I agree that pub sub is the way to go. Still, the question remains whether there are any disadvantages with pub sub. As an example, what about guaranteed delivery? In a queueing scenario, it is very clear that if messages stay in the queue, they haven't been picked up yet. With pub sub, you might run into a scenario where there is no subscriber to a published message. I guess that if subscriptions are added administratively, this should be similar.

      Nick
      • T.Rob

        Re: pub sub vs. queueing

        ‏2012-03-21T19:25:24Z  in response to NickLaqua
        Pub/Sub is very much about event-driven architecture rather than one-to-many fan-out, although many people use it for the latter purpose. Event-driven apps generally do not care about events that fire while they are offline. Those that do care can use durable subscriptions, and that's where the disadvantages begin to pop up. If the durable subscription is infinitely more durable than the app (i.e. the app goes away permanently), then messages begin to pile up in the destination queue. At best, these take up space and slow down publications. At worst, if the topic is configured such that any publication must succeed for all subscribers or fail for all subscribers, then a full durable subscription queue can cause the entire topic tree to clog from that point down. So it is necessary to monitor durable subscriptions and their queues to make sure orphans are cleaned up.
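
To make the orphaned durable subscription problem concrete, here is a minimal in-memory sketch. It is a toy model, not the WebSphere MQ API: `ToyBroker`, `find_suspected_orphans`, the subscription names and the `max_depth` threshold are all made up for illustration.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a pub/sub broker; not the WebSphere MQ API."""

    def __init__(self):
        self.subscriptions = {}  # subscription name -> queue of undelivered messages

    def subscribe_durable(self, name):
        self.subscriptions[name] = deque()

    def publish(self, message):
        # Every durable subscription gets its own copy, whether or not
        # the owning application is still running.
        for queue in self.subscriptions.values():
            queue.append(message)

    def consume(self, name):
        queue = self.subscriptions[name]
        return queue.popleft() if queue else None

    def depths(self):
        return {name: len(queue) for name, queue in self.subscriptions.items()}


def find_suspected_orphans(broker, max_depth):
    """Return subscriptions whose backlog exceeds max_depth (hypothetical threshold)."""
    return sorted(name for name, depth in broker.depths().items() if depth > max_depth)


broker = ToyBroker()
broker.subscribe_durable("billing")   # application still running
broker.subscribe_durable("old_crm")   # application retired, subscription left behind

for i in range(100):
    broker.publish(f"event-{i}")
    broker.consume("billing")         # only the live application drains its queue

print(broker.depths())
print(find_suspected_orphans(broker, max_depth=50))
```

In a real deployment the depth figures would come from the queue manager's own monitoring rather than from application code, and a deep queue is only a hint that the subscriber has gone away, not proof.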

        Another issue with pub/sub is when publishing to a clustered topic. There's a limit to the size of your cluster because each new node potentially subscribes to the clustered topics. I would not, for example, advertise topics to my large Production cluster if only a few nodes actually needed to subscribe. Instead I'd probably create an overlapping cluster whose members were the subset that needed pub/sub. Of course, that's a rule of thumb and there are many different requirements that would break it. (The rule, not the thumb.)

        T.Rob
        • NickLaqua

          Re: pub sub vs. queueing

          ‏2012-03-22T13:57:46Z  in response to T.Rob
          Hi T.Rob,

          so what's the preferred mechanism to implement one-to-many fan out ? I might have used the term "event" in a rather loose fashion, what I am referring to is "a report about something significant that has happened within the environment". The idea is that business processes etc. can be triggered in a rather decentralized "peer to peer" fashion without having a central coordinator.

          With this definition, I am in doubt whether your statement is correct that event driven apps do not care about events firing when they are offline (offline = unwanted downtime). As an analogy to the human nervous system, this would be like saying that I don't care about the noise that a burglar makes breaking into my house when I am sleeping (being offline). A real example from our business environment (airline) would be a passenger notification system which is hooked into all operational systems and can notify passengers in case of significant events, such as a flight being delayed. If the notification system is temporarily down when this event is fired, it should still notify the passengers when it comes back up (assuming the message hasn't expired yet, e.g. flight has left). The most common scenario is realtime data synchronisation via pub/sub. If those events are missed, and there is no periodic re-distribution, the different databases become inconsistent.

          Regarding your comment around durable subscriptions and apps being retired (with the durable subscriptions being retained), wouldn't this be a common problem in any MOM scenario? Even with traditional queueing, if the receiver app doesn't "get" the messages off the queue anymore, they will queue up and take up space. So I guess queue monitoring is a must anyway.

          Thanks for your comments, much appreciated.

          Nick
          • T.Rob

            Re: pub sub vs. queueing

            ‏2012-03-22T16:17:16Z  in response to NickLaqua
            Hi Nick,

            The distinction that I was making is that Pub/Sub is very often used to implement what is really a distribution list. There's nothing particularly wrong with that, especially since it is more convenient to create durable or administrative subs than a traditional distribution list. So if I take a typical point-to-point message flow where the recipient cares about every message and need to fan that out to multiple recipients, then it is a distribution list. One characteristic of this is that the sender probably cares to a greater extent whether the recipients all received the message.

            On the other hand, the classic example of an event-driven app is the stock ticker. When the app starts, it doesn't care what prices were at prior points in time. The prices fire every few seconds so it just picks up in mid-stream with the next event. The sender doesn't care whether every potential subscriber gets the publication or even know who or how many subscribers exist. It cares to a much lesser extent whether messages are received.

            There is some overlap but if we start off making this distinction and design the app from the perspective of a dist list or an event model then when the lines begin to blur it is easier to remember our original intent and make the corresponding design decisions.

            The analogy about sleeping being offline is a good place to start. An event-driven architecture would be a normal alarm. You get the notice if you are in the house (online) whether you are awake or not. When you are away (offline) then you do not hear the alarm. On returning you ascertain the state of the house ("hey, is that our alarm?") and act accordingly. You don't need to check the prior state of the alarm because a siren cycles on and off every second or so. You know the moment you hear it that the event is firing and do not need to check whether it was firing 100 seconds ago to respond. With a distribution list model, you take additional steps to provide persistent notification of events that may not be in real time. For example, the system knows and cares whether the alarm monitoring company got the alert and retries once the phone/power/whatever is restored, even if the alarm is reset. In this case, the fact that the alarm fired at all is a notification, as is the current state of the alarm.

            In the example of the airline notification you said "If the notification system is temporarily down when this event is fired..." but this is a case of the sender being aware that an event did not fire and that is not a distinguishing feature between event-driven vs. dist list. Whether a sender needs to save and send events after an outage is a function of the lifetime of the business value of the notification. When the app comes up are there events queued up that it needs to send or should it clear the queue, ascertain the current state and send events for anything with non-normal parameters? In the first model it is possible that the app wakes up and sends four notices back to back:

            • Your flight is delayed 15 minutes.
            • Your flight is delayed 30 minutes.
            • Your flight is delayed two hours.
            • Your flight is cancelled.

            Arguably such a system would be better off simply ascertaining the current non-normal state (flight cancelled) and sending one text. Certainly subscribers paying per-text would appreciate that model more. Of course, that requires the sender to understand the correlation between two events rather than just distributing blindly whatever events are fed in.

            Because the system described does not correlate events, it must send all notices (even ones produced while it was offline) in order to know the current state. This is dist list, and the architecture is driven more by the SMS architecture than anything else. If it were event driven, a local app would track the state. On startup it would register a subscription, determine the current state and then monitor for changes. Assuming that it could determine the state quickly (by polling, by using a proprietary feature such as WMQ's retained pubs, or simply because the events fire every few seconds), it would have no need to obtain prior state for the flight. Such a system would not need the robust delivery systems of durable subscriptions and persistent messages, and in fact is a sweet spot for MQTT apps. However, such a system requires either a specific running app or else something like an embedded MQTT client that could generically subscribe to notifications from many sources. (At least one mobile company is looking at this, by the way.)
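
The event-driven startup sequence described above (determine the current state, then monitor for changes) could be sketched roughly as follows. This is only an illustration: `FlightStateTracker`, `snapshot` and the flight numbers are hypothetical, and the snapshot function stands in for polling or a retained publication.

```python
class FlightStateTracker:
    """Keeps only the latest known status per flight; history is not needed."""

    def __init__(self, fetch_current_state):
        # fetch_current_state is a hypothetical callable returning
        # {flight: status}, e.g. from polling or retained publications.
        self.state = dict(fetch_current_state())
        self.changes = []

    def on_event(self, flight, status):
        """Handle one live publication; record it only if the state changed."""
        if self.state.get(flight) != status:
            self.state[flight] = status
            self.changes.append((flight, status))


def snapshot():
    # Stand-in for "determine the current state" at startup.
    return {"XX123": "delayed 2h", "XX456": "on time"}


tracker = FlightStateTracker(snapshot)
tracker.on_event("XX123", "delayed 2h")   # repeat of known state: ignored
tracker.on_event("XX123", "cancelled")    # real change: recorded
tracker.on_event("XX789", "on time")      # flight not in the snapshot: recorded
print(tracker.changes)
```

Because the tracker holds only current state, nothing here depends on durable subscriptions or on replaying messages published while the app was down.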

            Also, you are correct that the discussion is generic and applies to any MOM. The JMS spec embodies both P2P and Pub/Sub but without describing any particular functionality for dist list. Perhaps this was because the designers saw dist list as a subset of pub/sub rather than as a different animal. Or possibly they just figured that the semantics provided would serve both purposes. Either way, the end result is that the JMS spec became the de facto implementation for dist list. The result is that all transports tend to gracefully handle dynamic pub/sub, but detection of orphan subscriptions is inconsistently implemented and with varying degrees of automation. I suspect that if the issue begins to rise in prominence as more people move to pub/sub, various vendors (IBM included) will look at ways of improving their handling.

            T.Rob
            • NickLaqua

              Re: pub sub vs. queueing

              ‏2012-03-23T02:38:29Z  in response to T.Rob
              wow, quite some input :-).

              Let me sort this a little bit.

              The first comment I wanted to make is that (again) I was probably applying a slightly different definition of event (vs. state). An event (in my case) is something that won't necessarily be repeated but is absolutely important for the receiver. The alternative (stock ticker) is more like "state". In this case, messages are fired periodically and it doesn't matter when you plug into the stream, as one doesn't care about the history. The lines between the two are kind of blurry, as you proved with your two examples. The flight delay topic could also be seen as state, as there could (!!!) be regular updates. But in this case, it is not as implicit as with the stock ticker, where regular updates are certain. For the case of the alarm and the notification of the security company, this will probably be implemented in a rather durable fashion (like a big red light on the map), so if the security officer comes back from the toilet, he definitely gets the message. Even if there were regular updates (such as the alarm being switched off by the burglar), the original message shouldn't be dismissed but followed up by the officer.

              The other (related) comment I wanted to make, which seems to be a contradiction between yourself and myself, concerns the responsibility for guaranteed delivery. You seemed to see this with the sender/publisher, whereas I rather see it the other way. Ultimately, my key focus is decoupling, which translates into the sender actually not knowing anything about the consumer. This means that - in order to achieve guaranteed delivery - there needs to be a reliable mediator (e.g. MQ) that is trusted to do the job. In my opinion, all the publisher cares about is the fact that something happened within his domain that is worth sharing. Who consumes it, what it triggers and any implicit logic of consumption is none of his business. The only model which supports such a concept is pub/sub. Whether one cares about historic events or not then leads to design decisions such as durable vs. dynamic subscriptions. But again, my key focus was on unwanted downtimes, crashes etc. I would assume (not being an MQ expert) that a dynamic subscription would still survive the case of the subscriber crashing. So messages would still pile up, as the subscriber hasn't unsubscribed (before crashing). So monitoring is still required.

              You raised an interesting aspect around messages that override each other. I would argue that this is dependent on the subscriber rather than the publisher and hasn't anything to do with the transport/delivery/distribution aspect. As an example, there could be a business intelligence system (being fed in realtime) which wants to know ALL flight delay events to apply analytics later. On the other hand, there is the passenger notification system, which obviously shouldn't bother the passenger with too many messages. The exact mechanics of how to deal with such situations are pretty complex, and the business logic of this is not (necessarily) publisher related or its call/responsibility. As an example, the "Flight Cancelled" event is not necessarily (for a machine) related to the "Flight Delayed" event and mustn't cancel out the other one; for a human being, it certainly is related. There are probably different ways to manage this situation. For instance, once the app comes back up, it could discard all messages and query the sources ("pull") rather than consuming the messages. But that would again assume that the subscriber system knows all the publishers, which leads to tight coupling. Alternatively, one could make it a precondition that downstream actions can only be taken if the queue is empty (in a normal operational situation, this would be true if messages are fetched in real time). So in this case, the application would read all the messages and apply its own logic (e.g. discarding all the flight delay events and only acting upon the "Flight Cancelled" event) before acting on the remaining events.
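
The "read everything first, then apply subscriber-side logic" idea could look something like the sketch below. The message layout `(flight, kind, detail)`, the function names and the superseding rule are assumptions made for illustration; a real notification system would have its own rules.

```python
def drain(queue):
    """Read every waiting message before acting, leaving the queue empty."""
    backlog = list(queue)
    queue.clear()
    return backlog


def notifications_to_send(backlog):
    """Subscriber-side rule (an assumption): per flight, a cancellation
    supersedes any delay, otherwise only the most recent delay is kept."""
    latest = {}
    for flight, kind, detail in backlog:
        if latest.get(flight, ("", ""))[0] == "cancelled":
            continue  # nothing after a cancellation matters to the passenger
        latest[flight] = (kind, detail)
    return [(flight, kind, detail) for flight, (kind, detail) in latest.items()]


queue = [
    ("XX123", "delayed", "15 minutes"),
    ("XX123", "delayed", "30 minutes"),
    ("XX456", "delayed", "20 minutes"),
    ("XX123", "delayed", "two hours"),
    ("XX123", "cancelled", ""),
]

backlog = drain(queue)
to_send = notifications_to_send(backlog)
print(to_send)        # what the passenger notification system would act on
print(len(backlog))   # a BI subscriber would keep all of these instead
```

The same backlog serves both subscribers: the notification system collapses it, while an analytics consumer keeps every event, which is the point about the logic sitting with the subscriber.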

              Another example is a weather service. One could argue that these messages are not critical, but what about a typhoon message (happens quite often in Hong Kong)? This message could be the reason for shutting down the whole airport, with significant effects for the operations. So for us as a subscriber, it is absolutely critical; for a random person, it would be more of an FYI, especially as those messages could also be received via other channels (TV, radio, metro). Now for this specific case, one could argue that the first consumer of the "typhoon message" would not be the passenger notification app (unless you want to let them know that a typhoon is coming), but the flight operations systems. Based on their business logic and human intervention, these systems would eventually take action (e.g. ground planes) and thereby issue multiple "Flight Cancelled" messages/events which are then consumed by different systems (e.g. passenger notification system, reservation system to do automatic reaccommodation).

              So in summary, my key statements are:

              1. Whether a message is important or can be discarded is up to the subscriber rather than the publisher. This maps to durable vs. dynamic subscriptions.
              2. The MQ/MOM layer is only concerned with transport/distribution aspects rather than the content/context of the messages.
              3. Decoupling is key.
              4. Message processing logic primarily sits with the subscribers, in the same way that message generation logic (whether or not to execute an activity which leads to an event) sits with the publisher.

              Nick
              • fjb_saper

                Re: pub sub vs. queueing

                ‏2012-03-25T04:18:01Z  in response to NickLaqua
                I noticed that in his flight analogy, T.Rob also skipped (intentionally or not) the subject of retained publications.

                So let's look again at the analogy

                1-> flight xxx is 30 mins late
                2-> flight xxx is 60 mins late
                3-> flight xxx is 90 mins late
                4-> flight xxx is cancelled

                So you have 2 systems running:
                Flight system publishes a retained publication with topic /flight status/flight xxx

                The notification system has a non durable subscription to /flight status/#
                It in turn will publish the messages out to the public (SMS) if subscribed for the status of flight xxx

                If the notification system goes down, there is no need to fear that the public will get all 4 messages. You will only get the last status and any following messages...
                So you might get the last status twice: once when it was originally published and once when the notification system comes back up (a case of no change)...

                If you want to suppress the duplicate status message the notification system has to keep track of the status itself and verify whether it has changed before sending out the notification...
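
As a rough illustration of retained publications combined with duplicate suppression, here is a toy model. `RetainedTopicBroker` and `Notifier` are invented for this sketch and are not the WebSphere MQ API; the sketch also assumes the notifier's record of the last status sent survives its restart.

```python
class RetainedTopicBroker:
    """Toy broker: keeps the last publication per topic and replays it
    to each new subscriber. Not the WebSphere MQ API."""

    def __init__(self):
        self.retained = {}       # topic -> last published status
        self.subscribers = []    # callbacks of currently connected subscribers

    def publish(self, topic, status):
        self.retained[topic] = status
        for callback in self.subscribers:
            callback(topic, status)

    def subscribe(self, callback):
        self.subscribers.append(callback)
        for topic, status in self.retained.items():
            callback(topic, status)   # replay retained publications

    def unsubscribe(self, callback):
        self.subscribers.remove(callback)


class Notifier:
    """Remembers the last status it sent so a replay causes no duplicate.
    Assumes last_sent is kept across restarts (e.g. persisted somewhere)."""

    def __init__(self):
        self.last_sent = {}
        self.sent = []

    def on_publication(self, topic, status):
        if self.last_sent.get(topic) == status:
            return                    # no change: suppress the duplicate
        self.last_sent[topic] = status
        self.sent.append((topic, status))


topic = "/flight status/flight xxx"
broker = RetainedTopicBroker()
notifier = Notifier()

broker.subscribe(notifier.on_publication)
broker.publish(topic, "30 mins late")
broker.unsubscribe(notifier.on_publication)    # notification system goes down
broker.publish(topic, "60 mins late")
broker.publish(topic, "90 mins late")
broker.publish(topic, "cancelled")
broker.subscribe(notifier.on_publication)      # back up: only "cancelled" is replayed
broker.unsubscribe(notifier.on_publication)    # second outage, no new publications
broker.subscribe(notifier.on_publication)      # replay of "cancelled" is suppressed
print(notifier.sent)
```

The passenger sees the first delay and the cancellation, but neither the intermediate delays nor a repeated cancellation.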