Topic
3 replies. Latest post 2012-12-11T08:51:07Z by mmalc
SystemAdmin
289 Posts

Pinned topic Architecture question

2012-10-31T08:03:48Z
We have a legacy system that produces files, each containing hundreds of messages (financial transactions). We need to transform these messages into another format and submit them individually to a target system. The question is: should the ESB accept these large files for processing directly, or should there be an adapter application between the legacy system and the ESB that splits received files into individual messages, so that the ESB processes the messages individually (instead of processing the whole file)?

In the first solution we expect two ESB flows. The first would transform the file into the new format, split it into messages, and store those messages in a temporary location (e.g. a database). The transformation needs to process the file as a whole, because the file contains common sections that are required to transform the individual messages. The second flow would take the individual transformed messages (each in a separate XA transaction), pass them to the target system, and wait for its answer (synchronously or asynchronously).
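
To make the idea concrete, here is a rough Java sketch of the split-and-stage step of that first flow. It assumes a line-oriented legacy format (header lines prefixed "H|", transaction lines prefixed "T|") and a hypothetical MESSAGE_STAGING table; none of these names come from the real system, and an actual flow would use the ESB's own mediation primitives rather than hand-written JDBC. It only shows the streaming idea:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

/** Streams a large legacy file, merges each transaction line with the common
 *  header data, and stages the resulting messages for individual processing.
 *  Assumed file layout: header lines "H|...", then transaction lines "T|...". */
public class FileSplitter {

    public static void split(Path legacyFile, String jdbcUrl) throws Exception {
        StringBuilder header = new StringBuilder();
        try (BufferedReader in = Files.newBufferedReader(legacyFile);
             Connection db = DriverManager.getConnection(jdbcUrl);
             PreparedStatement insert = db.prepareStatement(
                 "INSERT INTO MESSAGE_STAGING (FILE_NAME, SEQ_NO, PAYLOAD, STATUS) VALUES (?, ?, ?, 'NEW')")) {

            String line;
            int seq = 0;
            while ((line = in.readLine()) != null) {      // the whole file is never held in memory
                if (line.startsWith("H|")) {
                    header.append(line).append('\n');     // common section needed by every message
                } else if (line.startsWith("T|")) {
                    String message = transform(header.toString(), line);
                    insert.setString(1, legacyFile.getFileName().toString());
                    insert.setInt(2, ++seq);
                    insert.setString(3, message);
                    insert.executeUpdate();               // each staged row is later sent in its own transaction
                }
            }
        }
    }

    /** Placeholder for the real mapping to the target format. */
    private static String transform(String commonData, String transactionLine) {
        return commonData + transactionLine;
    }
}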

The second solution would replace the first flow with an external application that transforms the file, splits it into individual transformed messages, and stores them in a temporary location (the local file system). The second flow would stay in the ESB.

In our eyes, the disadvantage of the first solution is that the ESB would have to process huge files (in the first flow), which is commonly considered an anti-pattern. On the other hand, the ESB would adapt directly to the interface of the legacy system, which is one of the purposes of an ESB.

In the second solution, the adapter application would contain the transformation logic, even though transformation is supposed to be another of the purposes and responsibilities of the ESB.

What is the commonly suggested solution for this situation (a pattern)? Which solution would you suggest?

http://publib.boulder.ibm.com/infocenter/esbsoa/wesbv7r5/index.jsp?topic=%2Fcom.ibm.websphere.wesb.programming.doc%2Ftopics%2Fesbprog_patterns.html

https://www.ibm.com/developerworks/wikis/display/esbpatterns/File+Processing

http://www.ibm.com/developerworks/webservices/library/ws-largemessaging/
Updated on 2012-12-11T08:51:07Z by mmalc
  • mmalc
    74 Posts

    Re: Architecture question

    2012-11-06T21:30:08Z in response to SystemAdmin
    Interesting question. The first thing I'd say is that not everything is solved using an ESB, in the same way that not every DIY problem is solved with a hammer.
    There are some tools that will likely solve this a little more elegantly; however, this would result in additional cost, so there are trade-offs that you'd need to consider.

    If the ESB approach is to be used then I'd suggest a two-stage process, as you are right that ingesting a large file into the ESB and maintaining it in memory is not recommended. Without knowing the details, and talking generically, I'd have the first pass extract the common pieces of data and store them in a separate space from the line items. The second pass would then read in the common data and use it to process the line items. Both of these could potentially be performed in the ESB; it depends on the complexity of the file.
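
    Very roughly, something like the two-pass sketch below in plain Java, streaming the file both times so it is never loaded completely; the "H|"/"T|" prefixes and the class and method names are made up for illustration, and in WESB the second pass would hand each item to a mediation rather than a callback:

    import java.io.BufferedReader;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiConsumer;

    /** Two-pass processing of a large file: each pass streams the file, so only
     *  the small common sections are ever kept in memory. */
    public class TwoPassProcessor {

        /** Pass 1: collect only the (small) common sections. */
        static List<String> extractCommonData(Path file) throws Exception {
            List<String> common = new ArrayList<>();
            try (BufferedReader in = Files.newBufferedReader(file)) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("H|")) common.add(line);
                }
            }
            return common;
        }

        /** Pass 2: stream the line items and hand each one, plus the common data,
         *  to a processing step (e.g. a mediation that builds the target message). */
        static void processLineItems(Path file, List<String> common,
                                     BiConsumer<List<String>, String> handler) throws Exception {
            try (BufferedReader in = Files.newBufferedReader(file)) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("T|")) handler.accept(common, line);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Path file = Paths.get(args[0]);
            List<String> common = extractCommonData(file);
            processLineItems(file, common,
                (hdr, item) -> System.out.println("would enqueue: " + item + " (with " + hdr.size() + " header lines)"));
        }
    }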

    Whether you store the data in a database or another file is down to your analysis of the requirements.
    For example, you mention XA transactions; I'm assuming this means you have multiple resource managers that you need to coordinate in the same transaction.

    Another thing to bear in mind is how much time you have to process the incoming file. If the file is so large that the two-stage process takes so long that it interferes with possible real-time operations, or if another file arrives before you have finished processing the first, then you may need to consider other options.

    An interesting option would be to use large scale caching technology to support this process... but that's another story.
    • SystemAdmin
      289 Posts

      Re: Architecture question

      2012-11-07T08:12:28Z in response to mmalc
      Mmalc, thank you for your response.

      mmalc wrote:
      There are some tools that will likely solve this a little more elegantly; however, this would result in additional cost, so there are trade-offs that you'd need to consider.

      I would have chosen a different solution, but the common agreement in our team, after considering all the pros and cons, was a choice between the two possibilities mentioned.
      I myself would pass only the URI of the big file through the ESB flow, leaving the file itself on a shared disk (SFTP?, as we don't have a shared disk array). At the right time, WESB would call a Java component that would download the file, transform it and split it into individual messages, and then WESB would store the messages in the database for further processing. For huge files, this first part of the flow would run on a dedicated server so that it doesn't block the processing of small files.
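
      As a rough sketch of what I mean (a claim-check style handoff): only a small reference travels through the flow, and a worker resolves it to the real file. The FileReference class, the "T|" prefix and the staging step are made up for illustration, and the SFTP download is stood in for by a plain file read:

      import java.io.BufferedReader;
      import java.io.Serializable;
      import java.net.URI;
      import java.nio.file.Files;
      import java.nio.file.Paths;

      /** Only this lightweight reference would be routed through the ESB flow. */
      public class FileReferenceWorker {

          static class FileReference implements Serializable {
              final URI location;          // e.g. file:///mnt/legacy/outbound/batch.txt
              FileReference(URI location) { this.location = location; }
          }

          /** Resolves the reference, streams the file, and hands each transaction
           *  line to the staging step (database insert, queue put, ...). */
          static void process(FileReference ref) throws Exception {
              // assumption: the URI points at a locally reachable path; a real
              // implementation would first download the file over SFTP
              try (BufferedReader in = Files.newBufferedReader(Paths.get(ref.location))) {
                  String line;
                  while ((line = in.readLine()) != null) {
                      if (line.startsWith("T|")) {
                          stage(line);     // individual message goes on for separate processing
                      }
                  }
              }
          }

          private static void stage(String message) {
              System.out.println("staged: " + message);   // placeholder for the DB/queue write
          }
      }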

      Did you mean other tools in what you wrote?

      The other side of the coin is that we are tight on hardware resources, as our customer has a WESB licence for only a very limited number of CPUs. Thus, the less work WESB does, the better.

      mmalc wrote:
      If the ESB approach is to be used then I'd suggest a two-stage process, as you are right that ingesting a large file into the ESB and maintaining it in memory is not recommended. Without knowing the details, and talking generically, I'd have the first pass extract the common pieces of data and store them in a separate space from the line items. The second pass would then read in the common data and use it to process the line items. Both of these could potentially be performed in the ESB; it depends on the complexity of the file.

      So, if I read you correctly, you would extract the common data and split the file into individual messages without transforming them, and then transform each message at the time it is processed individually.

      mmalc wrote:
      For example, you mention XA transactions; I'm assuming this means you have multiple resource managers that you need to coordinate in the same transaction.

      The first participant in the global transaction is the WMQ target. The other is the temporary database, as we need to mark processed messages so that they don't get processed again.
      But, in fact, the global transaction is not necessary if we make the processes and the backend system resistant to duplicates.
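
      For example, duplicate protection without XA could look roughly like the JDBC fragment below: the message ID is recorded under a primary-key constraint and the send is skipped when the mark already exists. The PROCESSED_MESSAGES table and class name are hypothetical, and not every driver maps duplicate-key errors to this exception subclass:

      import java.sql.Connection;
      import java.sql.PreparedStatement;
      import java.sql.SQLIntegrityConstraintViolationException;

      /** Marks a message as processed; the primary key on MESSAGE_ID rejects duplicates. */
      public class DuplicateGuard {

          /** Returns true if this message has not been seen before and was marked now. */
          static boolean markIfNew(Connection db, String messageId) throws Exception {
              try (PreparedStatement insert = db.prepareStatement(
                      "INSERT INTO PROCESSED_MESSAGES (MESSAGE_ID) VALUES (?)")) {
                  insert.setString(1, messageId);
                  insert.executeUpdate();
                  return true;                       // first time we see this ID
              } catch (SQLIntegrityConstraintViolationException duplicate) {
                  return false;                      // already processed (or in flight): skip the send
              }
          }
      }

      The ordering still matters: if the mark is committed before the send and the send then fails, the message would be skipped on retry, so the mark would have to be committed together with (or after) a successful send, and the backend has to tolerate the occasional re-send. That is exactly the "resistant to duplicates" requirement above.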

      mmalc wrote:
      Another thing to bear in mind is how much time you have to process the incoming file. If the file is so large that the two-stage process takes so long that it interferes with possible real-time operations, or if another file arrives before you have finished processing the first, then you may need to consider other options.

      I've commented on this above.

      mmalc wrote:
      An interesting option would be to use large scale caching technology to support this process
      I have no experience with anything like this. Wouldn't it be too large a hammer for tasks of this kind?
      • mmalc
        74 Posts

        Re: Architecture question

        2012-12-11T08:51:07Z in response to SystemAdmin
        Apologies for the slow response.

        Regarding other tools, I meant completely different products; for example, WTX (WebSphere Transformation Extender) would probably handle this better than WESB.

        If doing it in WESB, yes, I would do a first pass through the file and store the common data, preferably in memory if that is possible, and then on the second pass process the individual entries, referring to the common data in memory or on a common file store, etc. I'm not sure how feasible this is in your case. The important thing is not to try to ingest the complete file into WESB at once.

        Is it possible for you to perform this pre-processing on the file?

        Regarding storing the data in a cache, this may be overkill if that is all you are doing. If you do lots of this kind of processing then it may well be a useful option to consider. As an example, in BPM v8 we have special mediation primitives that read from and write to WebSphere eXtreme Scale, which would likely be an interesting solution to this problem.