15 replies Latest Post - ‏2014-12-08T14:42:19Z by Raj4All
Prashant Patel
14 Posts

Pinned topic Sharing toolkit by reference not copy.

‏2013-02-15T07:31:23Z |
We have three toolkits which are used by different process apps. The toolkits are 5 MB, 10 MB, and 15 MB in size, and each process app is above 15 MB. Since each toolkit is copied into the process app, when the final process app is created for deployment, the size of each app (toolkit + process) is more than 20 or 30 MB. This causes the JVM to be loaded with roughly 75 MB of deployment objects, and later, when process instances are created for each app, it will take further memory.

To reduce the memory consumption of the process apps during deployment, can the process app size be reduced by installing the toolkit as a shared toolkit, which can be referenced by each process app at deployment and run time instead of being shared by copy?

We are using BPM 7.5.1.1. If sharing by reference is not possible in 7.5.1.1, is such a feature available in BPM 8.0.1?

Thanks
Prashant
Updated on 2013-04-04T21:36:40Z by NaveenBhardwaj08
  • SystemAdmin
    7615 Posts

    Re: Sharing toolkit by reference not copy.

    ‏2013-02-15T14:42:47Z  in response to Prashant Patel
    IBM BPM basically handles toolkits by reference already, and it always has (or at least as long as the Process Center has been around). But I can understand why you might have thought otherwise.

    When you create an offline deployment package, the Process Center creates a bundle of everything that is needed in order to run that process application. This obviously includes all dependencies, including toolkits. So, the Process Center packs the bundle with the snapshot of the process app, as well as all of the designated snapshots of the required toolkits (and recursively through their dependencies). This is why your deployment bundles are relatively large when they are created: they really do contain a "copy" of everything, which seems reasonable to me. When that package is deployed, you want it to have everything it needs. (See my notes at the bottom.)
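To make the bundling step concrete, here is a rough sketch in Python. The function name and data shapes are my own invention (the actual Process Center code is not public); it only illustrates the idea of recursively collecting a snapshot plus all of its toolkit dependencies into one self-contained bundle, which is why export files get large.

```python
# Conceptual sketch only -- not IBM's actual implementation.
# Collect a snapshot and, recursively, every toolkit snapshot it
# depends on, so the bundle is self-contained.

def collect_bundle(snapshot_id, deps):
    """deps maps a snapshot id to the ids of the toolkit snapshots it
    depends on. Returns the set of snapshots the bundle must contain."""
    bundle, stack = set(), [snapshot_id]
    while stack:
        sid = stack.pop()
        if sid not in bundle:
            bundle.add(sid)
            # Follow this snapshot's dependencies recursively.
            stack.extend(deps.get(sid, []))
    return bundle
```

For example, a process app depending on two toolkits, one of which depends on a third, ends up with all four snapshots in its bundle.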

    But at deployment time, the Process Server is smart about the deployment. As it goes through the various snapshots in the bundle, it knows which toolkits and snapshots it already has. (And since snapshots are immutable, if it has a particular snapshot it knows that it is exactly the same as the snapshot in the bundle.) So it will skip over the ones that it already has installed and just create "references" to them. (I believe you can even see this in the deployment logs.)
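The de-duplication logic can be sketched like this. Again, the names are hypothetical, not IBM's code; the point is that because snapshot ids identify immutable content, an id match is enough to skip installation and record a reference instead.

```python
# Conceptual sketch (not IBM's actual code) of de-duplicating
# immutable snapshots at deployment time: anything the server already
# has is skipped and linked by reference.

def deploy_bundle(bundle, installed):
    """bundle: list of (snapshot_id, payload) pairs; installed: set of
    snapshot ids already on the server (mutated in place). Returns
    which snapshots were newly installed vs. merely referenced."""
    newly_installed, referenced = [], []
    for snapshot_id, payload in bundle:
        if snapshot_id in installed:
            # Snapshots are immutable, so an id match guarantees the
            # content is identical; just record a reference.
            referenced.append(snapshot_id)
        else:
            installed.add(snapshot_id)
            newly_installed.append(snapshot_id)
    return newly_installed, referenced
```

Deploying a second process app that shares a toolkit with the first then installs only the new app snapshot and references the toolkit it already has.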

    One other important point to mention, in response to your concerns about memory, is that process applications (and toolkits) are not loaded monolithically. Even if you do have a giant 75MB process application, it isn't all loaded in memory at any one time. Essentially each component is loaded "as needed" into memory and then cached for future use. And, for all of the reasons above, this includes toolkit components that are shared across multiple process applications.
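The "loaded as needed, then cached" behavior is easy to picture with a toy cache. This class and its fields are assumptions for illustration, not IBM internals: the idea is just that a component is read from the repository once, on first use, and every later request (from any process app sharing it) hits the same in-memory copy.

```python
# A minimal sketch (assumed behavior, not IBM's implementation) of
# per-component lazy loading: components are loaded into memory only
# when first requested, then served from a shared cache, so two
# process apps referencing the same toolkit component share one copy.

class ComponentCache:
    def __init__(self, loader):
        self._loader = loader      # loads a component from the repository
        self._cache = {}           # (component_id, version) -> object
        self.loads = 0             # how many times we hit the repository

    def get(self, component_id, version):
        key = (component_id, version)
        if key not in self._cache:
            self._cache[key] = self._loader(component_id, version)
            self.loads += 1
        return self._cache[key]
```

Two apps asking for the same component version trigger only one repository load, and both get the same cached object.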

    All of these same principles apply, by the way, to imports into the Process Center. We occasionally see people post to this board saying "I am trying to duplicate a toolkit, so I export it and then import it again, but that doesn't work." It doesn't work for the same reason: the Process Center sees that it already has that toolkit snapshot and skips over it, since it already has a copy. (You can avoid this problem by using the built-in clone feature, which creates a new copy but changes the identifier, so that the Process Center knows it is meant to be a different copy.)

    In summary,
    • If you looked at the database (of either the process center or the process server), you would see that all toolkit snapshots are handled by reference. It's only when creating deployment packages that copies are made, and those copies are "de-duplicated" at deployment time.
    • None of this is really relevant to memory consumption anyway, because process applications aren't loaded into memory monolithically. No matter how big or small your process applications and toolkits are, components are loaded as needed, and only one copy of an individual component version is ever loaded into memory at once.

    David

    (*In theory, the Process Center knows what is already deployed on the Process Server, and so it could omit toolkit snapshots that are already deployed. But this would be a bad idea, at least in my opinion. Firstly, because the state of the Process Server might change before the package is deployed. Secondly, because these packages are sometimes archived or re-used, and the way the Process Center currently does deployments makes sure that the deployment package will always be usable. And, finally, the current way increases the robustness of the deployment system by having the deployment packages be self-contained and immutable. I wouldn't ever want to have a deployment fail with "not all dependencies present". So I much prefer this way, where copies are made and then de-duplicated at deployment time.)
    • Prashant Patel
      14 Posts

      Re: Sharing toolkit by reference not copy.

      ‏2013-02-18T06:17:55Z  in response to SystemAdmin
      Thanks, David, for the excellent, detailed reply. We shall do a POC on the same and come back with results and questions.

      -Prashant
      • Prashant Patel
        14 Posts

        Re: Sharing toolkit by reference not copy.

        ‏2013-02-20T11:16:14Z  in response to Prashant Patel
        Where can I find more information on toolkits shared by reference on IBM sites? I could not find any in the Infocenter, the Redbooks, or in Kolban's book.

        Thanks
        -Prashant
        • SystemAdmin
          7615 Posts

          Re: Sharing toolkit by reference not copy.

          ‏2013-02-20T15:16:15Z  in response to Prashant Patel
          I don't think you can. I can't speak for IBM, but I would guess that things like this are generally regarded as implementation details that are intentionally undocumented. (So that IBM could change that behavior if they felt it was necessary to implement a feature, improve performance, etc. For example, IID components are essentially deployed as copies: each into its own WAS application because of how they wanted to implement SCA.)

          Some of the details of the behavior you could find in the deployment sections of the docs, but the internal details of how things are stored in the Process Center is intentionally undocumented to discourage applications from building dependencies on those data structures. I doubt that this behavior of linking toolkits by reference is likely to change soon, but deployment and versioning did change radically from 6.x to 7.x so there are no guarantees that things might change again someday.

          David
          • Prashant Patel
            14 Posts

            Re: Sharing toolkit by reference not copy.

            ‏2013-02-21T06:13:05Z  in response to SystemAdmin
            Based on this information we will decide how to break up a process app and how many toolkits to build. This also means a lot of rework for us. So, as a client, we shall approach IBM with our problem and see what they reply. Thanks again, David, for your comments.

            -Prashant
            • SystemAdmin
              7615 Posts

              Re: Sharing toolkit by reference not copy.

              ‏2013-02-21T20:29:12Z  in response to Prashant Patel
              The last comment is a bit confusing to me. In my experience, the right way to determine how to break things up into process apps is by thinking about how you want to handle future changes and promotions. You could put everything in a single PA, but then when you wanted to promote a revision to an existing BPD you would have to make sure all the other assets from all the other BPDs were in a state where they could be promoted. That seems wrong. Likewise, one PA per BPD would be very confusing for things that span multiple BPDs. So the right level of granularity on the PA side is "What things logically should all be ready for a promotion at the same time?"

              For TKs, the right granularity is: if an item needs to be used by more than one PA, it should be in a TK. If there are multiple items to be shared between PAs, and they have similar functionality (say, integrations to another system), they should be in the same TK. I prefer my TKs to be rather small in scope ("PeopleSoft integrations TK", "LDAP integration TK", etc.) and labeled so that the name of the TK tells anyone using it what they should expect to find inside. Some customers create a single "Common" TK and dump all the shared items in there. I don't like that as much, since it makes the impact of a change within the TK harder to ascertain.

              As mentioned on another thread, if you have things in a PA and decide they should be in a different PA or a TK, IBM BPM handles that use case very well. Just right-click the item and select "Move To…". If you move it to a TK, the references in the PA should all update. If you move it to another PA, there shouldn't be anything left behind in the original PA that needs to be updated. When you use this "Move To" functionality, IBM BPM actually tells you if there are other things that should move because they are related to the thing you are moving.

              Moving items out of a TK and into a PA does not work nearly as cleanly, I believe, but I haven't had to do that yet. That use case actually confuses me, since as far as I can tell there is no harm in just leaving the item in the TK.

              Andrew Paier | Director of Special Operations | BP3 Global, Inc. www.bp-3.com
              • SystemAdmin
                7615 Posts

                Re: Sharing toolkit by reference not copy.

                ‏2013-02-21T22:00:23Z  in response to SystemAdmin
                I agree with Andrew. I've been thinking about your response this morning and I wasn't really sure how to react to it. I feel like there must have been some misunderstanding; perhaps I didn't understand your concerns clearly enough.

                The point of my posts was "Don't worry about the size of your deployment packages: trust that IBM BPM is doing the right thing behind the scenes." I gave you lots of detail just to make you comfortable with the whole deployment process, but the point was that IBM BPM handles this in an efficient way, and you shouldn't feel the need to structure your toolkits based on how they are deployed. In fact, I tried to highlight that it currently works pretty much exactly as you had desired it to.

                So, I'm not sure what about my response encouraged you to plan a lot of rework reorganizing your toolkits. Andrew gave a great summary of the current best practices for toolkit organization, but one thing I definitely wouldn't worry about is the resulting export file size or memory usage.

                David
                • SystemAdmin
                  7615 Posts

                  Re: Sharing toolkit by reference not copy.

                  ‏2013-02-21T23:09:02Z  in response to SystemAdmin
                  Right. Memory usage is going to be driven much more strongly by what you are doing in your process than by how IBM chooses to do the deployments, at least as far as PAs and TKs are concerned. I'm not very experienced with the Advanced features, so maybe that isn't true for those; I just don't know.

                  If anyone has evidence that changing how you organize a solution impacts either memory or performance, I'm sure the forum members would love to be able to see that data.

                  Andrew Paier | Director of Special Operations | BP3 Global, Inc. www.bp-3.com
                  • Prashant Patel
                    14 Posts

                    Re: Sharing toolkit by reference not copy.

                    ‏2013-02-27T07:59:11Z  in response to SystemAdmin
                    Apologies for delayed response.

                    David - Your explanation has been excellent and sound. But we had approached IBM prior to this post with similar questions, and they suggested that if a single process app is broken down into two, and toolkits are shared among them, and the resultant sizes of both apps (which contain the common toolkits) are large, then it will have a negative effect on the JVM.

                    PA1 including toolkits (5.5 MB) = 15 MB, later broken (by business functionality) into PA2 with toolkits (5.5 MB) = 9.5 MB. Hence we have two PAs at 12 MB and 9.5 MB. If both are deployed, it will have a negative impact on the JVM. Hence we are confused about which way to go, and are looking for documentation in the Infocenter.

                    We also started the POC to find out how much effort and time it will take to decompose the single PA. We created PA2 and a toolkit containing common processes (TK-Common processes). We already had the existing common toolkits, except TK-Common processes, which was created for the POC.

                    Andrew - We have created toolkits primarily on the basis of reuse across apps (e.g. exception handling, integration services, common processes and their dependent artifacts), and process apps based on business functionality. Your input on PA granularity, "What things logically should all be ready for a promotion at the same time," is thoughtful; we shall use that suggestion. While dividing the modules and moving items to toolkits, we found that the mappings to existing items in the single process app were broken; maybe we did not move items from least dependent to most dependent. We need to check that again.

                    Thanks again for your posts. Appreciate it.
                    • SystemAdmin
                      7615 Posts

                      Re: Sharing toolkit by reference not copy.

                      ‏2013-02-27T15:53:08Z  in response to Prashant Patel
                      I really sympathize with that. For good and for ill, IBM is a really big company, and that means that sometimes information doesn't disseminate perfectly across IBM. And not every one of IBM's 400,000 employees can be an expert on every one of the thousands of IBM products. The unfortunate result is that sometimes IBM gets things wrong, especially at the local level. I spent a lot of time while I was at IBM trying to combat this, so I'm very aware of it.

                      When I was in grade school we played a game called "telephone". The class would line up in a line or a circle and the teacher would whisper a word or phrase to the first kid in line. The first kid would whisper it to the second kid, the second would whisper it to the third, and so on, until the message got to the last kid. The last kid would then tell the class what message he heard. It was usually something completely different than the original message. A silly game, but the point is that the more times a message gets retold, the more chance it has of being misinterpreted.

                      Similarly, I suspect that this misplaced concern IBM warned you about may have originated from a real problem. Large export files can indeed consume a lot of JVM memory while they are being imported/deployed. As I mentioned, when they are being imported/deployed the runtime server goes through all of the components and figures out which snapshots it already has. (And even within those snapshots, which components it already has. A similar process happened in 3.x-6.x, although it was slightly different because snapshots/PAs/TKs didn't exist back then.) That import process, as well as the translation from XML into the DB, consumed a lot of memory. It wasn't unusual, especially on older versions, to run out of heap memory during the import, particularly since in those days we were often working with 32-bit heaps. That efficiency has been improved a lot over the years, and I don't hear of it being a problem anymore.

                      But that might have been the source of the incorrect information you heard. Someone somewhere had a problem loading a large export; someone responded with the true fact that importing can consume a lot of memory; that was later repeated as "large export files consume a lot of memory", and someone later misinterpreted that as "large export files consume a lot of memory at run time". (At another company I worked at we called this "folklore". We had consultants who would do things a certain way, or avoid a certain feature, just because that is what they had originally been taught, even if the reason for doing it, or the bug that required avoiding that feature, had long since been fixed.)

                      So, what can you do? After all, I'm asking you to believe me and Andrew, some random guys on the internet forum, instead of an IBM employee. Especially since there's always a chance that I have incomplete information: I only have a very short post from you describing the problem. Although I don't think so, maybe I'm missing some important detail about your application, configuration, or problem.

                      So:

                      • Read the performance Redpaper: http://www.redbooks.ibm.com/redpapers/pdfs/redp4784.pdf. You will notice that nowhere in there does it talk about how to structure your PAs/TKs, despite there being quite a lot of discussion in that paper about how to optimize memory usage. (You will also notice my name on that paper. I didn't write a huge portion of it, but it should at least give you some confidence that I am regarded as an expert on performance and tuning for IBM BPM. I bet you can find Andrew's name on a lot of documents as well; he has been around the product even longer than I have.)

                      • Look at the database. It's pretty easy to verify that each component of a toolkit only exists once in the database regardless of how many times that toolkit is referenced. I always discourage people from having their code interact directly with the PROC database, but just looking at the structure can help you understand the system.

                      • Peeking into the configuration files (99Local.xml et al.) will also expose you to settings like cached-objects-ttl, which should lead you to believe that the memory cache is per object, not per process app/toolkit. None of this is perfectly conclusive, of course, but it should give you a general degree of confidence. You could also do a quick performance test, but even quick performance tests tend to take a lot of time.

                      • Push your IBM contact for clarification. Say that you've found conflicting information on the internet and you want them to verify what they told you. Hopefully they'll escalate to product management or some other mailing list or expert who will get them straightened out.
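The kind of database check suggested above can be illustrated with a toy model. The table and column names below are hypothetical (the real PROC schema is intentionally undocumented), but the shape of the check is the same: snapshots live in one table, and process apps point at them through a reference table, so a shared toolkit snapshot is stored exactly once no matter how many apps depend on it.

```python
# Toy model only: hypothetical table/column names, NOT the real
# PROC database schema. Demonstrates that sharing by reference means
# one stored snapshot row, however many apps reference it.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshot (snapshot_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE dependency (app_snapshot_id TEXT, tk_snapshot_id TEXT);
    INSERT INTO snapshot VALUES ('TK1-v1', 'Common TK');
    INSERT INTO dependency VALUES ('PA1-v1', 'TK1-v1');
    INSERT INTO dependency VALUES ('PA2-v1', 'TK1-v1');
""")
# Two apps reference the toolkit snapshot...
refs = conn.execute(
    "SELECT COUNT(*) FROM dependency WHERE tk_snapshot_id='TK1-v1'"
).fetchone()[0]
# ...but it is stored exactly once.
copies = conn.execute(
    "SELECT COUNT(*) FROM snapshot WHERE snapshot_id='TK1-v1'"
).fetchone()[0]
```

As David says, don't have application code touch the real database; looking at the structure is only for building your own confidence.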

                      Best of luck,

                      David
                      • Prashant Patel
                        14 Posts

                        Re: Sharing toolkit by reference not copy.

                        ‏2013-02-28T07:10:57Z  in response to SystemAdmin
                        After your first post I checked that Redpaper and found your name on it, and knew the advice was coming from a credible source. Thanks again for your valuable advice; we shall work on the matter with IBM and update the community with the results.

                        -Prashant
                        • Prashant Patel
                          14 Posts

                          Re: Sharing toolkit by reference not copy.

                          ‏2013-04-04T12:37:39Z  in response to Prashant Patel
                          We have consulted IBM support and they confirmed that a large process app does not mean more memory consumption, because process applications are not loaded into memory monolithically. The process server is smart at deployment, and all toolkits are handled by reference. Toolkits and components are loaded as needed, and only one copy of a given component version is ever loaded into memory at a time.

                          Thanks, IBM, for the support. Thanks David and Andrew.
                          • SystemAdmin
                            7615 Posts

                            Re: Sharing toolkit by reference not copy.

                            ‏2013-04-04T13:58:26Z  in response to Prashant Patel
                            Prashant,

                            Can you confirm one more thing also or if you have already discussed, please let me know.

                            What is the case when we have many process applications with different versions of a common toolkit? I.e., if I have a process app PA1 with a dependency on toolkit version 1, and PA2 with a dependency on toolkit version 2, I am sure both snapshots will reside in the process server database. When I later upload PA1 version 1 with its toolkit dependency upgraded to version 2, does toolkit version 1 still stay in the database even though nobody is referring to it?

                            I am facing the following issue when deploying the second process app with a different version of the same toolkit:
                            26/03/13 09:45:16:719 EDT 0000017c IFDSRepoAdapt E A contribution with the name: XXX_Toolkit_Library was not found.
                            com.lombardisoftware.core.TeamWorksException: A contribution with the name: XXX_Toolkit_Library was not found.

                            Manish
                            • SystemAdmin
                              7615 Posts

                              Re: Sharing toolkit by reference not copy.

                              ‏2013-04-04T21:21:26Z  in response to SystemAdmin
                              It does stay in the DB, mainly because your statement "since no one is referring to it" is not correct. Each snapshot is its own entity. When you import a new snapshot, it does not overwrite the previous snapshot; it sits in the DB alongside the previous snapshot. While you have uploaded a new snapshot of PA1, the previous snapshot of PA1 is still present and is still referencing the old toolkit snapshot.

                              Is your scenario one that uses Advanced items? The above applies only to toolkit items used in Standard. I'm not quite sure how Advanced behaves with respect to versioning...

                              Andrew Paier | Director of Special Operations | BP3 Global, Inc. www.bp-3.com
                            • NaveenBhardwaj08
                              55 Posts

                              Re: Sharing toolkit by reference not copy.

                              ‏2013-04-04T21:36:40Z  in response to SystemAdmin
                              Manish, your error is likely with an AIS; I have seen this error once.

                              It might be a case where you are referencing a library in the AIS implementation (in IID) which is not available or has been deleted, but IID keeps referencing it.

                              Open your AIS in IID and check the dependencies option; see if anything shows up.
                              • This reply was deleted by Raj4All 2014-12-08T14:43:15Z.