Topic
  • 11 replies
  • Latest Post - 2015-04-30T18:06:06Z by dogren@gmail.com
AndraSeenu
AndraSeenu
20 Posts

Pinned topic Business Data Persistence, out of the box?

2014-01-30T23:17:37Z

Hello,

A question for the BPM experts here: I am new to 8.5 and curious how most people are persisting business data in this version. I know all process-related data goes to the Process Server database, some of it into BLOBs (not readable). But this may not be an ideal solution in many cases, as users need business data in various reports. I am mostly curious about:

1. How does BPM 8.5 store data?

2. Is there out-of-the-box persistence support for storing business data (maybe in a separate DB/schema)?

3. Do I need a custom persistence solution, such as Hibernate or another ORM?

4. Would I be able to query this data using SQL Developer, Navigator, etc., or only through APIs?

 

Thank you for the help

Regards

Srini

 


  • kolban
    kolban
    3322 Posts

    Re: Business Data Persistence, out of the box?

    ‏2014-01-30T23:32:23Z  

    Hi Srini, welcome to the forum.

    The general philosophy is that a BPM system is not your "system of record" for your data.  Rather, BPM choreographs or orchestrates the steps in the process that may interact with those systems of record.  As a process executes, it may generate and consume data that is needed for the process operation but isn't considered necessary after the process completes other than for reporting purposes.

    When BPM operates, it has the ability to record all the steps executed and values of selected variables into tables in a database that BPM calls the "Performance Data Warehouse" (PDW).   The tables in this DB are considered "exposed" by IBM and you can write your own SQL to query their content.

    For an individual BPM process, you can assume that BPM will manage the existence of the process variables during its lifetime.  Even if the process lasts for months, BPM is maintaining the state of the process.  How it does this should be considered a black box to you.  When a process comes to a conclusion, you should consider any state maintained by the process as eligible for deletion (i.e. you should not rely on it being present any more).  While the process exists, you can use BPM-provided APIs to retrieve the data for that process.  See the concept of "Searchable" data.

    I suspect this is only a partial answer to your question and invite you to post back anytime with a more detailed story or use case (real or imagined) so we can delve further.

    Neil

  • dogren@gmail.com
    dogren@gmail.com
    421 Posts

    Re: Business Data Persistence, out of the box?

    ‏2014-01-31T07:40:50Z  

    When it comes to situations like this, it is almost always better to talk in concrete use cases. "Business data" is just so vague, and the right way to store data is always a nuanced decision. (BPM or not; it seems like there are major data architecture debates in every technology I've ever used. Data architecture always has trade-offs.) So I'd love to hear some more details on where you are coming from.

    But, first, let me start with answering your direct questions.

    1. How does BPM 8.5 store data?

    It depends on what you mean by "data". Process data is, as you point out, stored in database BLOBs. Reporting data (which can be anything, thanks to the flexibility of tracking groups) is stored in flat database tables for easy querying. Search data can be stored either in key-value tables (what I've sometimes heard described as "long skinny" tables) or in flat database tables, depending on how the system is configured. (The default is key-value tables.)
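    To make those three shapes concrete, here is a small sketch. The object and the row layouts are invented for illustration; they are not IBM BPM's actual schemas.

    ```javascript
    // Illustration only: one hypothetical business object, stored three ways.
    const order = { orderId: "A-100", amount: 250.0, customer: "Acme" };

    // 1. Process data: the whole object serialized into a single BLOB column.
    const blobRow = { instanceId: 42, data: JSON.stringify(order) };

    // 2. Reporting data: one flat row per tracked event, easy to query with SQL.
    const flatRow = { instanceId: 42, ...order };

    // 3. Search data: "long skinny" key-value rows, one row per searchable field.
    const keyValueRows = Object.entries(order).map(([name, value]) => ({
      instanceId: 42,
      name,
      value: String(value),
    }));

    console.log(blobRow.data);        // opaque to SQL reporting tools
    console.log(keyValueRows.length); // 3 (one row per field)
    ```

    The trade-off is visible even at this toy scale: the BLOB round-trips any structure but is opaque to SQL; the flat row is trivially queryable but needs a fixed column list; the key-value rows need no schema changes but make queries clumsier.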

    It's essentially a "best tool for the job" approach. You seem to not like the BLOB approach, but let me come back to that when I get on my soapbox and keep to the bare facts for a bit longer.

    2. Is there out-of-the-box persistence support for storing business data (maybe in a separate DB/schema)?

    Again, it depends on what you mean. Obviously, there are the approaches above. IBM BPM has the most automatic and flexible persistence system I've seen in a BPM product. (Automatic persistence via BLOBs, automatic persistence of reporting/searching data, including cross-task and cross-process data.) But, if you want to explicitly map to relational tables in a separate schema, there are also the out of the box database connectors. They aren't a complete persistence layer (essentially they can map flat objects in, but that's about it), but they get the job done if you have to move data in or out.
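    To give a feel for that "flat objects in" limitation, here is a hedged sketch of the flattening step such a mapping implies (the function and field names are invented, not connector APIs): nested structure has to be squashed into one row of scalar columns before a single parameterized INSERT can take it.

    ```javascript
    // Sketch: flatten a nested business object into one row of scalar columns.
    // Anything that won't fit a single row (deep nesting, lists) is where a
    // simple mapping stops being "complete".
    function flattenForRow(obj, prefix = "") {
      const row = {};
      for (const [key, value] of Object.entries(obj)) {
        const column = prefix ? `${prefix}_${key}` : key;
        if (value !== null && typeof value === "object" && !Array.isArray(value)) {
          Object.assign(row, flattenForRow(value, column)); // recurse into nesting
        } else {
          row[column] = value; // scalar leaf becomes a column
        }
      }
      return row;
    }

    const claim = { id: 7, claimant: { name: "Srini", phone: "555-0101" } };
    console.log(flattenForRow(claim));
    // → { id: 7, claimant_name: "Srini", claimant_phone: "555-0101" }
    ```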

    There used to be a "persistence framework" that probably did something along the lines of what you are looking for. (It wasn't part of the product, just something that professional services had developed.) I don't think it was ever ported to 7.x/8.x, in part because of some of the downsides of that kind of persistence, and partly because shared data objects obviated its main use case. It wouldn't be impossible to build a generic framework, but for the reasons below, I recommend against it.

    3. Do I need a custom persistence solution, such as Hibernate or another ORM?

    If you listen to one thing I say in this post, it is this: do not go down this road. I've seen some people try it, against my advice, and it's never been pretty. Even leaving aside some of my academic objections below, you fundamentally have two major disconnects at the start, both fatal on their own. Firstly, Java-based persistence solutions are going to assume that you are mapping a Java object to a database. You do not have a Java object to start with. It can be a little tempting to pretend that you do, because you can try to muck with either the TWObject internals or the LiveScript Java wrappers behind the JavaScript interfaces to tw.local objects. But they aren't real Java objects and, in the end, you will end up having to build a way to convert IBM BPM data objects into Java POJOs. And building that conversion is going to be harder than building a persistence layer for BPM data objects natively.

    The second fatal flaw is that persistence solutions generally need to assume that they own the lifecycle of the objects. (For many reasons.) And there's no way that you can make that happen in any kind of performant way.

    4. Would I be able to query this data using SQL Developer, Navigator, etc., or only through APIs?

    Again, it depends on what you are doing. If you use tracking points/autotracking, you can query with SQL. If you are using process data, I wouldn't do either (APIs or SQL), as process data is intended for the process. If you have data that belongs in an SOR, you should keep it in the SOR, not the process. And if you have an SOR, you probably have ways of getting data in and out anyway. If need be, you can persist back and forth with the SQL connector, but the upside of an SOR is that you also probably have some genuine APIs.
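    The difference is easy to picture with in-memory stand-ins (rows and columns invented for illustration): flat tracking rows answer SQL-style questions directly, while BLOB rows would first need deserializing.

    ```javascript
    // Illustrative stand-in for a flat tracking table produced by autotracking.
    const trackingRows = [
      { instanceId: 1, amount: 120, region: "EU" },
      { instanceId: 2, amount: 480, region: "US" },
      { instanceId: 3, amount: 300, region: "US" },
    ];

    // Equivalent of:
    //   SELECT instanceId FROM tracking WHERE amount > 200 AND region = 'US'
    const bigUsOrders = trackingRows
      .filter((r) => r.amount > 200 && r.region === "US")
      .map((r) => r.instanceId);

    console.log(bigUsOrders); // [ 2, 3 ]
    ```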

    So, with the direct questions out of the way, let me get on my soapbox a bit.

    It seems that you are starting with the presumption that storing process data in BLOBs is a bad idea. You say it "may not be an ideal solution in many cases". I've worked with eight BPM products professionally. (Obviously some products I've used more than others.) Six of those products have stored process data as one sort of BLOB or another. The other two essentially tried to have their own sort of proprietary ORM tool, and those two have always ended up being data disasters. The proprietary ORM tool is never good enough, and you end up having to dig into the SQL by hand. Performance is almost always a disaster too. Not to mention that it makes handling change in the process data a complete nightmare.

    Which sort of is the story of ORM in general. Go to Google and type in "Why is Hibernate so" and Google will fill in the rest for you: "why is Hibernate so slow". I think that Jeff Atwood had it right in his famous blog post "Object-Relational Mapping is the Vietnam of Computer Science" (which, in turn, was based on a blog post of the same name by Ted Neward). And I say this as someone who is currently spending most of their day, every day, using an ORM. The point is that ORM is a hard problem, and one that is very hard to solve generically, or solve in a high-performance way, or in a way that is easy to use without code. It's a tough problem to solve, and I remain unconvinced that there are any real benefits for accepting all of the downsides of ORM in a BPM system. IBM BPM has addressed the use cases that can't be solved with BLOBs (e.g. reporting/searching/cross-BPD data). It kind of sucks that there is no one-size-fits-all solution (and therefore I'm doomed to be having this discussion about SOR data forever), but my experiences with ORMs, and especially ORM within the context of BPM, don't lead me to believe that ORM is a one-size-fits-all solution either.

    Every time I've run into someone who thinks that ORM is a key feature of implementing a BPM system, they have either been selling a BPM with a proprietary homegrown ORM system, they have been trying to build an application (rather than a process), or they have been stuck in an application development mentality. Clearly at least two product designers disagree with me on this. I admit that, as strongly as I feel that a BLOB based solution is the only sane product design, that there is some room for debate at the product level.

    But the only place for that debate is at the product level. If you are going to use IBM BPM, you need to get on board with its persistence mechanisms. Adopting a platform while deciding that you disagree with one of its core design decisions just seems like masochism.

    I've gone off on a bit of a rant here, so sorry for that. But I suspect that either:

    1) You have a business problem that requires externalizing some portion of your data, most likely because it belongs in an SOR. I'd love to hear the details of your use case in a follow-up post. Think about where that data belongs. If users can access the data outside of the process, is the data really part of the process? And if the data isn't part of the process, where does it belong? Often I find that these types of questions are just a sign that there is a missing SOR. Building a quick-and-dirty SOR suddenly solves all kinds of problems in those situations.

    2) You are just uncomfortable with BLOBs. I see this from time to time, often with someone who has been burned by some proprietary system where they couldn't get at their data. All I can say is that the system is pretty battle-hardened. It very elegantly handles a lot of use cases (such as version migration, multi-instance loops, nested tasks and processes, and cross-process reporting and searching) that tend to turn relationally based "persistence systems" into piles of dung.

    In answer to your original question, "how are people persisting their data?", the answer is "with the most appropriate tool". Typically just by sticking the data into tw.local and letting the platform persist it automatically. But the "searchable" flag, autotracking, tracking groups, and all of the integration connectors (PD and ID) give you a lot of tools in your toolbox. The one tool I would never, ever recommend is trying to integrate a generic persistence framework.

    David

    Updated on 2014-01-31T22:44:42Z at 2014-01-31T22:44:42Z by dogren@gmail.com
  • AndraSeenu
    AndraSeenu
    20 Posts

    Re: Business Data Persistence, out of the box?

    ‏2014-01-31T15:35:15Z  

    Thank you for the very detailed and informative answers.  This is very useful information. I understand now, but to be sure, here is the use case.

    We have a few coaches in our application.  Each coach contains input fields where users enter financial data and some contact information, like name, address, and phone numbers.

    We will have to make this data available to the business intelligence team, so they can create reports based on the user-entered data.

    In this case, would it be okay to use the Performance Data Warehouse database for this, by making the variables searchable and trackable, as suggested?

    In case we go with an external SOR, is it okay to use tw.local variables with the available SQL connectors to persist data to this external SOR?

     

    Thanks again for the help.

    Srini

  • kolban
    kolban
    3322 Posts

    Re: Business Data Persistence, out of the box?

    ‏2014-01-31T15:48:38Z  


    Hi Srini,

    If the data entered in the Coach is to be used ONLY for report generation, or for analysis of how to improve the process in the future, then the Performance Data Warehouse (PDW) sounds like a perfect destination for the data.  You need not worry about writing it yourself; let BPM do the work on your behalf.  However, if the data being collected is to be "kept" and potentially referenced by other applications in the future, then it becomes what I would consider system-of-record data and should be stored in your own SOR store (e.g. an application database).

    In your story (so far), I could see it going either way.

    If you want to save to your own DB, then YES .... the BPM-supplied SQL connectors are just what the doctor ordered.
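    A rough sketch of what that mapping amounts to (the table and column names are invented for illustration; the real connector services take care of this step for you): a coach's business object becomes a parameterized INSERT plus bind values.

    ```javascript
    // Sketch: turn a tw.local-style business object into a parameterized
    // INSERT statement and its bind values. "SOR_CONTACT" is a made-up table.
    function buildInsert(table, record) {
      const columns = Object.keys(record);
      const placeholders = columns.map(() => "?").join(", ");
      return {
        sql: `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders})`,
        params: columns.map((c) => record[c]),
      };
    }

    const contact = { name: "Srini", phone: "555-0101", city: "Austin" };
    const stmt = buildInsert("SOR_CONTACT", contact);
    console.log(stmt.sql);
    // INSERT INTO SOR_CONTACT (name, phone, city) VALUES (?, ?, ?)
    ```

    Using placeholders rather than concatenating values into the SQL string is also what keeps the insert safe from injection, which is one reason to prefer the supplied connectors over hand-rolled statements.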

    Neil

  • dogren@gmail.com
    dogren@gmail.com
    421 Posts

    Re: Business Data Persistence, out of the box?

    ‏2014-01-31T22:43:28Z  

    For what it's worth, I pretty much agree completely with Neil.

    If you just want to pass it into BI, then use the PDW. That's exactly what it is designed for. So if it's only for report generation, just use either tracking groups or autotracking to pass the information into your PDW tracking tables.

    On the other hand, it seems very odd to me that customer financial and contact information is only for reporting and doesn't belong in some sort of CRM system of record.

    David

  • Rahul_Pisal
    Rahul_Pisal
    1 Post

    Re: Business Data Persistence, out of the box?

    ‏2014-04-04T04:45:39Z  

    When it comes to situations like this, it is almost always better to talk in concrete use cases. "Business data" is just so vague and the right way to store data is always a nuanced decision. (BPM or not; it seems like there are major data architecture debates in every technology I've ever used. Data architecture always has trade offs.) So I'd  love to hear some more details on where you are coming from.

    But, first, let me start with answering your direct questions.

    1. How does BPM 8.5 stores [sic] data.

    It depends on what you mean by "data". Process data is, as you point out, stored in database BLOBs. Reporting data (which can be anything because of the flexibility of tracking groups), is stored in flat database tables for easy querying. Search data can be stored in either key-value tables (what I've sometimes heard described as long skinny tables) or in flat database tables, depending on how the system is configured. (The default is key-value tables.)

    It's essentially a "best tool for the job" approach. You seem to not like the BLOB approach, but let me come back to that when I get on my soapbox and keep to the bare facts for a bit longer.

    2. Is there out of the box persistence support to store business data (may be in separate [sic] DB/Schema).

    Again, it depends on what you mean. Obviously, there are the approaches above. IBM BPM has the most automatic and flexible persistence system I've seen in a BPM product. (Automatic persistence via BLOBs, automatic persistence of reporting/searching data, including cross-task and cross-process data.) But, if you want to explicitly map to relational tables in a separate schema, there are also the out of the box database connectors. They aren't a complete persistence layer (essentially they can map flat objects in, but that's about it), but they get the job done if you have to move data in or out.

    There used to be a "persistence framework" that probably did something along the lines of what you are looking for. (It wast part of the product, just something that professional services had developed.) I don't think it was ever ported to 7.x/8.x. In part because of some of the downsides of that kind of persistence, and partly because shared data objects obviated its main use case. It wouldn't be impossible to build a generic framework, but for the reasons below, I recommend against it.

    3. Do I need custom persistence solution like with Hibernate, other ORM?

    If you listen to one thing I say in this post, it is this. Do not go down this road. I've seen some people try it, against my advice, and it's never been pretty. Even leaving aside some of my academic objections below,  you fundamentally have two major disconnects at the start, both fatal on their own. Firstly, Java based persistence solutions are going to assume that you are mapping a Java object to a database. You do not have a Java object to start with. It can be a little tempting to pretend that you do, because you can either try to muck with the TWObject internals or the Livescript Java wrappers to the Javascript interfaces to tw.local objects . But they aren't real Java objects and, in the end, you will end up having to building a way to convert IBM BPM data objects into Java POJOs. And building that conversion is going to be harder than building a persistence layer for BPM data objects natively.

    The second fatal flaw is that persistence solutions generally need to assume that they own the lifecycle of the objects. (For many reasons.) And there's no way that you can make that happen in any kind of performant way.

    3. [sic] Would I be able to query this data using SQL Developer/ Navigator etc, or only thru APIs?

    Again, it depends on what you are doing. If you use tracking points/autotracking, you can query with SQL. If you are using process data, I wouldn't do either (APIs or SQL) as process data is intended for the process. If you have data that belongs in a SOR, you should keep it in a SOR, and not the process. And if you have an SOR, you probably have ways of getting data in and out anyway. If need be, you can persist back and forth with the SQL connector, but the upside of a SOR is that you also probably have some genuine APIs.

    So, with the direct questions out of the way, let me get on my soapbox a bit.

    It seems that you are starting with the presumption that storing process data in BLOBs is a bad idea. You say: "not be ideal solution in many cases". I've worked with eight BPM products professionally. (Obviously some products I've used more than others.) Six of those products have stored process data as one sort of BLOB or another. The other two have essentially tried to have their own sort of proprietary ORM tool. And those two, with their proprietary ORM, have always ended up being data disasters. The proprietary ORM tool is never good enough and you ended up having to dig into the SQL by hand. And performance is almost always a disaster. Not to mention that it makes handling change in the process data a complete nightmare.

    Which sort of is the story of ORM in general. Go to Google and type in "Why is Hibernate so" and Google will have fill in the rest for you: "why is Hibernate so slow". I think that Joel Spolsky had it right in this famous blog post Object-Relational Mapping is the Vietnam of Computer Science. (Which in turn, was based on a a blog post by Ted Neward of the same name. And I say this as someone who is currently spending most of their day, every day, using an ORM. The point is that ORM is a hard problem and one that is very hard to solve generically. Or solve in a high performance way. Or in a way that is easy to use without code. It's a tough problem to solve, and I remain unconvinced that there are any real benefits for accepting all of the downsides of ORM in a BPM system. IBM BPM has  addressed the use cases that can't be solved with BLOBs (e.g. reporting/searching/cross-BPD data). It kind of sucks that there is no one size fits all solution (and therefore I'm doomed to be having this discussion about SOR data forever), but my experiences with ORMs. and especially ORM within the context of BPM, doesn't lead me to believe that ORM is a one size fits all solution either.

    Every time I've run into someone who thinks that ORM is a key feature of implementing a BPM system, they have either been selling a BPM with a proprietary homegrown ORM system, they have been trying to build an application (rather than a process), or they have been stuck in an application development mentality. Clearly at least two product designers disagree with me on this. I admit that, as strongly as I feel that a BLOB-based solution is the only sane product design, there is some room for debate at the product level.

    But the only place for that debate is at the product level. If you are going to use IBM BPM, you need to get on board with its persistence mechanisms. Trying to adopt a platform while deciding that you disagree with one of its core design decisions just seems like masochism.

    I've gone off on a bit of a rant here, so sorry for that. But I suspect that either:

    1) You have a business problem that requires externalizing some portion of your data, most likely because it belongs in a SOR. I'd love to hear the details of your use case in a follow-up post. Think about where that data belongs. If users can access the data outside of the process, is the data really part of the process? And if the data isn't part of the process, where does it belong? Often I find that these types of questions are just a sign that there is a missing SOR. Building a quick and dirty SOR suddenly solves all kinds of problems in those situations.

    2) You are just uncomfortable with BLOBs. I see this from time to time, often with someone who has been burned by some proprietary system where they couldn't get at their data. All I can say is that the system is pretty battle hardened. It elegantly handles a lot of use cases (such as version migration, multi-instance loops, nested tasks and processes, and cross-process reporting and searching) that tend to turn relationally based "persistence systems" into piles of dung.

    In answer to your original question "how are people persisting their data?" the answer is "with the most appropriate tool". Typically just by sticking the data into tw.local and letting the platform persist it automatically. But the "searchable" flag, autotracking, tracking groups, and all of the integration connectors (PD and ID) give you a lot of tools in your toolbox. The one tool I would never, ever recommend is trying to integrate a generic persistence framework.

    David

    We have a similar kind of problem here. We are building a business process application. There are 2 human activities and 2 system activities (mainly integrating with SAP) in the BPD. The user inputs an Excel file that has all the financial transactions. This is then converted into an internal business object definition for further processing throughout the process. It is also used in the system activities to post the data to the SAP system. The structure of the data is header + lines (up to 999 lines for the biggest request). In all, there are close to 30 attributes in this business object. We have made a conscious choice to rely on IBM BPM to persist the data between activities. Here are some of the business requirements.

    1. Users should be able to query process instances based on values of attributes from the business data. - This will be easier if IBM BPM manages the business data (persisting it throughout the process execution) and we mark those attributes "Available for searching" in Process Designer.

    Some of the non-functional requirements specify that there would be 10 such requests made every minute (not all will have 999 records in them).

    Now the concern is whether it is the right approach to persist data inside the process DB, or whether it should be stored in a custom DB (for which I am sure we will have to write custom persistence logic). Would letting IBM BPM persist the business data between activities adversely impact its performance? Is there a better approach to satisfy the NFR as well as the business need to search instances?

    There is also a mandatory requirement to retain completed instances for 5 years (SOX compliance) for audit purposes. What is the best practice to handle these kinds of scenarios? Accumulating 5 years' worth of data in the process DB/PDW will adversely impact the performance of the tool. If we decide to archive completed instances in a custom schema, how can we write the search queries so they look into both the process DB and the archive DB to get the reports that business users need?

     

    Thanks,

    Rahul.

  • dogren@gmail.com
    dogren@gmail.com
    421 Posts

    Re: Business Data Persistence, out of the box?

    ‏2014-04-05T20:06:01Z  


    I'm not sure I can reply succinctly. There are a lot of implied design questions in your question and I don't know if I can really answer in a forum post. But let me make a few replies and comments.

    What is the value of BPM to you?

    You might have a really important reason for using BPM and just don't mention it because that's not your challenge, but I did want to throw the question out there. With only a couple of human activities, most of the "UI" being spreadsheet based, and lots of system integration, it really sounds more like an app than a process application. Are you sure that IBM BPM is the best tool for the job?

    Querying data:

    A critical decision for you is whether instances represent requests or transactions. You say it will be easy to search for instances if you store the data in BPM, but since IBM BPM bases its searching on scalar instance data, you may find things hard to search unless you use an instance per transaction. (Which might be the right choice anyway, but this is a bigger question in my mind than performance.)

    Rate limiting your requests:

    I'm not totally sure what the reason for this requirement is, but is it "overall" or "per request"? This is definitely solvable either way, but it will mean that your process might be more complicated than just four activities.

    Will persisting data in BPM be a performance problem?

    I stand by what I said in my earlier comments. Storing it in BPM is typically faster than storing it externally. (Just because you save a query, and the serialization is very execution-optimized.) But you will have so many other factors involved, regarding whether tasks and/or instances are created for each request, that it's hard to say definitively.

    Retaining "instance" data for SOX compliance.

    I've always handled these requirements via the PDW. Now the details will depend on what your exact requirements are for retaining information, but you almost certainly will want to let the PDW do the work for you. You might need to set up some tracking groups to get exactly what you need for compliance purposes, but that's a better approach than just letting the instances sit in the process database for five years.

    David

  • abdielcs
    abdielcs
    4 Posts

    Re: Business Data Persistence, out of the box?

    ‏2015-04-29T18:33:11Z  

    When it comes to situations like this, it is almost always better to talk in concrete use cases. "Business data" is just so vague and the right way to store data is always a nuanced decision. (BPM or not; it seems like there are major data architecture debates in every technology I've ever used. Data architecture always has trade offs.) So I'd  love to hear some more details on where you are coming from.

    But, first, let me start with answering your direct questions.

    1. How does BPM 8.5 stores [sic] data.

    It depends on what you mean by "data". Process data is, as you point out, stored in database BLOBs. Reporting data (which can be anything because of the flexibility of tracking groups), is stored in flat database tables for easy querying. Search data can be stored in either key-value tables (what I've sometimes heard described as long skinny tables) or in flat database tables, depending on how the system is configured. (The default is key-value tables.)

    It's essentially a "best tool for the job" approach. You seem not to like the BLOB approach, but let me come back to that when I get on my soapbox; for now, let me keep to the bare facts a bit longer.

    2. Is there out of the box persistence support to store business data (may be in separate [sic] DB/Schema).

    Again, it depends on what you mean. Obviously, there are the approaches above. IBM BPM has the most automatic and flexible persistence system I've seen in a BPM product. (Automatic persistence via BLOBs, automatic persistence of reporting/searching data, including cross-task and cross-process data.) But, if you want to explicitly map to relational tables in a separate schema, there are also the out of the box database connectors. They aren't a complete persistence layer (essentially they can map flat objects in, but that's about it), but they get the job done if you have to move data in or out.

    There used to be a "persistence framework" that probably did something along the lines of what you are looking for. (It wasn't part of the product, just something that professional services had developed.) I don't think it was ever ported to 7.x/8.x, in part because of some of the downsides of that kind of persistence, and partly because shared data objects obviated its main use case. It wouldn't be impossible to build a generic framework, but for the reasons below, I recommend against it.

    3. Do I need custom persistence solution like with Hibernate, other ORM?

    If you listen to one thing I say in this post, it is this: do not go down this road. I've seen some people try it, against my advice, and it's never been pretty. Even leaving aside some of my academic objections below, you fundamentally have two major disconnects at the start, both fatal on their own. Firstly, Java-based persistence solutions are going to assume that you are mapping a Java object to a database. You do not have a Java object to start with. It can be a little tempting to pretend that you do, because you can muck with either the TWObject internals or the LiveScript Java wrappers to the JavaScript interfaces of tw.local objects. But they aren't real Java objects and, in the end, you will end up having to build a way to convert IBM BPM data objects into Java POJOs. And building that conversion is going to be harder than building a persistence layer for BPM data objects natively.

    The second fatal flaw is that persistence solutions generally need to assume that they own the lifecycle of the objects. (For many reasons.) And there's no way that you can make that happen in any kind of performant way.

    3. [sic] Would I be able to query this data using SQL Developer/ Navigator etc, or only thru APIs?

    Again, it depends on what you are doing. If you use tracking points/autotracking, you can query with SQL. If you are using process data, I wouldn't do either (APIs or SQL) as process data is intended for the process. If you have data that belongs in a SOR, you should keep it in a SOR, and not the process. And if you have an SOR, you probably have ways of getting data in and out anyway. If need be, you can persist back and forth with the SQL connector, but the upside of a SOR is that you also probably have some genuine APIs.

    So, with the direct questions out of the way, let me get on my soapbox a bit.

    It seems that you are starting with the presumption that storing process data in BLOBs is a bad idea. You say it may "not be ideal solution in many cases". I've worked with eight BPM products professionally. (Obviously some products I've used more than others.) Six of those products have stored process data as one sort of BLOB or another. The other two have essentially tried to have their own sort of proprietary ORM tool. And those two, with their proprietary ORM, have always ended up being data disasters. The proprietary ORM tool is never good enough, and you end up having to dig into the SQL by hand. Performance is almost always a disaster. Not to mention that it makes handling change in the process data a complete nightmare.

    Which is sort of the story of ORM in general. Go to Google and type in "Why is Hibernate so" and Google will fill in the rest for you: "why is Hibernate so slow". I think that Jeff Atwood had it right in his famous blog post Object-Relational Mapping is the Vietnam of Computer Science. (Which, in turn, was based on a blog post by Ted Neward of the same name.) And I say this as someone who is currently spending most of their day, every day, using an ORM. The point is that ORM is a hard problem, and one that is very hard to solve generically. Or to solve in a high-performance way. Or in a way that is easy to use without code. It's a tough problem to solve, and I remain unconvinced that there are any real benefits for accepting all of the downsides of ORM in a BPM system. IBM BPM has addressed the use cases that can't be solved with BLOBs (e.g. reporting/searching/cross-BPD data). It kind of sucks that there is no one-size-fits-all solution (and therefore I'm doomed to be having this discussion about SOR data forever), but my experiences with ORMs, and especially ORM within the context of BPM, don't lead me to believe that ORM is a one-size-fits-all solution either.

    Every time I've run into someone who thinks that ORM is a key feature of implementing a BPM system, they have either been selling a BPM with a proprietary homegrown ORM system, they have been trying to build an application (rather than a process), or they have been stuck in an application development mentality. Clearly at least two product designers disagree with me on this. I admit that, as strongly as I feel that a BLOB-based solution is the only sane product design, there is some room for debate at the product level.

    But the only place for that debate is at the product level. If you are going to use IBM BPM, you need to get on board with its persistence mechanisms. Trying to adopt a platform while deciding that you disagree with one of its core design decisions just seems like masochism.

    I've gone off on a bit of a rant here, so sorry for that. But I suspect that either:

    1) You have a business problem that requires externalizing some portion of your data, most likely because it belongs in a SOR. I'd love to hear the details of your use case in a follow-up post. Think about where that data belongs. If users can access the data outside of the process, is the data really part of the process? And if the data isn't part of the process, where does it belong? Often I find that these types of questions are just a sign that there is a missing SOR. Building a quick and dirty SOR suddenly solves all kinds of problems in those situations.

    2) You are just uncomfortable with BLOBs. I see this from time to time, often with someone who has been burned by some proprietary system where they couldn't get at their data. All I can say is that the system is pretty battle hardened. It elegantly handles a lot of use cases (such as version migration, multi-instance loops, nested tasks and processes, and cross-process reporting and searching) that tend to turn relationally based "persistence systems" into piles of dung.

    In answer to your original question "how are people persisting their data?" the answer is "with the most appropriate tool". Typically just by sticking the data into tw.local and letting the platform persist it automatically. But the "searchable" flag, autotracking, tracking groups, and all of the integration connectors (PD and ID) give you a lot of tools in your toolbox. The one tool I would never, ever recommend is trying to integrate a generic persistence framework.

    David

    I fully understand your reasons for advising against mapping tw objects to Java objects and using an ORM for persistence that way. Although I have not tried it, I presume it might be more complicated than it seems. However, when using a relational database is required, either because the data is consumed or provided by other systems or for whatever reason, I do not see a better solution for achieving data persistence than an external service call, which can surely benefit from an ORM. If most of the data being shown and updated in the coach lived in an external relational database, based on your years of experience, what would be the way to approach the persistence of this data, surely having tw objects that reflect that information?

    Updated on 2015-04-29T18:34:28Z at 2015-04-29T18:34:28Z by abdielcs
  • dogren@gmail.com
    dogren@gmail.com
    421 Posts

    Re: Business Data Persistence, out of the box?

    ‏2015-04-29T21:16:32Z  

    Let's go through this step by step. First, yes, I'd agree that it is a common case that process applications need to get and put data from external relational database systems. The question is "what is the best way to do that". 

    Let me first address some of your questions and assertions:

    You assert: "Surely by having tw objects that reflect that information [in the database]".

    Well, yes and no. Part of the issue is that tw objects aren't really objects; tw objects are essentially XML that is serialized in and out of the execution context and presented to the process developer with an object-like interface. That may seem overly theoretical and putting too fine a point on things, but there are a couple of very important distinctions. Most notably, IBM BPM variable types can't have methods, and they are serialized in and out of the execution context as one big chunk every time things need to be persisted or restored.

    Let me start at the "no" part. Let's look at a common use case. I have an order-to-cash process, and I need to look up some additional customer information from a system of record (SOR) in order to process that order. Let's say I have a customer table, a lookup table for customer type, a customer location table, and a customer contact table.

    If I just modeled objects to reflect that database structure, the way I would with an ORM, I would have a property on the customer object that returns a list of customer contacts for that customer. That would be suicide with IBM BPM as there is no way (with no methods or lazy loading) that could be implemented without loading the entire list of contacts at query time. 

    So, to get to the "yes" part, I would indeed model the customer as an IBM BPM tw.object "variable type". But the contents of that variable type might be denormalized from what is in the database, and it's likely to be simplified as well compared to the table structure. In the common case, I'd write the SQL to get exactly what I want, create a simple variable type to hold the results, and then carry that result along as a subtype of the overall process customer variable type.
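    To make the denormalization idea above concrete, here is a small sketch in plain Java. The type and field names (Customer, Contact, CustomerSummary) are purely illustrative, not anything from IBM BPM or a real schema; the point is the shape of the data: a nested ORM-style graph on one side, and the flat, simplified record (the analogue of a simple BPM variable type filled from one SQL row) on the other.

```java
import java.util.List;

public class CustomerFlattening {
    // ORM-style object graph: a customer holding a list of contacts.
    // Modeling a BPM variable type this way would force loading the whole list.
    public record Contact(String name, String email) {}
    public record Customer(String id, String name, String type, List<Contact> contacts) {}

    // Flat, simplified shape mirroring one denormalized SQL row;
    // this is the kind of simple variable type a process would carry along.
    public record CustomerSummary(String id, String name, String type,
                                  String primaryContactEmail) {}

    // Denormalize the graph down to just what the process needs.
    public static CustomerSummary toSummary(Customer c) {
        String email = c.contacts().isEmpty() ? null : c.contacts().get(0).email();
        return new CustomerSummary(c.id(), c.name(), c.type(), email);
    }

    public static void main(String[] args) {
        Customer c = new Customer("C-1", "Acme", "WHOLESALE",
                List.of(new Contact("Ann", "ann@acme.example")));
        System.out.println(toSummary(c).primaryContactEmail()); // prints ann@acme.example
    }
}
```

    In practice, of course, you would let the SQL join produce the flat row directly rather than loading the graph first; the sketch only shows why the flat shape is the one worth modeling.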

    You assert: Using an external service call that can surely benefit from an ORM

    Why? If I look at Hibernate's page on why you'd use an ORM, they say:

    • ORM prevents you from having to map resultsets by hand in POJOs: IBM BPM does this automatically for you in the SQL connector anyway: you don't need an ORM for this.
    • Less work sync'ing code to relational DB changes: An ORM won't help you with this as the IBM BPM types will have to be sync'd no matter what: the BPM types are immutable parts of a snapshot.
    • Less boilerplate: An ORM would probably actually hurt you here. IBM BPM already eliminates much of the boilerplate, creating wrapper services helps even more, and trying to implement an ORM inside of IBM BPM (if possible at all) would probably add a lot of boilerplate to convert between POJOs and tw.objects.
    • DB independence (mostly). An ORM might help you here, but I've never seen this being a big problem in enterprises: the reality of switching DBs is painful regardless of whether you have an ORM in place.
    • Query abstraction. As I discuss above, an ORM probably couldn't help here because of the limits of the IBM BPM system.
    • Performance (largely the ORM's ability to better know what can and should be cached). As I mention in my original post, ORMs are effectively unable to cache in an IBM BPM environment because they don't have enough control.** So an ORM is actually likely to hurt performance.
    • Extensibility. Again, there's not much you can do here, as an ORM doesn't have primary control.

    If most of the data being shown and updated in the coach lived in an external relational database, based on your years of experience, what would be the way to approach the persistence of this data?

    Like my response to Rahul, I'm not sure I can completely respond to this in a forum post. Data design and integration approaches are something I'd probably spend days planning and architecting if it was my project, so trying to set for a "one size fits all" approach is a little overly simplistic. That said, here's some rules of thumb:

    • I'd avoid thinking of it as persistence. That naturally brings up the idea of just having objects in the process that map to the database and "persist" back and forth. 90% of the time, that's probably not the right thing to do. I would, in most cases, think of it just as integration. i.e. "How do I best get the information I need?" "What should I do with that information once I have it?" "How do I update the backend system once I am done?" "How long do I need to keep the information?"
    • Think in process oriented terms. This is perhaps a corollary to the previous point, but don't think of objects, that makes you start thinking in OOA/D and programming architectures that don't work well in BPM. Think instead of process activities, and having specific points in the process when data from external systems is needed and when external systems need to be updated.
    • In most cases I would recommend just using the IPD database connectors. Create simple wrapper services that automatically do the logging and exception handling you need, as well as provide the JNDI information so that it isn't hardcoded all over the place. Then wrap those wrapper services with business-focused services, e.g. "Get Customer Name from Customer ID", "Get Orders from Customer for Date Range", "Update Order Database [with Process Object X]".
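    The layering in that last bullet can be sketched in plain Java. Everything here is hypothetical (the service names, the logger, the in-memory stand-in for the customer table); it only illustrates the shape: one low-level wrapper that owns logging and exception translation, and thin business-focused services on top of it.

```java
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

public class CustomerServices {
    private static final Logger LOG = Logger.getLogger("integration");

    // Low-level wrapper: every data-source call goes through here, so logging,
    // exception translation, and connection details live in exactly one place.
    public static <T> T callDataSource(String serviceName, Supplier<T> call) {
        LOG.info("calling " + serviceName);
        try {
            return call.get();
        } catch (RuntimeException e) {
            LOG.severe(serviceName + " failed: " + e.getMessage());
            throw new IllegalStateException("Integration failure in " + serviceName, e);
        }
    }

    // In-memory stand-in for the real customer lookup (hypothetical data).
    private static final Map<String, String> CUSTOMER_TABLE = Map.of("C-1", "Acme");

    // Business-focused service, as in "Get Customer Name from Customer ID".
    public static String getCustomerName(String customerId) {
        return callDataSource("getCustomerName",
                () -> CUSTOMER_TABLE.getOrDefault(customerId, "UNKNOWN"));
    }

    public static void main(String[] args) {
        System.out.println(getCustomerName("C-1")); // prints Acme
    }
}
```

    In IBM BPM the equivalent would be nested services in the Process Designer rather than Java classes, but the division of responsibilities is the same.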

    David

    *Technically Core Data is not really an ORM, but it's close enough for this discussion.

    **I guess this depends on your implementation, but service result caching is likely to be a much more effective caching strategy anyway.

    Updated on 2015-04-29T21:19:09Z at 2015-04-29T21:19:09Z by dogren@gmail.com
  • abdielcs
    abdielcs
    4 Posts

    Re: Business Data Persistence, out of the box?

    ‏2015-04-30T13:43:20Z  



    Thank you very much for your extremely fast and thorough response. I'll try to respond with some ideas in the same order as your answers.
    First, my intention was to note that tw objects can be a reflection of data as found in a relational database: they have no methods, just as a database table has none, and in that sense they are a more accurate representation of the data than a Java class. I fully agree that it wouldn't be possible to bring all the data from the POJO into BPM, although there could be one tw object that is an exact representation of the table, another that represents a smaller version of it, and an implementation that maps between them. I don't know whether the ORM needs to bring all of that data into its cache, but at least the data traveling to BPM would be only the minimum information necessary.
    Maybe your question then is: why choose to use an ORM and partially fragment the information?
    There is one feature that gets my attention above all others.
    Suppose we have a tw object that is an exact representation of a table in the database, where each instance of the object represents a row in the table; for example, a tw.object.customer, and we change the name of the customer in the coach. We only need a service that maps the tw object to a Hibernate POJO and calls the save() method. Transparently to us, Hibernate uses its cache and presumably the best possible algorithms to determine exactly what changed in the entity, generates the SQL statement to update only what changed (update customer set name = 'new_name'), and executes all the changes in a single batch. I speak in the plural because this applies not only to the object itself but also to changes that may have occurred in the entities directly related to the object, which may also have been part of the tw object.
    This is the kind of feature that might draw attention or not to use an ORM, the rest of its features are only a plus. It would be very costly to implement such  mechanism in IBM BPM, in most cases the choice is always run the stored procedure that updates the full object even if there has been no change at all. Perhaps one object is not very traumatic, but maybe if they were a list would be desirable a rethink, plus hand ensure that everything is done in a single transaction. All these kinds of issues are resolved internally by hibernate, of course whit a cost, that may be the answer to "Why hibernate is so slow." But how more efficient would be do that by hand?, or should I do continue without such optimization. May be the answer is not base all persistent of application to hibernate, but what about partially do it?
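    As a toy illustration of that dirty-checking idea (this is not Hibernate itself, just a minimal sketch of the technique it performs internally, with hypothetical names), compare a snapshot of the row against its current state and emit an UPDATE for only the changed columns:

    ```java
    import java.util.Map;
    import java.util.Objects;

    // Minimal sketch of ORM-style dirty checking: diff a "before" snapshot
    // of an entity's columns against its current values and build an UPDATE
    // touching only what changed. Hibernate does this internally, plus
    // caching and statement batching; real code would also use bind
    // parameters rather than string concatenation.
    public class DirtyCheckSketch {
        static String updateSql(String table, String idColumn, Object id,
                                Map<String, Object> before, Map<String, Object> after) {
            StringBuilder set = new StringBuilder();
            for (Map.Entry<String, Object> e : after.entrySet()) {
                if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                    if (set.length() > 0) set.append(", ");
                    set.append(e.getKey()).append(" = '").append(e.getValue()).append("'");
                }
            }
            // Nothing changed: no SQL is issued at all.
            if (set.length() == 0) return null;
            return "UPDATE " + table + " SET " + set + " WHERE " + idColumn + " = " + id;
        }
    }
    ```

    With a snapshot of {name: "old_name", city: "Madrid"} and a current state of {name: "new_name", city: "Madrid"}, only the name column appears in the generated statement, and an unchanged row produces no statement at all.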
    Over the control of data that could have the ORM, I think it probably depends on the context, I do not think there is any difference between a call some stored procedures or replace those calls for calls to external services using an ORM application. The integrity of data and transactions in both cases would always be a factor to be controlled carefully. Maybe I do not understand the context in which you find such kind of weakness.
    There is another important factor, at least in my environment I can find ten good java programmers for every good SQL programmer, so follow the way of stored procedures might assume such a risk.
    I hope you understand my point of view, probably influenced by several years of application development rather than by processes, but considering the original meaning of my post, where a relational database is the objective of transactions, I consider still valid. I am not defending hibernate, but I think that the idea behind an ORM should at least be considered.
    Thank you very much for your answers and general rules that share with us, every one of your post is a gold mine for me.

    Updated on 2015-04-30T13:55:53Z at 2015-04-30T13:55:53Z by abdielcs
  • dogren@gmail.com
    dogren@gmail.com
    421 Posts

    Re: Business Data Persistence, out of the box?

    ‏2015-04-30T18:06:06Z  
    • abdielcs
    • ‏2015-04-30T13:43:20Z



    I sort of want to let this thread drop, because I've pretty much said what I have to say. In the end, all I can say is that I've seen several people try and I've never seen anyone succeed. But I can't resist a couple more comments.

    First, you say "all we have to do is create a service that will map the tw object to a Hibernate POJO and call the save() method". This is actually a very hard problem, both in terms of programming effort and in terms of CPU, memory usage, and runtime. Also note that Java connectors are stateless, which means there are significant build-up and tear-down costs on every call. And there is no real way to reflect on a tw.object, so it is very difficult to do this in a generic way. POJOs are also a pain to deal with because they require an entirely separate build system and versioning system, and a lot of manual effort to keep in sync with the process application.

    I'm not bashing ORMs here. As I mention, I use one on a regular basis. But you have to remember that you are not in a Java programming environment. You are in a BPM environment that is primarily Javascript and which provides you some access to Java, primarily for integration purposes.

    Using Hibernate in IBM BPM is a lot like choosing Core Data as the ORM in your Java application. "All you have to do" is instantiate a Core Data environment in a VM, sync your POJOs with NSManagedObject objects that exist in that environment, and then call save on the NSManagedObject context. And then just recompile and redeploy the Objective-C environment any time you want to make a change to your POJOs. (Except that that would arguably be easier, because it's much easier to reflect on Java objects and Objective-C objects than on tw.objects.)

    David

    Updated on 2015-04-30T18:09:41Z at 2015-04-30T18:09:41Z by dogren@gmail.com