Topic
  • 2 replies
  • Latest Post - ‏2014-10-22T10:15:53Z by Gary_Scarcella
Gary_Scarcella
Gary_Scarcella
22 Posts

Pinned topic Generic solution for parsing XML data for different event types

‏2014-05-05T10:02:56Z |

Question from one of my application developers:

We have DB2 LUW database with a table which has 14 columns with regular data types such as INT, CHAR, DATE, etc.., and one column has xml datatype. We store in this column different types of business events which looks like this:

rAdminSystem/><PolicyNumber>00001</PolicyNumber><LineOfBusiness>LifeInsurance</LineOfBusiness><ProductType>LIFE</ProductType><Premium>0.00</Premium></Policy></Communication></EventData>

rAdminSystem/><OldAddress>1 Main str., Springfield, MA 01101</OldAddress><NewAddress>1 Broad str., Springfield, MA 01101</NewAddress><OldPhone>111-111-1111</OldPhone><NewPhone>111-222-2222</NewPhone><OldEmail></OldEmail><NewEmail>234@bbs.com</NewEmail></Residence></Communication></EventData>

As you can see from example the event data is not consistent and would base on the event type. The dilemma which we are facing is how to parse the data in generic fashion, which would not be event dependent.  For example for the first event we can parse the data by writing the following SQL/XML statement :

SELECT  EVNT.SYS_EVNT_ID, 
                X.*
FROM
                BEM.EVNT EVNT,
                XMLTABLE ('$d/EventData/Communication/Policy' passing EVNT_DATA_XML as "d" COLUMNS
                PolicyNumber varchar(8) PATH 'PolicyNumber' ,
                LineOfBusiness varchar(14) path 'LineOfBusiness',
                ProductType varchar(10) path 'ProductType',
                Premium decimal(8,4) path 'Premium' ) AS X

But for the second event from example above we will need to modify SQL/XML to look like this:

SELECT
                EVNT.SYS_EVNT_ID,
                X.*
FROM
                BEM.EVNT EVNT,
                XMLTABLE ('$d/EventData/Communication/Residence' passing EVNT_DATA_XML as "d" COLUMNS
                OldAddress varchar(250) PATH 'OldtAddress' ,
                NewAddress varchar(250) path 'NewAddress',
                OldPhone char(13) path 'OldPhone',
                NewPhone char(13) path 'NewPhone',
                OldEmail varchar(100) path 'OldEmail',
                NewEmail varchar(100) path 'NewEmail') AS X

The number and type of events are indefinite. We are looking for more generic solution to parse the xml datatype, regardless name tags or number of columns within it.
Any help or ideas will be greatly appreciated
                                           

  • Gary_Scarcella
    Gary_Scarcella
    22 Posts
    ACCEPTED ANSWER

    Re: Generic solution for parsing XML data for different event types

    ‏2014-10-22T10:15:53Z  

    Hi Gary... sorry for responding to this with such delay.

    This is a tricky problem. First, I'm not quite convinced that the number of distinct event types is infinite. Maybe there are dozens, maybe hundreds.

    Schema variability can be a wonderful thing, especially for the application that generates (writes) the diverse events, but the price is being paid when the data is read, which is the problem here.

    The question is, how do the consuming applications handle this large diversity of event types? They must have the same problem as what you describe here.

    Here are three ideas.

    (1)

    The element extraction with an XMLTABLE function can be encapsulated in a user-defined functions (UDF), or in views. There are examples for that in the DB2 pureXML Cookbook. You could create one such UDF for each event type. Additionally, you could have an integer or char column in the table that indicates the event type for each record.

    When you read a record you would first check the event type and then use the corresponding UDF (or view) to extract the elements, as a 2-step approach.

    Whenever someone (who???) invents a new event type they need to enable you to create a corresponding UDF.

    Yes, this is tedious if you have very many event types, but it can be done in a very systematic fashion.

    (2)

    You could write a generic table function that extracts a certain number of elements into varchar columns:

    SELECT  EVNT.SYS_EVNT_ID, 
                    X.*
    FROM
        BEM.EVNT EVNT,
        XMLTABLE ('$d/EventData/Communication/*' passing EVNT_DATA_XML as "d" COLUMNS
                    field1 varchar(255) PATH './*[1]' ,
                    field2 varchar(255) path './*[2]',
                    field3 varchar(255) path './*[3]',
                    ....
    '

                    ) AS X

    Yes, this is not pretty and treating everything as a varchar(255) is not ideal either.

     

    (3)

    Use XMLTABLE to produce a list of name value pairs, as explained in this blog post:

    http://nativexmldatabase.com/2010/05/26/xml-profiling-how-to-get-a-list-of-all-elements-and-attributes/

     

    What do you think?

    - Matthias

    Thanks Matthias,

    We had tried some of those approaches and yes they are not pretty.  I had set up a stored procedure to bring back multiple result sets (1 for each event type).  This was working from the DB2 perspective, but the reporting application was Cognos and Cognos could only read the first result set being returned.

    Since the "who???" is another internal company application, we decided to have them pass the data differently so we could store the event data relationally.  The truly variable portion is still being passed via XML.

    Thanks,

    Gary.

     

  • MatthiasNicola
    MatthiasNicola
    322 Posts

    Re: Generic solution for parsing XML data for different event types

    ‏2014-10-22T07:03:34Z  

    Hi Gary... sorry for responding to this with such delay.

    This is a tricky problem. First, I'm not quite convinced that the number of distinct event types is infinite. Maybe there are dozens, maybe hundreds.

    Schema variability can be a wonderful thing, especially for the application that generates (writes) the diverse events, but the price is being paid when the data is read, which is the problem here.

    The question is, how do the consuming applications handle this large diversity of event types? They must have the same problem as what you describe here.

    Here are three ideas.

    (1)

    The element extraction with an XMLTABLE function can be encapsulated in a user-defined functions (UDF), or in views. There are examples for that in the DB2 pureXML Cookbook. You could create one such UDF for each event type. Additionally, you could have an integer or char column in the table that indicates the event type for each record.

    When you read a record you would first check the event type and then use the corresponding UDF (or view) to extract the elements, as a 2-step approach.

    Whenever someone (who???) invents a new event type they need to enable you to create a corresponding UDF.

    Yes, this is tedious if you have very many event types, but it can be done in a very systematic fashion.

    (2)

    You could write a generic table function that extracts a certain number of elements into varchar columns:

    SELECT  EVNT.SYS_EVNT_ID, 
                    X.*
    FROM
        BEM.EVNT EVNT,
        XMLTABLE ('$d/EventData/Communication/*' passing EVNT_DATA_XML as "d" COLUMNS
                    field1 varchar(255) PATH './*[1]' ,
                    field2 varchar(255) path './*[2]',
                    field3 varchar(255) path './*[3]',
                    ....
    '

                    ) AS X

    Yes, this is not pretty and treating everything as a varchar(255) is not ideal either.

     

    (3)

    Use XMLTABLE to produce a list of name value pairs, as explained in this blog post:

    http://nativexmldatabase.com/2010/05/26/xml-profiling-how-to-get-a-list-of-all-elements-and-attributes/

     

    What do you think?

    - Matthias

  • Gary_Scarcella
    Gary_Scarcella
    22 Posts

    Re: Generic solution for parsing XML data for different event types

    ‏2014-10-22T10:15:53Z  

    Hi Gary... sorry for responding to this with such delay.

    This is a tricky problem. First, I'm not quite convinced that the number of distinct event types is infinite. Maybe there are dozens, maybe hundreds.

    Schema variability can be a wonderful thing, especially for the application that generates (writes) the diverse events, but the price is being paid when the data is read, which is the problem here.

    The question is, how do the consuming applications handle this large diversity of event types? They must have the same problem as what you describe here.

    Here are three ideas.

    (1)

    The element extraction with an XMLTABLE function can be encapsulated in a user-defined functions (UDF), or in views. There are examples for that in the DB2 pureXML Cookbook. You could create one such UDF for each event type. Additionally, you could have an integer or char column in the table that indicates the event type for each record.

    When you read a record you would first check the event type and then use the corresponding UDF (or view) to extract the elements, as a 2-step approach.

    Whenever someone (who???) invents a new event type they need to enable you to create a corresponding UDF.

    Yes, this is tedious if you have very many event types, but it can be done in a very systematic fashion.

    (2)

    You could write a generic table function that extracts a certain number of elements into varchar columns:

    SELECT  EVNT.SYS_EVNT_ID, 
                    X.*
    FROM
        BEM.EVNT EVNT,
        XMLTABLE ('$d/EventData/Communication/*' passing EVNT_DATA_XML as "d" COLUMNS
                    field1 varchar(255) PATH './*[1]' ,
                    field2 varchar(255) path './*[2]',
                    field3 varchar(255) path './*[3]',
                    ....
    '

                    ) AS X

    Yes, this is not pretty and treating everything as a varchar(255) is not ideal either.

     

    (3)

    Use XMLTABLE to produce a list of name value pairs, as explained in this blog post:

    http://nativexmldatabase.com/2010/05/26/xml-profiling-how-to-get-a-list-of-all-elements-and-attributes/

     

    What do you think?

    - Matthias

    Thanks Matthias,

    We had tried some of those approaches and yes they are not pretty.  I had set up a stored procedure to bring back multiple result sets (1 for each event type).  This was working from the DB2 perspective, but the reporting application was Cognos and Cognos could only read the first result set being returned.

    Since the "who???" is another internal company application, we decided to have them pass the data differently so we could store the event data relationally.  The truly variable portion is still being passed via XML.

    Thanks,

    Gary.