XML format

For the rule scoring apply function, item sets and sequences must be encoded as XML strings.

XML string for an item set

The XML string for an item set looks like this:

<itemset>
   <item>...</item>
   <item>...</item>
   ...
</itemset>

Example: XML string of an item set

The item set {cheese, milk, butter} is encoded in the following XML string:

<itemset>
   <item>cheese</item>
   <item>milk</item>
   <item>butter</item>
</itemset>

XML string for a sequence

The XML-format for a sequence looks like this:

<sequence>
   <itemset group="...">
       <item>...</item>
       <item>...</item>
       ...
   <itemset>
   ...
   <itemset group="...">
       <item>...</item>
       <item>...</item>
       ...
   <itemset>
</sequence>

Example: XML string of a sequence

The sort order of the item sets is determined by the values of the group attribute.

The sequence {cheese, milk}>>>{bread, butter} is encoded in the following XML string:

<sequence>
   <itemset group="2005-08-12">
       <item>cheese</item>
       <item>milk</item>
   <itemset>
   <itemset group="2005-08-13">
       <item>bread</item>
       <item>butter</item>
   <itemset>
</sequence>
where:
2005-08-12
is the purchase date of the items cheese and milk
2005-08-13
is the purchase date of the items bread and butter

Building the XML strings

If you use Db2 Version 8.2 or higher, you can build the XML strings by using one of the following functions:

  • XMLELEMENT
  • XMLCONCAT
  • XMLSERIALIZE

If you use XMLSERIALIZE to build the XML string for the item set {cheese, milk, butter}, the expression looks like this:

XMLSERIALIZE(CONTENT 
      XMLELEMENT(NAME "itemset",
        XMLCONCAT(
          XMLELEMENT(NAME "item", 'cheese'),
          XMLELEMENT(NAME "item", 'milk'),
          XMLELEMENT(NAME "item", 'butter')))
      AS CLOB(512000))

If you use Db2 Version 8.1, you can use XML2CLOB instead of XML serialize.

If you use XML2CLOB, this expression looks like this:

cast(XML2CLOB( 
      XMLELEMENT(NAME "itemset",
        XMLCONCAT(
          XMLELEMENT(NAME "item", 'cheese'),
          XMLELEMENT(NAME "item", 'milk'),
          XMLELEMENT(NAME "item", 'butter'))))
      AS CLOB(512000))
If the input data is in a table that includes one column for the transaction or group ID and one column for the item, you can use the XMLAGG function to aggregate the items to an item set:
SELECT "TRANSID", 
        XMLSERIALIZE(CONTENT 
           XMLELEMENT(NAME "itemset",
              XMLAGG(XMLELEMENT(NAME 
        "item", "ITEMID")))
         AS CLOB(512000))
FROM "RETAIL_SCORING” GROUP BY "TRANSID"
For the sequence {cheese, milk}>>>{bread, butter}, you can use the following calls of XML functions to build the XML string:
XMLSERIALIZE(CONTENT 
    XMLELEMENT(NAME "sequence", 
     XMLCONCAT( 
        XMLELEMENT(NAME "itemset",       
          XMLATTRIBUTES('2005-08-12' AS "group"),
	  XMLCONCAT(
            XMLELEMENT(NAME "item", 'cheese'),
            XMLELEMENT(NAME "item", 'milk')
        )),
        XMLELEMENT(NAME "itemset",       
          XMLATTRIBUTES('2005-08-13' AS "group"),
          XMLCONCAT(
            XMLELEMENT(NAME "item", 'bread'),
            XMLELEMENT(NAME "item", 'butter')
        ))
      ))		
      AS CLOB(512000))
If the input data resides in a table, you can build the XML string by using the following SELECT statement:
WITH "TRANSACTIONS" ("CUSTOMER_ID", "TRANSDATE", "ITEMSET") AS (
  SELECT  "CUSTOMER_ID", "TRANSDATE", 
          XMLELEMENT(NAME "itemset",
            XMLATTRIBUTES("TRANSDATE" AS "group"),
            XMLAGG(XMLELEMENT(NAME "item", "ITEMID")))
  FROM "RETAIL_SCORING"
  GROUP BY "CUSTOMER_ID", "TRANSDATE"
)  
SELECT
  “CUSTOMER_ID”, 
  XMLSERIALIZE( CONTENT XMLELEMENT(NAME "sequence", 
                XMLAGG( "ITEMSET" ))
  AS CLOB(512000)) AS "SEQUENCE"
FROM “TRANSACTIONS”
GROUP BY “CUSTOMER_ID”