Extending the WebSphere Commerce data extract framework to perform delta extractions

Dinup P. Pillai (dinup.pillai@in.ibm.com), Software Engineer, IBM
Dinup P. Pillai is a Software Developer with the WebSphere Commerce team at the IBM India Software Lab. He has three years of experience in the e-commerce field. His areas of experience include Java, J2EE, and web services.
Vipin Murali (vipin.murali@in.ibm.com), Software Engineer, IBM
Vipin Murali is a Software Developer with the WebSphere Commerce team at the IBM India Software Lab. He has three years of experience in the e-commerce field. His areas of experience include Java, J2EE, and web services.

Summary:  This article explains how to extend the framework of the data extraction utility to perform delta extractions. Delta extractions extract only data that has changed since the previous extraction, rather than extracting the full set of data. This customization procedure provides a more efficient extraction process.

Date:  26 Oct 2011
Level:  Intermediate


Introduction

The Intelligent Offer data extraction utility is a command-line utility that creates the Enterprise Product Report (EPR) data that Coremetrics® requires for dynamic recommendations. It extracts catalog data from the WebSphere® Commerce database and writes it to CSV files, the Enterprise Category Definition File (ECDF) and the Enterprise Product Content Mapping File (EPCMF), in the format that Coremetrics loads. The utility consists of several components, including the Data Reader, Business Object Builder, Business Object Mediator, and Data Writer.

This utility is provided as part of the WebSphere Commerce V7 Feature Pack 3 release. For more information, see the Information Center topic Data extraction utility for dynamic recommendations in Coremetrics Intelligent Offer.

The default implementation of the data extraction utility retrieves all the catalog entries that belong to the store. Even if only a few records have been modified since the previous extraction, the full dataset is extracted from the source system each time the utility runs. The delta extraction mechanism improves efficiency by retrieving only those records that have changed since a specified date, which results in faster extraction. In this article, you primarily customize the Data Reader layer. The mechanism works in a production environment as well as in a staging environment (base schema).
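
The difference between the two modes can be sketched in plain Java. This is a minimal illustration, not the utility's code: the CatalogRecord type, its lastModified field, and the inclusive comparison against the start date are all assumptions made for the example.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class DeltaFilterSketch {
    // Hypothetical record type; the real utility reads catalog entries
    // through its Data Reader layer.
    static final class CatalogRecord {
        final long id;
        final LocalDateTime lastModified;

        CatalogRecord(long id, LocalDateTime lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    // Full extraction: every record is returned, changed or not.
    static List<CatalogRecord> fullExtract(List<CatalogRecord> all) {
        return all;
    }

    // Delta extraction: only records modified on or after startDate are returned.
    // Treating the start date as inclusive is an assumption of this sketch.
    static List<CatalogRecord> deltaExtract(List<CatalogRecord> all, LocalDateTime startDate) {
        return all.stream()
                .filter(r -> !r.lastModified.isBefore(startDate))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CatalogRecord> all = Arrays.asList(
                new CatalogRecord(1L, LocalDateTime.of(2010, 12, 31, 23, 59)),
                new CatalogRecord(2L, LocalDateTime.of(2011, 6, 15, 12, 0)));
        LocalDateTime startDate = LocalDateTime.of(2011, 1, 1, 0, 0);
        System.out.println("full=" + fullExtract(all).size()
                + " delta=" + deltaExtract(all, startDate).size());
    }
}
```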

This article assumes that WebSphere Commerce V7 FEP3 is installed and the Intelligent Offer data extract utility is set up and configured. For more information, see Configuring the Intelligent Offer data extraction utility.


Step 1: Create the Delta Extract Reader Mediator classes

Create a new abstract class, AbstractDeltaExtractCatalogReaderMediator, that extends the AbstractCatalogReaderMediator class. Then create a catalog-entry-specific reader mediator class, DeltaExtractCatalogEntryReaderMediator, that extends AbstractDeltaExtractCatalogReaderMediator. These mediators invoke the change history API to retrieve the change history information.

About the change history API

WebSphere Commerce provides the change history API that returns change history information, such as the primary object ID of the changed noun, based on the search criteria. The change history feature captures the information of a catalog entry if:

  1. A new catalog entry is created.
  2. An existing catalog entry is deleted.
  3. An existing catalog entry's property is modified.

The following search criteria are passed to the API:

  • Workspace: Sets the workspace name.
  • TaskGroup: Sets the task group name.
  • ObjectType: Sets the type of the noun, for example, CatalogEntry.
  • StoreId: Sets the store ID from which the change history is to be retrieved.
  • StartDate: Sets the date starting from which change history information is returned.
  • UIObjectNames: Lists the catalog entry types to be retrieved, for example, product, kit, and so on.
  • Actions: Returns the change history information based on the actions performed on the noun, for example, N (new), D (delete), U (update).
  • DBType: Sets the database type, which determines the paging mechanism.
  • BeginIndex: Sets the begin index.
  • PageSize: Sets the page size.
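
To make the shape of these criteria concrete, here is a minimal sketch of a holder object with the same fields. It is a hypothetical stand-in, not the WebSphere Commerce ChangeHistorySearchCriteria class, whose setters are not shown in this article; the field types, the zero-based begin index, and the nextPage helper are assumptions.

```java
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the search criteria listed above. It is NOT the
// WebSphere Commerce ChangeHistorySearchCriteria class; field types are assumptions.
final class ChangeHistoryCriteriaSketch {
    String workspace;   // left null here; the value to use depends on the environment
    String taskGroup;   // left null here as well
    String objectType;
    int storeId;
    Timestamp startDate;
    List<String> uiObjectNames;
    List<String> actions;
    String dbType;
    int beginIndex;
    int pageSize;

    // Criteria for catalog entries changed since startDate, positioned at the first page.
    static ChangeHistoryCriteriaSketch forCatalogEntries(int storeId, Timestamp startDate, int pageSize) {
        ChangeHistoryCriteriaSketch c = new ChangeHistoryCriteriaSketch();
        c.objectType = "CatalogEntry";
        c.storeId = storeId;
        c.startDate = startDate;
        c.uiObjectNames = Arrays.asList("product", "kit"); // illustrative values only
        c.actions = Arrays.asList("N", "U", "D");
        c.dbType = "db2";
        c.beginIndex = 0; // assumes zero-based paging
        c.pageSize = pageSize;
        return c;
    }

    // Moves the window forward by one page; all other criteria stay unchanged.
    ChangeHistoryCriteriaSketch nextPage() {
        beginIndex += pageSize;
        return this;
    }

    public static void main(String[] args) {
        ChangeHistoryCriteriaSketch c =
                forCatalogEntries(10001, Timestamp.valueOf("2011-01-01 00:00:00"), 700);
        System.out.println(c.objectType + " from index " + c.beginIndex);
        System.out.println("next page starts at " + c.nextPage().beginIndex);
    }
}
```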

A database connection must be passed along with the change history search criteria; the change history API uses this connection to retrieve the change history information from the database. Configure the database properties in the environment configuration file for the data extraction utility, wc-dataextract-env.xml. This configuration is covered in Step 4.
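
How the utility builds this connection internally is not shown in this article. As a rough sketch, assuming a DB2 database reached through the standard JDBC type 4 URL form (jdbc:db2://server:port/database) and a driver on the classpath, the server, port, and database name from the configuration could be combined as follows. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class ChangeHistoryConnectionSketch {
    // Builds a JDBC URL in the DB2 type 4 form: jdbc:db2://server:port/database
    static String db2Url(String server, int port, String database) {
        return "jdbc:db2://" + server + ":" + port + "/" + database;
    }

    // Opens a connection with plain-text credentials. The configuration file
    // stores an encrypted password; decrypting it is outside this sketch.
    static Connection open(String server, int port, String database,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(db2Url(server, port, database), user, password);
    }

    public static void main(String[] args) {
        // Only the URL is printed so the sketch runs without a database.
        System.out.println(db2Url("localhost", 50000, "mall"));
    }
}
```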

A brief description of the newly created mediators is provided below:

  • AbstractDeltaExtractCatalogReaderMediator: This class contains the abstract methods that need to be implemented by the subclasses. It initializes the StartDate parameter configured in the business object configuration file. Using the primary object keys retrieved by the change history API, it returns the catalog-entry-specific data based on the following actions:
    • Actions = (N, U): Primary object keys are passed on to the new catalog service as XPath parameters to retrieve the records.
    • Actions = (D): Because the records for deleted catalog entries no longer exist in the database, a new response BOD is built for the deleted entries, with their parent catalog group set to "Uncategorized".
  • DeltaExtractCatalogEntryReaderMediator: This class implements the abstract methods declared in the abstract class AbstractDeltaExtractCatalogReaderMediator. It sets the change history search criteria specific to the catalog entry.
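
The routing by action can be sketched as follows. This is an illustration under stated assumptions rather than the code in code_snippets.zip: the Change type stands in for a change history row, and the map keys used for the placeholder record are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ActionRoutingSketch {
    // Hypothetical stand-in for one change history row (object ID plus action code).
    static final class Change {
        final String objectId;
        final String action; // "N", "U", or "D"

        Change(String objectId, String action) {
            this.objectId = objectId;
            this.action = action;
        }
    }

    // New and updated entries still exist in the database, so their IDs are
    // collected (without duplicates) for a lookup through the catalog service.
    static List<String> idsToFetch(List<Change> changes) {
        Set<String> ids = new LinkedHashSet<>();
        for (Change c : changes) {
            if ("N".equals(c.action) || "U".equals(c.action)) {
                ids.add(c.objectId);
            }
        }
        return new ArrayList<>(ids);
    }

    // Deleted entries have no rows left to read, so a placeholder record is
    // built from the change history alone. The map keys are hypothetical.
    static List<Map<String, String>> deletedPlaceholders(List<Change> changes) {
        List<Map<String, String>> placeholders = new ArrayList<>();
        for (Change c : changes) {
            if ("D".equals(c.action)) {
                Map<String, String> record = new LinkedHashMap<>();
                record.put("catalogEntryId", c.objectId);
                record.put("parentCatalogGroup", "Uncategorized");
                placeholders.add(record);
            }
        }
        return placeholders;
    }

    public static void main(String[] args) {
        List<Change> changes = Arrays.asList(
                new Change("100", "N"), new Change("101", "U"), new Change("102", "D"));
        System.out.println("fetch: " + idsToFetch(changes));
        System.out.println("deleted: " + deletedPlaceholders(changes));
    }
}
```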

Methods to be overridden in the subclasses

The following methods from AbstractCatalogReaderMediator need to be overridden.

The method shown in Listing 1 originally initializes the catalog entry reader mediator. Therefore, this method needs to be overridden to initialize the delta extract mediator classes.

Listing 1. Initialization

a) public void init () throws DataLoadException

The method shown in Listing 2 retrieves a list of catalog entry logical nouns by invoking a catalog service based on the list of catalog entry IDs and the access profile.

Listing 2. Service invocation

b) protected Object getDataObject(String beginIndex, String pageSize, String storeId) 
 throws AbstractBusinessObjectDocumentException, DataLoadException

The following methods from AbstractDeltaExtractCatalogReaderMediator need to be overridden.

The method shown in Listing 3 gets the details pertaining to the deleted catalog entries from the TaskGroupChangeHistoryDataSet object, in the form of a Business Object Document (BOD).

Listing 3. Retrieve the deleted catalog entries

a) protected Object getDeletedDataObjects (List<TaskGroupChangeHistoryDataSet> 
 changeHistoryList)

The method shown in Listing 4 builds ShowCatalogEntryDataAreaType, with the value of recordSetTotal retrieved from the list of the TaskGroupChangeHistoryDataSet objects.

Listing 4. Build the data object

b) protected Object buildShowDataAreaObject(List<TaskGroupChangeHistoryDataSet> 
 changeHistoryList)

The method shown in Listing 5 builds the SelectionCriteriaHelper object using the XPath expression, XPath parameters, and the control parameters.

Listing 5. Build the selection criteria

c) protected SelectionCriteriaHelper buildSelectCriteriaHelper(String uniqueIdParameter)
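
The XPath expression that this method works with is not reproduced in the article. The sketch below shows one plausible way to turn a list of catalog entry IDs into an expression that matches the query template key /CatalogEntry[CatalogEntryIdentifier[(UniqueID=)]] from Listing 9. Joining the values with "or" is an assumption about the expression syntax, and the helper itself is hypothetical; the actual implementation is in the download.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class XPathSketch {
    // Builds an XPath expression that selects catalog entries by unique ID.
    // The "or"-joined multi-value form is an assumption of this sketch.
    static String catalogEntryXPath(List<String> uniqueIds) {
        if (uniqueIds.isEmpty()) {
            throw new IllegalArgumentException("at least one ID is required");
        }
        for (String id : uniqueIds) {
            // Catalog entry IDs are numeric; reject anything else to keep
            // quotes and brackets out of the expression.
            if (!id.matches("\\d+")) {
                throw new IllegalArgumentException("not a numeric ID: " + id);
            }
        }
        String predicate = uniqueIds.stream()
                .map(id -> "UniqueID='" + id + "'")
                .collect(Collectors.joining(" or "));
        return "/CatalogEntry[CatalogEntryIdentifier[(" + predicate + ")]]";
    }

    public static void main(String[] args) {
        System.out.println(catalogEntryXPath(Arrays.asList("10001", "10002")));
    }
}
```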

The method shown in Listing 6 returns a list of CatalogEntry nouns from the response.

Listing 6. Get the retrieved catalog entries

d) protected List getDataObjectFromResponse(Object response) throws DataLoadException

The method shown in Listing 7 returns the record set total returned from the CatalogEntry service.

Listing 7. Get the record set total

e) protected String getRecordSetTotal(Object response) throws DataLoadException
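
The article does not show how the framework consumes the record set total. A common pattern, sketched here under that caveat, is to request pages of pageSize records until the begin index reaches the total. The PageSource interface and the zero-based begin index are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagingSketch {
    // Hypothetical source of one page of results, standing in for a service call.
    interface PageSource {
        List<String> fetch(int beginIndex, int pageSize);
    }

    // Reads pages until beginIndex reaches recordSetTotal.
    static List<String> readAll(PageSource source, int recordSetTotal, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> all = new ArrayList<>();
        for (int begin = 0; begin < recordSetTotal; begin += pageSize) {
            all.addAll(source.fetch(begin, pageSize));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e");
        // An in-memory list plays the role of the catalog service.
        PageSource source = (begin, size) ->
                data.subList(begin, Math.min(begin + size, data.size()));
        System.out.println(readAll(source, data.size(), 2));
    }
}
```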

The method shown in Listing 8 populates the ChangeHistorySearchCriteria object with the search criteria specific to the catalog entry.

Listing 8. Build the change history search criteria

f) protected void buildChangeHistorySearchCriteria() throws DataLoadException

Refer to AbstractDeltaExtractCatalogReaderMediator.java and DeltaExtractCatalogEntryReaderMediator.java in the code_snippets.zip file that is provided in the Download section of this article.

Path:

WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\dataload\
 datareader\AbstractDeltaExtractCatalogReaderMediator.java

WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\dataload\
 datareader\DeltaExtractCatalogEntryReaderMediator.java


Step 2: Create the custom exception classes and properties file

  1. Create a new exception class, DeltaExtractApplicationException, that extends the DataLoadApplicationException class. The delta extract data reader mediators will throw the newly created exception when application errors occur during the extraction of business data.
  2. Create a new properties file, WcDataloadMessages_en_US.properties, that contains the exception messages for the delta extract application exception.
  3. Create a new message keys file, DeltaExtractMessageKeys.java, that contains the exception message keys for the delta extract application exception.
  4. Refer to DeltaExtractApplicationException.java, WcDataloadMessages_en_US.properties, and DeltaExtractMessageKeys.java in the code_snippets.zip file that is provided in the Download section of this article.

    Path:

    WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\dataload\
     exception\DeltaExtractApplicationException.java
    
    WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\dataload\
     logging\properties\WcDataloadMessages_en_US.properties
    
    WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\dataload\
     exception\DeltaExtractMessageKeys.java
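
The relationship between the three files can be sketched in a single self-contained class. This is an illustration only: the exception here extends java.lang.Exception instead of DataLoadApplicationException, the message key and text are invented for the example, and the properties content is inlined instead of being loaded from WcDataloadMessages_en_US.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Properties;

final class DeltaExtractExceptionSketch {
    // Stand-in for a constant in DeltaExtractMessageKeys; the key name is hypothetical.
    static final String INVALID_START_DATE = "_ERR_INVALID_START_DATE";

    // Stand-in for the properties file content; the message text is hypothetical.
    static final String MESSAGES =
            "_ERR_INVALID_START_DATE=The startDate value {0} is not a valid timestamp.\n";

    // Stand-in for DeltaExtractApplicationException. The real class extends
    // DataLoadApplicationException, which is not available in this sketch.
    static final class DeltaExtractException extends Exception {
        private static final long serialVersionUID = 1L;
        final String messageKey;

        DeltaExtractException(String messageKey, String message) {
            super(message);
            this.messageKey = messageKey;
        }
    }

    // Looks up the message pattern for the key and fills in the arguments.
    static DeltaExtractException newException(String key, Object... args) {
        Properties messages = new Properties();
        try {
            messages.load(new StringReader(MESSAGES));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        String pattern = messages.getProperty(key, key); // fall back to the key itself
        return new DeltaExtractException(key, MessageFormat.format(pattern, args));
    }

    public static void main(String[] args) {
        System.out.println(newException(INVALID_START_DATE, "tomorrow").getMessage());
    }
}
```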
    


Step 3: Create the main SQL query in the query template file

  1. Create a new SQL query in the query template file, as shown in Listing 9. The XPath parameters comprise a list of IDs of the catalog entries that have been modified after a specific date. This main SQL statement returns the primary keys that are passed to the associated SQL statements.


    Listing 9. Query template file
    BEGIN_XPATH_TO_SQL_STATEMENT
        name=/CatalogEntry[CatalogEntryIdentifier[(UniqueID=)]]
        base_table=CATENTRY
        sql=
            SELECT
                CATENTRY.$COLS:CATENTRY_ID$
            FROM
                CATENTRY
            WHERE
                CATENTRY_ID IN (?UniqueID?)
    END_XPATH_TO_SQL_STATEMENT
    

  2. Refer to wc-query-MyCompany-CatalogEntry-admin-get-ext.tpl in the code_snippets.zip file that is provided in the Download section of this article.

    Path:

    WC\xml\config\com.ibm.commerce.catalog-ext\wc-query-MyCompany-CatalogEntry-admin-get-
    ext.tpl


Step 4: Update the data extract configuration files

Update the data extract configuration files as shown in Listing 10 and Listing 11.

In the business object configuration file, wc-dataextract-catalog-entry.xml:

  1. Update the data reader className value with the custom data reader mediator class created in Step 1.
  2. Add a new startDate property to specify the start date for the delta extraction, as shown in Listing 10.

    Listing 10. Configuration
    <_config:DataReader className="com.mycompany.commerce.catalog.dataload.datareader.DeltaExtractCatalogEntryReaderMediator" pageSize="700">
     <_config:property name="clientId" value="99999999"/>
     <_config:property name="storeId" value="10001"/>
     <_config:property name="username" value="wcsadmin"/>
     <_config:property name="password" value="3fdBFMFoiGNQ0zUStB865w=="/>
     <_config:property name="startDate" value="2011-01-01 00:00:00.000000000" />
    </_config:DataReader>
    

  3. Add the database properties shown in Listing 11 to the wc-dataextract-env.xml file. This configuration is used to create the database connection for the change history API.

    Listing 11. Database configuration
    <_config:Database type="db2" name="mall" user="build" password=
     "xK36ck80s6GbQL+aVIOszg==" server="localhost" port="50000" schema="wcs" />
    

  4. Refer to wc-dataextract.xml, wc-dataextract-env.xml, and wc-dataextract-catalog-entry.xml in the code_snippets.zip file that is provided in the Download section of this article.

    Path:

    samples\DataExtract\Catalog\DeltaExtract\wc-dataextract.xml
    samples\DataExtract\Catalog\DeltaExtract\wc-dataextract-env.xml
    samples\DataExtract\Catalog\DeltaExtract\wc-dataextract-catalog-entry.xml
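
The startDate value in Listing 10 follows the JDBC timestamp escape format, yyyy-mm-dd hh:mm:ss with up to nine fractional digits. Whether the mediator parses it with java.sql.Timestamp is an assumption, but a configured value can be checked that way before a run. The class and method names below are hypothetical.

```java
import java.sql.Timestamp;

final class StartDateSketch {
    // Parses a startDate property value such as "2011-01-01 00:00:00.000000000".
    // Timestamp.valueOf accepts yyyy-[m]m-[d]d hh:mm:ss[.f...] and throws
    // IllegalArgumentException for any other format.
    static Timestamp parseStartDate(String value) {
        try {
            return Timestamp.valueOf(value.trim());
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "startDate must look like yyyy-mm-dd hh:mm:ss[.fffffffff]: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStartDate("2011-01-01 00:00:00.000000000"));
    }
}
```

Validating the value up front makes a malformed date fail with a clear message instead of surfacing later as an empty or unexpected extraction.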
    


Step 5: Run the data extract utility

To perform delta extraction, run the data extract utility from the command line to extract the catalog entry records that have changed since the configured start date (Listing 12).

Listing 12. Running the utility

dataextract.bat <WC_toolkit>\samples\DataExtract\Catalog\DeltaExtract\
 wc-dataextract.xml

Figure 1 shows the console output after a successful delta extraction run of the data extract utility.


Figure 1. Running the data extract utility

Limitation

The delta extraction mechanism is not supported for a Cloudscape or Derby database.


Conclusion

In this article, you learned how to extend the data extract framework to perform delta extractions. This customization provides a more efficient extraction process because only the records that changed since the configured start date are retrieved.



Download

Description    Name                 Size    Download method
Code sample    code_snippets.zip    17KB    HTTP


