Customizing the WebSphere Commerce data extract framework using in-memory paging

Learn how to customize the WebSphere® Commerce data extract framework by incorporating in-memory paging, which can significantly improve performance for large data sets and provide a more efficient data extraction process.

Share:

Vipin Murali (vipin.murali@in.ibm.com), Software Engineer, IBM

Photo of Vipin MuraliVipin Murali is a Software Developer with the WebSphere Commerce team at the IBM India Software Lab. He has three years of experience in the e-commerce field. His areas of experience include Java, J2EE, and web services.



Dinup P. Pillai (dinup.pillai@in.ibm.com), Software Engineer, IBM

Photo of Dinup P PillaiDinup P. Pillai is a Software Developer with the WebSphere Commerce team at the IBM India Software Lab. He has three years of experience in the e-commerce field. His areas of experience include Java, J2EE, and web services.



12 October 2011

Introduction

The Coremetrics® Intelligent Offer data extraction utility is a command-line utility that creates the Enterprise Product Report (EPR) data. This data is required by Coremetrics for dynamic recommendations. The utility extracts catalog data from the database and generates Enterprise Category Definition File (ECDF) and Enterprise Product Content Mapping File (EPCMF) files in the correct format to load into Coremetrics. The utility also extracts data from the WebSphere Commerce database and formats and writes it into CSV files. It contains several components, such as Data Reader, Business Object Builder, Business Object mediator, and Data Writer.

This utility is provided as a part of WebSphere Commerce V7 Feature Pack 3. For more information, see this Information Center topic, Data extraction utility for dynamic recommendations in Coremetrics Intelligent Offer.

The default implementation for the data extract solution performs "database level paging" by injecting paging indexes to the main SQL query. This query is executed for every service invocation. You can increase the database level paging size by changing the page size parameter in the data extract configuration to increase the data extract performance as needed. For more information, see this Information Center topic, Sample business object configuration file EPCMF data.

If you have a very large dataset and want to achieve considerable performance improvements, you may want to explore the in-memory paging custom approach for the Intelligent Offer Data Extract utility as discussed in this article. This approach runs the main SQL query only once and loads all the primary keys in the memory. Based on the specified paging parameters, the sub-list of the primary keys is passed on to the associated SQLs. For this article, you need to primarily customize the Data Reader layer, SQL Composer, and Xpath SQL Key Processor. You also need to be familiar with the WebSphere Commerce Data Service Layer. For more information, see this Information Center topic, Working with the data service layer.

This article assumes that WebSphere Commerce V7 FEP3 is installed and the Intelligent Offer data extract utility is set up and configured. For more information, see Configuring the Intelligent Offer data extraction utility.


Step 1: Create a custom SQL Composer

Create a new class CustomDataExtractSQLComposer, which extends the abstract class SQLComposer. This custom composer contains code that executes the main SQL query on the first service request, and thereafter executes a dummy SQL query for the subsequent service requests.

Methods to be overridden in the subclasses

The method shown in Listing 1 is the extension point where SQLComposer can modify and compose the final SQL statement.

Listing 1. SQLComposer extension point
a) public SQLComposerInfo composeSQLStatement(String sqlName, String entityTableName, 
List resultSetInfo, String sqlstatement, List params) throws DataServiceSystemException

Refer to CustomDataExtractSQLComposer.java in the code_snippets.zip file that is provided in the Download section of the article.

Path:

WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\facade\server\
services\dataaccess\db\jdbc\CustomDataExtractSQLComposer.java

<WebSphereCommerceServerExtensionsLogic> is the name of the project where the data extract specific customized files are placed.


Step 2: Create a custom Xpath SQL Key Processor

Create a new class CustomDataExtractKeyProcessor, which extends the abstract class XPathSQLKeyProcessor. On the first service request, this custom processor loads the list of primary keys returned by the main SQL query into memory. It then creates a sublist of the primary keys depending on the paging parameters.

Methods to be overridden in the subclasses

The method shown in Listing 2 returns a list of keys that will be used in subsequent associated SQLs.

Listing 2. XpathKeyProcessor extension point
a) abstract public List getKeys(List keys, Map ahmXPathQueryParameters, int 
sqlPagingLimit)

Refer to CustomDataExtractKeyProcessor.java in the code_snippets.zip that is provided in the Download section of the article.

Path:

WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\facade\server\
services\dataaccess\processor\CustomDataExtractKeyProcessor.java

Step 3: Create the main SQL Query in the query template file

Create a new SQL query in the query template file as shown in Listing 3. This makes use of the custom composer and processor created in the previous two steps.

Listing 3. Query template file
BEGIN_XPATH_TO_SQL_STATEMENT
name=/CatalogEntry[CatalogEntryIdentifier[ExternalIdentifier[StoreIdentifier
[(UniqueID=)]] and InMemoryPaging]]
base_table=CATENTRY
className=com.ibm.commerce.catalog.facade.server.services.dataaccess.db.
jdbc.CustomDataExtractSQLComposer
sql_key_processor=com.ibm.commerce.catalog.facade.server.services.dataaccess.
processor.CustomDataExtractKeyProcessor
sql=
	SELECT
 		CATENTRY.$COLS:CATENTRY_ID$
     	FROM
		CATENTRY JOIN STORECENT ON (CATENTRY.CATENTRY_ID =
		STORECENT.CATENTRY_ID AND STORECENT.STOREENT_ID IN
		(?UniqueID?))
	WHERE
	CATENTRY.CATENTTYPE_ID != 'ItemBean' AND CATENTRY.CATENTTYPE_ID
       != 'BundleBean' AND CATENTRY.BUYABLE=1 AND CATENTRY.MARKFORDELETE=0
	ORDER BY
		CATENTRY.CATENTRY_ID
END_XPATH_TO_SQL_STATEMENT

Refer to wc-query-MyCompany-CatalogEntry-admin-get-ext.tpl in the code_snippets.zip file that is provided in the Download section of the article.

Path:

WC\xml\config\com.ibm.commerce.catalog-ext\wc-query-MyCompany-CatalogEntry-admin-get-ext.
tpl

Step 4: Create a custom Reader Mediator

Create a new class CustomDataExtractReaderMediator, which extends the abstract class mediator and invokes the catalog component service created in Step 3. The following parameters are passed along with the service request:

  • _cat.beginIndex: Sets the value of the record set start number.
  • _cat.maxItems: Sets the page size for the record set.
  • _cat.isFirstCall: Indicates the first service request.
  • _wcf.ap: Sets the access profile for the request.
  • _wcf.dataLanguageIds: Sets the data language ID for the request.

Methods to be overridden in the subclasses

The following methods from CatalogEntryReaderMediator need to be overridden.

The method shown in Listing 4 originally initializes the catalog entry reader mediator. Hence, it needs to be overridden to initialize the delta extract mediator classes.

Listing 4. Initialization
a) public void init () throws DataLoadException

The method shown in Listing 5 retrieves a list of catalog entry logical nouns by invoking a catalog service based on the list of catalog entry IDs and access profile.

Listing 5. Service invocation
b) protected Object getDataObject(String beginIndex, String pageSize, String storeId) 
throws AbstractBusinessObjectDocumentException, DataLoadException

Refer to CustomDataExtractReaderMediator.java in the code_snippets.zip file that is provided in the Download section of the article.

Path:

WebSphereCommerceServerExtensionsLogic\src\com\mycompany\commerce\catalog\dataload\
datareader\CustomDataExtractReaderMediator.java

Step 5: Update the business object configuration file

Update the data reader class name in the business object configuration file wc-dataextract-catalog-entry.xml (see Listing 6) with the custom data reader mediator created in Step 4.

Listing 6. Configuration
<_config:DataReader className="com.mycompany.commerce.catalog.dataload.
 datareader. CustomDataExtractReaderMediator " pageSize="700" >
  <_config:property name="clientId" value="99999999"/>
  <_config:property name="storeId" value="10001"/>
  <_config:property name="username" value="wcsadmin"/>
  <_config:property name="password" value="3fdBFMFoiGNQ0zUStB865w=="/>
</_config:DataReader>

Refer to wc-dataextract-catalog-entry.xml in the code_snippets.zip file that is provided in the Download section of the article.

Path:

samples\DataExtract\Catalog\InMemoryPaging\wc-dataextract.xml
samples\DataExtract\Catalog\InMemoryPaging\wc-dataextract-env.xml
samples\DataExtract\Catalog\InMemoryPaging\wc-dataextract-catalog-entry.xml

<samples> is the name of the directory where the data extract configuration files are located.


Step 6: Run the utility

You can run the data extract utility from the command line (see Listing 7) to extract the catalog entry records using in-memory paging.

Listing 7. Running the utility
dataextract.bat <WC_toolkit>\samples\DataExtract\Catalog\InMemoryPaging\wc-dataextract.xml

Figure 1 shows the output console after successfully running the data extract utility by performing in-memory paging.

Figure 1. Output messages in the console window
Output messages in the console window

Limitation

The in-memory paging mechanism may result in design complexities. We strongly recommend that you perform this mechanism in cases where there are not too many customizations and to use it with large datasets.


Conclusion

In this article, you learned how to customize the data extract framework by incorporating in-memory paging. By following this approach, you can expect a performance improvement of over eight percent for a dataset with 100K records.


Download

DescriptionNameSize
Code samplecode_snippets.zip14KB

Resources

Learn

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into WebSphere on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=WebSphere
ArticleID=764731
ArticleTitle=Customizing the WebSphere Commerce data extract framework using in-memory paging
publish-date=10122011