WebSphere Commerce Search Cookbook

Sharding: Parallel PreProcessing and Distributed Indexing (FEP7+)

Eric-Scott | Tags: indexing preprocess performance index sharding

Indexing a massive catalog can take hours to complete because you need to preprocess a large amount of data from the queries used to populate the temporary tables, construct a massive catalog hierarchy, and then combine this data set to populate the index.

The regular process preprocesses each temporary table (TI_XXXXXX) sequentially, waiting for the current table to finish before moving on to the next one. This can be inefficient if your environment can handle parallel processing (for example, a multi-core CPU), because most of the tables needed for preprocessing are independent of each other: you don't always need the data in TI_XXXXXX to populate TI_YYYYYY.
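As a rough illustration of why independent tables lend themselves to parallel preprocessing, here is a minimal Python sketch. It is not the WebSphere Commerce utility itself: the table names and the `preprocess_table` function are hypothetical stand-ins, and the sleep only simulates the time a populate query would take.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical table names standing in for the TI_XXXXXX temporary tables.
TABLES = ["TI_A", "TI_B", "TI_C", "TI_D"]

def preprocess_table(name, delay=0.05):
    """Stand-in for populating one temporary table (sleep simulates the query)."""
    time.sleep(delay)
    return f"{name}: done"

def preprocess_sequential(tables):
    """Regular process: one table at a time, each waiting for the previous one."""
    return [preprocess_table(t) for t in tables]

def preprocess_parallel(tables, workers=4):
    """Independent tables are handed to a worker pool and processed concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input list.
        return list(pool.map(preprocess_table, tables))
```

Both functions return the same results; the parallel version simply stops waiting on tables that have no dependency on each other.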

To take advantage of parallel preprocessing and distributed indexing, you need to shard the index, that is, split it into individual shards that can be processed separately. There are two types of shards: horizontal and vertical.

Horizontal shards split up the index by catentry ID. For example, Horizontal Shard A will process catentry IDs 1 - 500,000, Horizontal Shard B will process catentry IDs 500,001 - 1,000,000, and so on. Each horizontal shard performs preprocessing and indexing only for its own set of catentries. Once all the horizontal shards have completed indexing, their indexes are merged into a single index.
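The ID-range split described above can be sketched in a few lines of Python. This is only an illustration of the partitioning idea; the function names and the fixed `shard_size` parameter are assumptions made for the example, not part of the product configuration.

```python
def horizontal_ranges(min_id, max_id, shard_size):
    """Split the inclusive ID range [min_id, max_id] into consecutive shard ranges."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    ranges = []
    start = min_id
    while start <= max_id:
        # The last shard may be smaller than shard_size.
        end = min(start + shard_size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges

def shard_for(catentry_id, ranges):
    """Return the index of the shard whose range contains catentry_id."""
    for index, (start, end) in enumerate(ranges):
        if start <= catentry_id <= end:
            return index
    raise KeyError(f"catentry ID {catentry_id} is outside all shard ranges")
```

With `horizontal_ranges(1, 1_000_000, 500_000)` this yields the two ranges from the example, 1 - 500,000 and 500,001 - 1,000,000, and every catentry ID maps to exactly one shard.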

Vertical shards perform preprocessing that requires all catentries to be processed at the same time. For example, catalog hierarchy processing needs all catentries together to make sure the hierarchy is consistent, so catalog hierarchy preprocessing is performed in a vertical shard. Vertical shards can also be configured for preprocessing customizations that require all catentries to be processed at the same time. Once the vertical shards have completed preprocessing, their data is combined with each individual horizontal shard for indexing.

Below is an example configuration of the horizontal and vertical shards:

[Image: example configuration of horizontal and vertical shards]

With sharding, we can take advantage of parallel processing by running an intensive process like catalog hierarchy preprocessing while performing other index preprocessing at the same time, significantly reducing the time it takes to index.
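To make the flow concrete, here is a small Python sketch of the overall pattern: a vertical step that needs the whole catalog runs alongside per-range horizontal preprocessing, the vertical result is combined with each horizontal shard, and the shard indexes are merged. All data and function names here are hypothetical and exist only for illustration; the real product drives this through its own utilities and configuration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: category parents and each catentry's category.
CATEGORY_PARENT = {"root": None, "tools": "root", "drills": "tools"}
CATENTRY_CATEGORY = {1: "tools", 2: "drills", 3: "drills", 4: "root"}

def vertical_hierarchy(category_parent):
    """Vertical step: walks the whole hierarchy to build a full path per category."""
    paths = {}
    for category in category_parent:
        chain, node = [], category
        while node is not None:
            chain.append(node)
            node = category_parent[node]
        paths[category] = "/".join(reversed(chain))
    return paths

def preprocess_horizontal(id_range):
    """Horizontal step: gathers rows only for catentries in this shard's range."""
    start, end = id_range
    return {cid: CATENTRY_CATEGORY[cid]
            for cid in range(start, end + 1) if cid in CATENTRY_CATEGORY}

def index_shard(rows, hierarchy):
    """Combine the vertical data with one horizontal shard to build its index."""
    return {cid: {"id": cid, "path": hierarchy[cat]} for cid, cat in rows.items()}

def build_index(id_ranges):
    with ThreadPoolExecutor() as pool:
        # The vertical and horizontal steps are submitted together and run concurrently.
        hierarchy_future = pool.submit(vertical_hierarchy, CATEGORY_PARENT)
        shard_rows = list(pool.map(preprocess_horizontal, id_ranges))
        hierarchy = hierarchy_future.result()
    merged = {}
    for rows in shard_rows:
        # Merge each shard's index into a single index.
        merged.update(index_shard(rows, hierarchy))
    return merged
```

Calling `build_index([(1, 2), (3, 4)])` produces one merged index covering all four sample catentries, each carrying the category path computed by the vertical step.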

To use sharding on FEP7, you need to install iFix JR50129. This fix is already included in FEP8, so you don't need the iFix if you are on FEP8. For more information on parallel preprocessing and distributed indexing, see the following Knowledge Center document: http://www-01.ibm.com/support/knowledgecenter/SSZLC2_7.0.0/com.ibm.commerce.developer.doc/concepts/csdsearchparallel.htm?lang=en
