Indexing a massive catalog can take hours because the process must preprocess a large amount of data from the queries that populate the temporary tables, construct a large catalog hierarchy, and then combine all of this data to populate the index.
The regular preprocessing flow handles each temporary table (TI_XXXXXX) sequentially, waiting for the current table to finish before moving on to the next one. This is inefficient if your environment can handle parallel processing (e.g., a multi-core CPU), because most of the tables needed for preprocessing are independent of each other: you don't always need the data in TI_XXXXXX to populate TI_YYYYYY.
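The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's implementation: the table names, the `preprocess_table` function, and the simulated delay are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical names standing in for the TI_XXXXXX temporary tables.
TABLES = ["TI_A", "TI_B", "TI_C", "TI_D"]


def preprocess_table(table: str) -> str:
    """Stand-in for populating one temporary table; sleeps to simulate work."""
    time.sleep(0.1)
    return f"{table}: done"


def preprocess_sequential(tables):
    # Regular flow: each table waits for the previous one to finish.
    return [preprocess_table(t) for t in tables]


def preprocess_parallel(tables, workers=4):
    # Independent tables are preprocessed at the same time.
    # pool.map returns results in the same order as the input list.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess_table, tables))


if __name__ == "__main__":
    for fn in (preprocess_sequential, preprocess_parallel):
        start = time.perf_counter()
        fn(TABLES)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

Both functions produce the same results; the parallel version simply stops waiting on tables that have no dependency on each other.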
To take advantage of parallel preprocessing and distributed indexing, you need to split the index into shards, which are then processed individually. There are two types of shards: horizontal and vertical.
Horizontal shards split up the index by catentry ID. For example, Horizontal Shard A processes catentry IDs 1 - 500,000, Horizontal Shard B processes catentry IDs 500,001 - 1,000,000, and so on. Each horizontal shard performs preprocessing and indexing only for its own set of catentries. Once all the horizontal shards have completed indexing, their indexes are merged into a single index.
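To make the ID ranges concrete, here is a minimal Python sketch that divides an inclusive catentry ID range into contiguous, roughly equal ranges. The function name `horizontal_shard_ranges` and the even split are assumptions for illustration; in the product, each shard's range comes from its configuration rather than from a helper like this.

```python
def horizontal_shard_ranges(min_id: int, max_id: int, shard_count: int):
    """Split the inclusive ID range [min_id, max_id] into contiguous ranges.

    Hypothetical helper: returns a list of (start, end) tuples, one per shard.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")

    total = max_id - min_id + 1
    base, extra = divmod(total, shard_count)

    ranges = []
    start = min_id
    for i in range(shard_count):
        # The first `extra` shards take one additional ID each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more shards requested than IDs available
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    for shard, (lo, hi) in enumerate(horizontal_shard_ranges(1, 1_000_000, 2)):
        print(f"Horizontal shard {shard}: catentry IDs {lo:,} - {hi:,}")
```

Called with IDs 1 to 1,000,000 and two shards, this reproduces the split from the example above.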
Vertical shards perform preprocessing that requires all catentries to be processed at the same time. For example, catalog hierarchy processing must see all catentries at once to make sure the hierarchy is consistent, so catalog hierarchy preprocessing is performed in a vertical shard. Vertical shards can also be configured for preprocessing customizations that require all catentries to be processed together. Once the vertical shards have completed preprocessing, their data is combined with each individual horizontal shard for indexing.
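The way the two shard types fit together can be sketched as a small data flow in Python. Everything here is a toy model under stated assumptions: the in-memory dictionaries, the function names, and the "/Root/..." path format are invented for the example and do not reflect the product's actual tables or index format.

```python
# Hypothetical in-memory data standing in for catalog entries.
CATENTRIES = {1: "Shirt", 2: "Pants", 3: "Lamp", 4: "Desk"}
PARENT_CATEGORY = {1: "Apparel", 2: "Apparel", 3: "Home", 4: "Home"}


def vertical_preprocess(catentries):
    """Vertical shard: runs once over ALL catentries to build hierarchy data."""
    return {cid: f"/Root/{PARENT_CATEGORY[cid]}" for cid in catentries}


def horizontal_index(id_range, hierarchy):
    """Horizontal shard: indexes only its own ID range, combining in the
    data produced by the vertical shard."""
    lo, hi = id_range
    return {
        cid: {"name": name, "path": hierarchy[cid]}
        for cid, name in CATENTRIES.items()
        if lo <= cid <= hi
    }


def merge_indexes(shard_indexes):
    """Merge the per-shard indexes into a single index."""
    merged = {}
    for shard_index in shard_indexes:
        merged.update(shard_index)
    return merged


hierarchy = vertical_preprocess(CATENTRIES)
shards = [horizontal_index(r, hierarchy) for r in [(1, 2), (3, 4)]]
index = merge_indexes(shards)
```

The hierarchy is computed once across the whole catalog, each horizontal shard reads from it while handling only its own catentries, and the final merge yields one index covering every catentry.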
[Figure: example configuration of the horizontal and vertical shards]
With sharding, you can take advantage of parallel processing by running an intensive process such as catalog hierarchy preprocessing while other index preprocessing runs at the same time, which significantly reduces the time it takes to index.
To use sharding on FEP7, you need to install iFix JR50129. This fix is already included in FEP8, so you don't need the iFix if you are on FEP8. For more information on parallel preprocessing and distributed indexing, see the following Knowledge Center document: http://www-01.ibm.com/support/knowledgecenter/SSZLC2_7.0.0/com.ibm.commerce.developer.doc/concepts/csdsearchparallel.htm?lang=en