FAQ: Commerce Search Index Pre-Processing (di-preprocess)
This post covers common questions about the manual search index preprocessing command (di-preprocess). If you have a question about search index preprocessing that this FAQ doesn't answer, feel free to leave it in a comment on this post so we can expand the FAQ.
1. What are the mandatory parameters for running the index preprocessor?
You can perform index preprocessing by going to the WC_installdir/bin directory and running the di-preprocess script with just these parameters:
./di-preprocess.sh WC_installdir/instances/<instance_name>/search/pre-processConfig/MC_<master_catalog_id>/<db_type> -dbuser <dbuser> -dbuserpwd <dbuserpwd> -instance <instance_name>
./di-preprocess.sh /opt/WebSphere/CommerceServer70/instances/demo/search/pre-processConfig/MC_10051/DB2 -dbuser db2inst1 -dbuserpwd wcsr0cks -instance demo
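As a rough sketch, the command above can be assembled from its parts by a small wrapper. The function name build_preprocess_cmd is hypothetical and not part of WebSphere Commerce. It only prints the command so you can review it before running it from WC_installdir/bin, and it leaves the password as a placeholder rather than storing it in the script.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with WebSphere Commerce): prints the
# di-preprocess command line instead of running it.
# Arguments: WC_installdir, instance name, master catalog ID, database type,
# database user, followed by any additional parameters to append unchanged.
build_preprocess_cmd() {
  wc_installdir=$1
  instance=$2
  master_catalog_id=$3
  db_type=$4
  dbuser=$5
  shift 5

  # First parameter of di-preprocess: the pre-processConfig directory.
  config_dir="$wc_installdir/instances/$instance/search/pre-processConfig/MC_$master_catalog_id/$db_type"

  # The password stays a placeholder; substitute it when you run the command.
  printf './di-preprocess.sh %s -dbuser %s -dbuserpwd %s -instance %s' \
    "$config_dir" "$dbuser" '<dbuserpwd>' "$instance"

  # Append any remaining arguments as-is.
  for extra in "$@"; do
    printf ' %s' "$extra"
  done
  printf '\n'
}

# Values taken from the example in this post.
build_preprocess_cmd /opt/WebSphere/CommerceServer70 demo 10051 DB2 db2inst1
```

Printing rather than executing keeps the sketch safe to try on a machine where WebSphere Commerce is not installed.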
2. How do I perform delta indexing?
To perform delta indexing, you need to run both delta index preprocessing and a delta buildindex. To run delta index preprocessing, set the -fullbuild parameter to false when running the di-preprocess script, like so:
./di-preprocess.sh WC_installdir/instances/<instance_name>/search/pre-processConfig/MC_<master_catalog_id>/<db_type> -dbuser <dbuser> -dbuserpwd <dbuserpwd> -instance <instance_name> -fullbuild false
./di-preprocess.sh /opt/WebSphere/CommerceServer70/instances/demo/search/pre-processConfig/MC_10051/DB2 -dbuser db2inst1 -dbuserpwd wcsr0cks -instance demo -fullbuild false
3. Where are the files used for building the temporary tables (TI_XXXXXX)?
We use XML files to create the temporary tables (e.g. TI_CATENTRY) and fetch the data for them. These files are located in the directory that is passed as the first parameter to the preprocess script (the pre-processConfig path shown in question 1). The script uses this path to find the preprocessing XML files so that it can build and populate the temporary tables.
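To see which XML files the script will pick up, you can simply list them. The sketch below assumes only that the files have an .xml extension and sit directly in the configuration directory; the function name list_preprocess_xml is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the XML files found directly inside a
# preprocessing configuration directory, one path per line.
list_preprocess_xml() {
  config_dir=$1
  if [ ! -d "$config_dir" ]; then
    echo "not a directory: $config_dir" >&2
    return 1
  fi
  for f in "$config_dir"/*.xml; do
    # Skip the unexpanded pattern when no XML file exists.
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Example call, using the path from question 1:
# list_preprocess_xml /opt/WebSphere/CommerceServer70/instances/demo/search/pre-processConfig/MC_10051/DB2
```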
4. How do I increase logging level for preprocessing?
Edit the logging.properties file (WC_installdir/instances/instance_name/xml/config/dataimport/logging.properties) so that:
Note that this results in much more logging during preprocessing. If you need to see what happens in the middle of preprocessing (rather than only at the end, when it fails, for example), you may need to increase the FileHandler limit (file size) or count (number of historical files).
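Before changing the limit or count, it can help to check what is currently configured. The sketch below assumes that logging.properties uses the standard java.util.logging key names ending in FileHandler.limit and FileHandler.count; verify this against your own file. The function name show_filehandler_settings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the active (non-commented) FileHandler limit
# and count settings from a logging.properties file.
# Assumption: the keys end in "FileHandler.limit" / "FileHandler.count",
# as in standard java.util.logging configuration.
show_filehandler_settings() {
  props=$1
  grep -E '^[^#]*FileHandler\.(limit|count)' "$props"
}

# Example call, using the path from this question:
# show_filehandler_settings WC_installdir/instances/instance_name/xml/config/dataimport/logging.properties
```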
You should collect the data in the following MustGather: http://www-01.ibm.com/support/docview.wss?uid=swg21667775
Once you have collected this data, first identify the type of issue you are having by reviewing wc-dataimport-preprocess.log. If the issue involves a particular temporary table (TI_XXXXXX), check whether the data in that table is consistent with what you expect it to contain. If the data in the table is consistent, then review the corresponding XML file from the pre-processConfig directory that is used to build this table, to verify that its query will grab the expected data.
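When the log is long, a quick way to narrow things down is to list the temporary tables it mentions. This is only a sketch: it assumes that table names appear in the log in the TI_XXXXXX form, and the function name list_tables_in_log is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct TI_* table name that appears
# in a preprocessing log, sorted alphabetically.
list_tables_in_log() {
  log_file=$1
  # -o prints only the matching part of each line (GNU and BSD grep).
  grep -o 'TI_[A-Z0-9_]*' "$log_file" | LC_ALL=C sort -u
}

# Example call:
# list_tables_in_log wc-dataimport-preprocess.log
```

From there you can search the log for a specific table name and open the matching XML file in the pre-processConfig directory.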
0 Aug 26, 2014 12:07:37 PM com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain main
0 Aug 26, 2014 12:09:18 PM com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain processDataConfig
0 Aug 26, 2014 12:09:18 PM com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain logEndDateAndTime
In FEP7+, there is more in-depth logging for each table (with INFO-level tracing) that shows the query, read, and write times:
0 Nov 2, 2014 2:16:27 PM com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain execute
To make indexing run as an atomic action, we put lock records into TI_DELTA_CATENTRY/TI_DELTA_CATGROUP. While preprocess is running, there will be one lock record in these tables:
P (action) = indexing is currently in progress
When preprocess ends, it adds one more lock record to indicate that preprocessing has completed and buildindex can be started:
B (action) = index preprocessing completed, buildindex can be run
When a new index preprocess is started, it first checks TI_DELTA_CATENTRY/TI_DELTA_CATGROUP for a record with the P or B action. If it finds one, the new index preprocess fails, so that it does not interrupt the preprocess/buildindex process that is currently running. If you are sure that no other preprocess/buildindex process is running, you can either delete these records from the tables or run the preprocess script with the -force true parameter.
8. What types of changes can we cover using delta indexing? What types of changes require full indexing?
This depends on the feature pack that your environment is on, as well as the type of update you are making to the index. The higher the feature pack, the more types of changes can be handled with delta indexing alone. The following Knowledge Center page lists the types of updates to the search index and whether each one requires full or delta indexing: http://www-01.ibm.com/support/knowledgecenter/SSZLC2_7.0.0/com.ibm.commerce.developer.doc/refs/rsdsearchindexhints.htm?lang=en
9. Do I need to run preprocess/buildindex if I use the UpdateSearchIndex scheduled job?
No, that is not necessary. The UpdateSearchIndex scheduled job automates the indexing process by running indexing at scheduled times. However, you can still run preprocess/buildindex manually after making changes, so that you don't need to wait until UpdateSearchIndex runs again to have those changes added to the index. Behind the scenes, UpdateSearchIndex essentially performs preprocess/buildindex in a single process, so an index update from either approach is equivalent. You can use either one, or a mix of both. For example, you can schedule hourly UpdateSearchIndex runs and also run preprocess/buildindex manually to trigger an immediate update after making a change. For more information about configuring UpdateSearchIndex, review the following Knowledge Center page: http://www-01.ibm.com/support/knowledgecenter/SSZLC2_7.0.0/com.ibm.commerce.admin.doc/tasks/tsdschedsearchupdateindex.htm?lang=en
10. How can I change the behaviour of the preprocessing script?
You can add extra parameters to the script to change the behaviour of preprocessing. For example:
(FEP7+) -skipDeltaNoEntry <true/false>: When performing delta preprocessing with this parameter set to true, the script first checks whether there are any delta updates to perform. If there are none, delta preprocessing ends. This differs from the default behaviour, which rebuilds all of the temporary tables even though they will be empty because there were no delta updates.
(FEP8) -nonLangTables <true/false>: When performing preprocessing with this parameter set to true, only the non-language-specific tables will be processed (e.g. TI_CATENTRY_0). You can quickly identify these tables because they have only one number appended to the name, which identifies the index they are associated with (e.g. MC_10001 = _0, MC_10051 = _1, etc.).
(FEP8) -langTables <true/false>: When performing preprocessing with this parameter set to true, only the language-specific tables will be processed (e.g. TI_ATTR_0_1). You can quickly identify these tables because they have two numbers appended to the name. The first number identifies the index they are associated with (e.g. MC_10001 = _0, MC_10051 = _1, etc.). The second number identifies the language ID (en_US = -1, which turns into _1).
(FEP8) -deepSequence <true/false>: When performing preprocessing with this parameter set to true, products will be sequenced in the index based on the deep search sequencing functionality. By default, when using category navigation, only the category's products are displayed. However, with deep search, all of the subcategories' products will also be displayed. If these products have sequence values, then we will need to process sequencing differently to account for deep search, which is what deep search sequencing is for. If you are using deep search, as well as sequencing for your products, then you can use this parameter to enable deep search sequencing. However, if you don't use deep search or sequencing, then this functionality isn't applicable to you.
(FEP8) -deepUnpublish <true/false>: When performing preprocessing with this parameter set to true, preprocessing will be performed based on the deep category unpublish functionality. The deep category unpublish feature allows immediate child categories and all of their underlying subcategories and products to be hidden from shoppers in the storefront. If you are using this functionality, then this parameter will prevent these products/categories from being indexed as published.
(FEP8) -publishedOnly <true/false>: When performing preprocessing with this parameter set to true, preprocessing will be performed to allow only products from published categories to be indexed when deep category unpublish is enabled.
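The table naming convention described for -nonLangTables and -langTables can be sketched as a small classifier. It relies only on the rule stated above (one trailing number means non-language-specific, two trailing numbers mean language-specific); the function name classify_ti_table is hypothetical, and the check does not look at the database at all.

```shell
#!/bin/sh
# Hypothetical helper: classify a temporary table name by the number of
# numeric suffixes, following the convention described in this post.
classify_ti_table() {
  name=$1
  if printf '%s\n' "$name" | grep -Eq '_[0-9]+_[0-9]+$'; then
    # Two trailing numbers: index suffix, then language suffix.
    echo "language-specific"
  elif printf '%s\n' "$name" | grep -Eq '_[0-9]+$'; then
    # One trailing number: index suffix only.
    echo "non-language-specific"
  else
    echo "unknown"
  fi
}

classify_ti_table TI_CATENTRY_0
classify_ti_table TI_ATTR_0_1
```

The two-number check has to come first, because a name with two numeric suffixes also ends in a single number.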