Rebuilding the Solr index (Watson Knowledge Catalog)

You can rebuild the Solr index to ensure that it reflects the current information assets.

Before you begin

Important: You must have the Administrator role to complete this task.
After upgrading to version 4.6.1 or later, complete the following steps:
  1. Check the log of the Solr pod by running:
    oc logs solr-0 -n <Namespace> | grep "Exception writing document id"
  2. Check whether the output contains a SolrException error, such as the following example:
    2022-12-13 09:56:46.434 ERROR (qtp1279740095-23) [c:da-datasets s:shard2 r:core_node4 x:da-datasets_shard2_replica_n2] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id c2e76d84.23d97ff8.apk82sdmn.ugp40kf.qpt2a9.6ocjq9ppatcjf5q38qd4r to the index; possible analysis error: cannot change field "hostName" from doc values type=NONE to inconsistent doc values type=SORTED => org.apache.solr.common.SolrException: Exception writing document id c2e76d84.23d97ff8.apk82sdmn.ugp40kf.qpt2a9.6ocjq9ppatcjf5q38qd4r to the index; possible analysis error: cannot change field "hostName" from doc values type=NONE to inconsistent doc values type=SORTED
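  3. If the log contains this error, restart Solr before you rebuild the index. To restart it, scale the Solr StatefulSet down to zero replicas and then back up to one:
    kubectl scale --replicas=0 StatefulSets solr -n <Namespace>

    kubectl scale --replicas=1 StatefulSets solr -n <Namespace>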
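If you prefer a script to grep for checking the Solr log, the following Python sketch looks for the same error. It assumes the log lines have the format of the example in step 2. The function names are hypothetical. Save the log to a file first, for example with oc logs solr-0 -n <Namespace> > solr.log, and then pass the file contents to the function.

```python
import re

# Matches the Solr indexing error from the log example and captures the
# document id and the field whose doc values type could not be changed.
_DOC_VALUES_ERROR = re.compile(
    r'Exception writing document id (?P<doc_id>\S+) to the index; '
    r'possible analysis error: cannot change field "(?P<field>[^"]+)"'
)


def find_doc_values_errors(log_text: str) -> list:
    """Return unique (document id, field) pairs found in the Solr log text."""
    pairs = []
    for match in _DOC_VALUES_ERROR.finditer(log_text):
        pair = (match.group("doc_id"), match.group("field"))
        if pair not in pairs:  # each error line repeats the message after "=>"
            pairs.append(pair)
    return pairs


def solr_restart_needed(log_text: str) -> bool:
    """True if at least one such error is present, so Solr should be restarted."""
    return len(find_doc_values_errors(log_text)) > 0


# Usage (after saving the log with: oc logs solr-0 -n <Namespace> > solr.log):
#   with open("solr.log", encoding="utf-8") as f:
#       print(solr_restart_needed(f.read()))
```

Each error line in the example repeats its message after "=>", so the sketch removes duplicate pairs before reporting them.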

About this task

If your browser times out while rebuilding the index, increase the browser's time-out setting. Alternatively, call the reindex REST API with a command-line tool such as cURL instead of a browser, because such a tool lets you set a longer maximum wait time.
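The following minimal sketch shows the non-browser approach. It uses the Python standard library instead of cURL to call the reindex URL with a long socket time-out. It assumes that the endpoint accepts HTTP basic authentication. The credential prompt in step 4 of the procedure suggests this, but this document does not confirm it. The hostname and credentials are placeholders. If your system does not trust the cluster certificate, pass an ssl.SSLContext that trusts the cluster CA through the context argument of urlopen.

```python
import base64
import urllib.request

REINDEX_PATH = "/ibm/iis/common-utils/rest/v1/app/reindex"


def build_reindex_request(hostname: str, user: str, password: str,
                          query: str = "batchSize=100&solrBatchSize=100") -> urllib.request.Request:
    """Build a GET request for the reindex URL.

    Assumption: the endpoint accepts HTTP basic authentication.
    """
    url = f"https://{hostname}{REINDEX_PATH}?{query}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_reindex(request: urllib.request.Request, timeout_s: float = 4 * 60 * 60) -> str:
    """Send the request and return the response body as text.

    timeout_s is the socket time-out passed to urlopen: it limits each blocking
    connect or read, not the total duration of the reindex.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (placeholder hostname and credentials):
#   req = build_reindex_request("cpd.example.com", "admin", "<password>")
#   print(run_reindex(req))
```

The four-hour default time-out is an arbitrary illustration. Choose a value that fits the amount of metadata on your cluster.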

Performance recommendations
The reindex operation runs queries on the XMETA database and sends the results to the Solr microservice. Therefore, its performance depends on factors such as the computing capacity (number of CPUs, CPU speed, availability) of the iis-services pod and the XMETA database, available memory, disk read and write speed, network speed, and the Solr JVM settings.
For optimal reindex performance, follow these recommendations:
  • Don't run the reindex operation while the XMETA database is busy.
  • If you have many assets (hundreds of thousands or more), update the XMETA statistics by running the runstats command on all tables in the XMETA schema. For more information, see step 5 of the procedure.
  • By default, the Solr JVM is assigned 1 GB of memory. Don't reduce this value below 1 GB. To verify or update the memory assigned to the Solr JVM, follow these steps:
    1. Edit the iis-cr custom resource by running the following command:
      oc edit iis iis-cr -n wkc
    2. Locate the solr_max_heap_size parameter in the spec: section. If the parameter is not present, add it. The section then looks similar to the following example:
      spec:
        blockStorageClass: managed-nfs-storage
        fileStorageClass: managed-nfs-storage
        ignoreForMaintenance: false
        license:
          accept: true
          license: Standard
        scaleConfig: small
        storageClass: managed-nfs-storage
        use_dynamic_provisioning: true
        version: 4.5.0
        solr_max_heap_size: 1024
      Adjust the value of the solr_max_heap_size parameter as needed. The value 1024 in this example corresponds to the 1 GB default.
    3. Save and exit.
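To check the heap setting without opening an editor, you can export the custom resource as JSON, for example with oc get iis iis-cr -n wkc -o json > iis-cr.json, and inspect it. The following Python sketch is one way to do that. Its function names are hypothetical. It treats an unset value as the 1 GB default, and it assumes that 1024 is the numeric form of that default, as in the example.

```python
import json
from typing import Optional

# The 1 GB default expressed as 1024, matching the example value in the text.
MIN_HEAP = 1024


def solr_heap_size(cr_json: str) -> Optional[int]:
    """Return spec.solr_max_heap_size from the iis-cr JSON, or None if it is not set."""
    spec = json.loads(cr_json).get("spec", {})
    value = spec.get("solr_max_heap_size")
    return None if value is None else int(value)


def heap_is_sufficient(cr_json: str) -> bool:
    """True if the value is unset (so the 1 GB default applies) or at least MIN_HEAP."""
    size = solr_heap_size(cr_json)
    return size is None or size >= MIN_HEAP


# Usage:
#   with open("iis-cr.json", encoding="utf-8") as f:
#       print(heap_is_sufficient(f.read()))
```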

Procedure

  1. Close all browser instances.
  2. Open a new instance of a supported browser.
  3. Enter the following URL, which corresponds to the reindex REST API method:
    https://hostname/ibm/iis/common-utils/rest/v1/app/reindex?batchSize=100&solrBatchSize=100
    Notes:
    • This URL sets the batchSize and solrBatchSize parameters to their default values. You can specify other values, as described in the following table.
    • The duration of the reindexing process depends on the amount of metadata on your cluster.
    • If tasks that affect product performance are running, such as analysis or data import, consider reindexing later. Reindexing significantly affects performance, and running it while the system is already busy might cause reindexing to fail.
    Table 1. Parameters for the reindex REST API method
    Parameter          Description
    hostname           The hostname of the IBM Cloud Pak® for Data cluster.
    batchSize          The batch size for retrieving information from the database. Increasing this size might improve performance, but it also increases the risk of reindex failure. The default is 100. The maximum value is 10000.
    solrBatchSize      The batch size for Solr indexing. Increasing this size might improve performance. The default is 100. The maximum value is 10000.
    maxWaitTime        The maximum wait time for processing a batch of assets or data sets. The default is 240 seconds (4 minutes). You can increase this value if your browser time-out setting is higher. Some browsers allow you to extend or disable the time-out setting.
    assetType          One or more comma-separated asset types, for example, Data Connection, Database Column. To list all supported asset types, set the updateIndex parameter to false.
    excludeAssetType   One or more comma-separated asset types to exclude.
    start              A starting point from which indexing resumes after a failure. This parameter applies when you reindex a single asset type.
    updateIndex        When set to false, lists the supported asset types and their counts and ignores all other parameters. This parameter is supported only with the ibm/iis/common-utils/rest/v1/app/reindex URL. The default is true.
    threadCount        The number of threads for running the reindexing job in parallel. The default is 4.
  4. Enter your credentials to start rebuilding the index.
    When reindexing is complete, a message is displayed. The following sample output was produced with the default values:
    Mon Jun 15 14:55:41 UTC 2020: Preparing for reindexing.
    Mon Jun 15 14:55:41 UTC 2020: Preparation for reindexing is done.
    Mon Jun 15 14:55:44 UTC 2020: Reindexing assets.
    Mon Jun 15 14:55:44 UTC 2020: Deleted all 'Database' type assets from "da-datasets" (Solr) index.
    Mon Jun 15 14:55:45 UTC 2020: Number of assets of 'Database' type to index is 7.
    Mon Jun 15 14:55:45 UTC 2020: Indexed 7 of 7 'Database' asset(s).
    
    Mon Jun 15 14:55:45 UTC 2020: Reindex summary:
    
    Mon Jun 15 14:55:45 UTC 2020: Indexed 7 of 7 'Database' asset(s).
    Mon Jun 15 14:55:46 UTC 2020: Reindex completed successfully.
  5. Optional: Update the XMETA statistics:
    1. Open a shell in the iis-db2u pod:
      oc exec -it c-db2oltp-iis-db2u-0 -- ksh
    2. Load the Db2® profile:
      . /home/config/db2inst1/sqllib/db2profile
    3. Connect to the XMETA database:
      db2 connect to xmeta
    4. Generate the runstats commands for all tables in the XMETA schema and save them to the runstats_xmeta.out file by running the following query:
      db2 -x "SELECT 'runstats on table',substr(rtrim(tabschema)||'.'||rtrim(tabname),1,50),' and indexes all;' FROM SYSCAT.TABLES WHERE (type = 'T') AND (tabschema = 'XMETA')" > runstats_xmeta.out
    5. Run the commands that were generated in the previous step:
      db2 -tvf runstats_xmeta.out
  6. Close the browser.
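The reindex URL in step 3 can also be assembled programmatically from the parameters in Table 1. The following Python sketch illustrates this. The function and its argument names are hypothetical. It makes several assumptions that this document does not confirm. It passes maxWaitTime as a number of seconds, based on the 240-second default. It rejects batch sizes outside 1 to 10000; the upper bound is documented, but the lower bound is an assumption. It omits the start parameter because this document does not describe the format of that parameter's value.

```python
from urllib.parse import quote, urlencode

REINDEX_PATH = "/ibm/iis/common-utils/rest/v1/app/reindex"
MAX_BATCH = 10000  # documented maximum for batchSize and solrBatchSize


def reindex_url(hostname, batch_size=100, solr_batch_size=100, asset_types=None,
                exclude_asset_types=None, max_wait_time=None, thread_count=None,
                update_index=True):
    """Build a reindex URL from the parameters in Table 1 (hypothetical helper)."""
    if not update_index:
        # updateIndex=false only lists asset types and counts; other parameters are ignored.
        params = {"updateIndex": "false"}
    else:
        for name, value in (("batchSize", batch_size), ("solrBatchSize", solr_batch_size)):
            # Lower bound of 1 is an assumption; the upper bound is documented.
            if not 1 <= value <= MAX_BATCH:
                raise ValueError(f"{name} must be between 1 and {MAX_BATCH}, got {value}")
        params = {"batchSize": batch_size, "solrBatchSize": solr_batch_size}
        if asset_types:
            params["assetType"] = ",".join(asset_types)
        if exclude_asset_types:
            params["excludeAssetType"] = ",".join(exclude_asset_types)
        if max_wait_time is not None:
            params["maxWaitTime"] = max_wait_time  # assumed to be in seconds
        if thread_count is not None:
            params["threadCount"] = thread_count
    # quote (not quote_plus) encodes spaces as %20; safe="," keeps commas readable.
    query = urlencode(params, safe=",", quote_via=quote)
    return f"https://{hostname}{REINDEX_PATH}?{query}"
```

For example, reindex_url("cpd.example.com", update_index=False) builds the URL that lists the supported asset types and their counts.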
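If you run the reindex from a script rather than a browser, you can check its result automatically. The following Python sketch parses output in the format of the sample in step 4. It reads the "Indexed X of Y" lines from the reindex summary and the final completion message. It assumes that real output follows the same line format. The function names are hypothetical.

```python
import re

# Matches lines such as: Indexed 7 of 7 'Database' asset(s).
_INDEXED = re.compile(r"Indexed (\d+) of (\d+) '([^']+)' asset\(s\)\.")


def reindex_summary(output: str) -> dict:
    """Map asset type to (indexed, total) using the lines after 'Reindex summary:'."""
    head, sep, tail = output.partition("Reindex summary:")
    text = tail if sep else head  # fall back to the whole output if there is no summary
    return {asset_type: (int(done), int(total))
            for done, total, asset_type in _INDEXED.findall(text)}


def reindex_ok(output: str) -> bool:
    """True if the run reports success and every asset type was fully indexed."""
    if "Reindex completed successfully." not in output:
        return False
    return all(done == total for done, total in reindex_summary(output).values())
```

Reading only the summary section avoids counting each asset type twice, because the sample output prints each "Indexed" line once during the run and again in the summary.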
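The query in step 5 writes one runstats command per XMETA table to runstats_xmeta.out. The db2 -t option then treats each semicolon as a statement terminator. As an illustration of that file format, the following Python sketch builds the same kind of file from a list of table names. This can be useful if you want to update statistics for only a subset of tables. The table names are hypothetical, and unlike the step 5 query, the sketch does not read SYSCAT.TABLES.

```python
def runstats_commands(schema: str, tables: list) -> str:
    """Return one RUNSTATS command per table, each ending in ';' as db2 -tvf expects."""
    return "".join(
        f"runstats on table {schema}.{table} and indexes all;\n" for table in tables
    )


# Usage with hypothetical table names:
#   with open("runstats_xmeta.out", "w", encoding="utf-8") as f:
#       f.write(runstats_commands("XMETA", ["TABLE_A", "TABLE_B"]))
```

You can then run the file with db2 -tvf runstats_xmeta.out, as in step 5.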