Rebuilding the data set index

You can rebuild the data set index to ensure that all data sets are displayed in the InfoSphere® Information Analyzer thin client.

Before you begin

You must have the Suite Administrator role to complete this task.

About this task

The thin client uses the data set index during search operations and to display certain attributes, such as the number of business terms that are associated with a data set. You must rebuild the data set index for the thin client in the following situations:
  • If there are existing database tables and data files that contain tables in the metadata repository when you install InfoSphere Information Analyzer thin client, you must rebuild the data set index before you can view them as data sets in the thin client. If you are doing a new installation and no metadata has been imported, it is not necessary to rebuild the data set index.
  • If the data set index becomes out of sync or corrupted, you must rebuild the data set index. This can happen when communication is lost between one or more suite components, or when one or more components are temporarily uninstalled or not working. If you notice that tables in the metadata repository are not visible as data sets in the thin client, your data set index might be out of sync or corrupted.
  • If you run analysis on data sets using InfoSphere Information Analyzer workbench, you must rebuild the data set index before you can use that analysis information to search for data sets in the thin client.
If you apply a patch that upgrades your version of the thin client, you must use the upgrade parameter in the reindex REST API method described below to upgrade the index schema.

If your browser times out while the data set index is being rebuilt, you can send the reindex request with a tool that lets you set a longer timeout than your browser allows. For example, you can use the cURL command-line tool instead of a browser so that the request can wait longer for a response.
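
As an alternative to a browser or cURL, the request can also be sent from a short script with an explicit timeout. The following Python sketch shows one way this might look. It assumes that the services tier accepts HTTP Basic authentication for this method, which is not confirmed here; the function names, the one-hour timeout, and the verify_tls switch are illustrative choices, not part of the product.

```python
import base64
import ssl
import urllib.request


def build_reindex_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Create a GET request that carries HTTP Basic credentials.

    Assumption: the reindex method accepts Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_reindex(url: str, user: str, password: str,
                timeout_seconds: float = 3600.0, verify_tls: bool = True) -> str:
    """Send the reindex request and return the response body as text.

    timeout_seconds applies to each blocking socket operation, not to the
    request as a whole, so it limits how long the client waits between
    pieces of data from the server.
    """
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for test systems that use a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_reindex_request(url, user, password)
    with urllib.request.urlopen(request, timeout=timeout_seconds, context=context) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (not executed here; replace the placeholders with real values):
# print(run_reindex(
#     "https://server_name:port_number/ibm/iis/dq/da/rest/v1/reindex"
#     "?batchSize=25&solrBatchSize=100&upgrade=false&force=true",
#     "user_name", "password"))
```

If you raise the client timeout in this way, consider whether the maxWaitTime parameter of the reindex method should be raised to match.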

Procedure

  1. Close all browser instances.
  2. Open a new supported browser.
  3. Enter the following URL, which corresponds to the reindex REST API method for InfoSphere Information Analyzer: https://server_name:port_number/ibm/iis/dq/da/rest/v1/reindex?batchSize=25&solrBatchSize=100&upgrade=false&force=true
    Note: This URL contains the default values for the batchSize, solrBatchSize, and upgrade parameters, and sets the force parameter to true. You can use other values, which are described in the following table.
    Table 1. Parameters for reindex REST API method.
    Parameter | Description | Sample value
    server_name | The name or IP address of the services tier computer. | localhost
    port_number | The port number. The default port number for HTTPS is 9443. | 9443
    batchSize | The batch size to use when retrieving information from the database. Increasing this size might improve performance, but it also increases the risk that the reindex fails. The default is 25. The maximum value is 10000. | 25
    datasetBatchSize | The batch size to use when processing data sets. The default is 200. If the total number of data sets is more than 200, updates are sent to the browser (HTTP client) after each batch is indexed. | 200
    solrBatchSize | The batch size to use for Solr indexing. Increasing this size might improve performance. The default is 100. The maximum value is 10000. | 100
    maxWaitTime | The maximum wait time, in seconds, to process a batch of data sets. The default is 240 seconds (4 minutes). You can increase this value if the browser timeout setting is higher. Some browsers allow you to extend or disable the timeout setting. | 240
    upgrade | Specifies whether to upgrade the index schema from a previous version. The upgrade is a one-time requirement when you move from one version of the thin client to another, and it can be used to upgrade from any previous version of the thin client. The value true upgrades the index schema. The value false is the default, and does not upgrade the index schema. | false
    datasetType | Specifies whether to reindex catalog data sets, workspace data sets, or both. Valid values are catalog, workspace, or both. The default value is both. | both
    start | Specifies the starting point from which to resume indexing after a failure. This parameter is applicable only if the datasetType parameter is set to catalog or workspace. | Any value between 1 and the corresponding number of data sets.
    force | Specifies whether to force reindexing if indexing is already in progress. The value true forces a reindex even if indexing is in progress. The value false is the default, and prevents a reindex if indexing is already in progress. Use this option if a previous reindex request was aborted for any reason, for example, if the InfoSphere Information Server services tier system went offline. | true
  4. Enter your credentials to start rebuilding the index. A message is displayed when the reindexing is complete.
  5. Close the browser.
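
The URL in step 3 can also be assembled programmatically, which makes it easier to respect the limits listed in Table 1. The following Python sketch is one possible way to do that. The function name and keyword names are hypothetical, and the lower bound of 1 for the batch sizes is an assumption; the table states only the maximum of 10000.

```python
from typing import Optional
from urllib.parse import urlencode

MAX_BATCH = 10000  # maximum for batchSize and solrBatchSize per Table 1
DATASET_TYPES = ("catalog", "workspace", "both")


def build_reindex_url(server_name: str = "localhost", port_number: int = 9443, *,
                      batch_size: int = 25, solr_batch_size: int = 100,
                      upgrade: bool = False, force: bool = True,
                      dataset_type: Optional[str] = None,
                      start: Optional[int] = None,
                      max_wait_time: Optional[int] = None) -> str:
    """Build the reindex URL, checking the constraints described in Table 1."""
    for name, value in (("batchSize", batch_size), ("solrBatchSize", solr_batch_size)):
        # Lower bound of 1 is an assumption; only the maximum is documented.
        if not 1 <= value <= MAX_BATCH:
            raise ValueError(f"{name} must be between 1 and {MAX_BATCH}")

    params = {
        "batchSize": batch_size,
        "solrBatchSize": solr_batch_size,
        "upgrade": str(upgrade).lower(),
        "force": str(force).lower(),
    }
    if dataset_type is not None:
        if dataset_type not in DATASET_TYPES:
            raise ValueError(f"datasetType must be one of {DATASET_TYPES}")
        params["datasetType"] = dataset_type
    if start is not None:
        # start applies only when datasetType is catalog or workspace.
        if dataset_type not in ("catalog", "workspace"):
            raise ValueError("start requires datasetType catalog or workspace")
        if start < 1:
            raise ValueError("start must be 1 or greater")
        params["start"] = start
    if max_wait_time is not None:
        params["maxWaitTime"] = max_wait_time

    base = f"https://{server_name}:{port_number}/ibm/iis/dq/da/rest/v1/reindex"
    return f"{base}?{urlencode(params)}"


# One-time schema upgrade after a patch that upgrades the thin client:
upgrade_url = build_reindex_url("localhost", 9443, upgrade=True)

# Resume indexing of catalog data sets from the 501st data set after a failure:
resume_url = build_reindex_url("localhost", 9443, dataset_type="catalog", start=501)
```

Called with no arguments, the function returns the same URL as in step 3 with localhost and 9443 substituted for the server name and port number.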