IBM Cognos Proven Practices

IBM Cognos Enhanced Search Proven Practice

Nature of Document: Proven Practice; Product(s): IBM Cognos 10.1; Area of Interest: Performance


Purpose

The purpose of this document is to provide a best practice for configuring Enhanced Search services, as well as creating and maintaining your search index.

Applicability

This document applies to IBM Cognos Software Version 10.1 on all platforms and may also apply to subsequent releases.

Services

The following services are the components of Enhanced Search.

Index Search Service

The Index Search Service handles search and drill-through (context search) requests. It communicates with the Index Data Service to retrieve results for full-text searches.

An IBM Cognos environment can include multiple instances of the Index Search Service.

Index Update Service

The Index Update Service provides the main content crawling functions. It will collect data for indexing and pass this information to the Index Data Service for storage.

An IBM Cognos environment can include multiple instances of the Index Update Service, although initially only a single instance should be configured.

Index Data Service

The Index Data Service provides basic full-text functions for storage and retrieval of terms and indexed summary documents. An IBM Cognos environment can include multiple instances of the Index Data Service, although initially only a single instance should be configured.

We recommend that you install the index data service in the data tier. Run the Index Data Service under a user who has exclusive access to both the service process and the index files.

Installation

Basic Configuration

The Enhanced Search components (index data service, index update service, and index search service) are deployed under a dispatcher with all of the application tier services of an IBM Cognos installation.

This is the default IBM Cognos Enhanced Search installation and configuration and is useful for a proof of concept installation. In a production environment you may wish to configure a distributed installation. For additional performance, Enhanced Search can be hosted on a dedicated server, where only the search services are enabled.

The Index Update Service and the Index Data Service can both consume large amounts of memory when running, so it is recommended that they be installed as separate installations. In a 32-bit environment, 1.5 GB of memory should be enabled per instance.

Distributed Installation

You distribute IBM Cognos Enhanced Search components using the same installation and configuration method that you used to distribute IBM Cognos components. Run the installation on each computer and then complete the configuration by specifying the location of the distributed IBM Cognos components and by enabling and disabling the required services.

Each IBM Cognos environment can have multiple index data services, multiple index update services, and multiple index files. Initially, only a single Index Update Service and a single Index Data Service should be enabled. To improve security, we recommend that you install the Index Data Service in the data layer.

For more information on configuring your system, see the section “Enabling and Disabling Index Services in a Distributed Installation” in the Installation and Configuration Guide.

Server Configuration

Dispatcher process check interval

Due to the load on the Batch Report Services, increased validation of processes should be enabled within the dispatchers that run the Batch Report Services.

Edit the following file:

<install>/webapps/p2pd/WEB-INF/p2pd_deploy_defaults.properties

Add the line:

processCheckInterval=30000

This setting makes the dispatcher check the BiBusTKServerMain processes more frequently for stale or redundant processes. Without this check, failed Batch Report Service processes could lead to periods of inactivity during indexing under heavy indexing load.
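The edit above can be scripted so that it is safe to repeat across several dispatchers. The following is a minimal sketch only: `ensure_property` is a hypothetical helper, not part of any IBM Cognos tooling, and it assumes the file uses plain `key=value` lines. Back up the file before changing it.

```python
from pathlib import Path


def ensure_property(path, key, value):
    """Set key=value in a .properties-style file, replacing an existing entry.

    Returns True if the file was changed, False if it already had the value.
    """
    target = Path(path)
    lines = target.read_text(encoding="utf-8").splitlines()
    wanted = f"{key}={value}"
    found = False
    changed = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comment lines and lines without an assignment.
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            found = True
            if stripped != wanted:
                lines[i] = wanted
                changed = True
    if not found:
        # The key is not present yet, so append it at the end of the file.
        lines.append(wanted)
        changed = True
    if changed:
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return changed
```

For example, `ensure_property("<install>/webapps/p2pd/WEB-INF/p2pd_deploy_defaults.properties", "processCheckInterval", "30000")`, with `<install>` replaced by the actual installation directory, adds the line once and leaves the file untouched on later runs.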

Index Update Service Connections

Because index updates add and delete entries, only one Index Update Task should be executed at a time. Having the recommended single running instance with only one connection prevents multiple indexing tasks from executing at the same time.

Launch IBM Cognos Administration. On the Configuration tab, choose Dispatchers and Services. For the running instance of the Index Update Service, set the maximum number of connections for both peak and off-peak times to 1. Refer to “Set the Maximum Number of Processes and Connections” in the Administration and Security Guide.

TCP/IP settings for all indexing servers

The search services make heavy use of network connections on the platform that hosts them. On Microsoft Windows, it is critical to set the “TcpTimedWaitDelay” registry value to a minimum value of 0x1E, which sets the wait time to 30 seconds, and the “MaxUserPort” registry value to at least decimal 32768. Consult the relevant documentation for your operating system.

This has no impact on any Cognos services. Making this change is a common practice for Windows servers using Cognos products.

By default, Windows does not allow users to set up client connections on ports above 5000. After a socket is closed, the socket connection stays in a TIME_WAIT state for approximately two more minutes (the amount of time depends on the system configuration). After the waiting period ends, the socket is freed and the address can be reused. Under heavy load in a Cognos server environment, if roughly 4000 connections (ports 1024 through 5000) are made before the ports are freed (after the TIME_WAIT state ends), then further attempts to open a client socket will be rejected by the operating system, because they would need a port above 5000.
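The effect of the two settings can be estimated with simple arithmetic. The sketch below is a deliberately simplified model, not a measurement: it assumes connections are opened at a steady rate and that each closed socket holds its port for the full TIME_WAIT period. The function name is illustrative only.

```python
def sustained_connection_rate(first_port, last_port, time_wait_seconds):
    """Approximate new outbound connections per second before ports run out.

    Each closed socket holds its port for time_wait_seconds, so at a steady
    rate r the number of ports tied up is about r * time_wait_seconds.
    """
    usable_ports = last_port - first_port + 1
    return usable_ports / time_wait_seconds


# Figures quoted in the text: ports 1024-5000 and about two minutes of TIME_WAIT.
default_rate = sustained_connection_rate(1024, 5000, 120)

# Recommended values: MaxUserPort = 32768, TcpTimedWaitDelay = 0x1E (30 seconds).
tuned_rate = sustained_connection_rate(1024, 32768, 0x1E)

print(f"default: about {default_rate:.0f} connections/second")
print(f"tuned:   about {tuned_rate:.0f} connections/second")
```

Under these assumptions the default settings sustain about 33 new connections per second, while the recommended values sustain about 1058 per second.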

Creating the Index

The following steps are recommended to create an effective Enhanced Search index:

  1. Refine the index scope
  2. Set the indexing languages
  3. Create the initial index
  4. Update the index
  5. Include reporting data in your index

Refine the index scope

By Content Type

Although the default parameter settings for Enhanced Search allow administrators to build an initial index, it is good practice to verify that all of the content types are required for a particular environment or to meet business requirements. You can exclude all instances of a specific content type from index updates. Launch IBM Cognos Administration. On the Index Search tab, click Index, then click General. Under Indexable Types, de-select the objects to be excluded from the index. If you already have a large number of report outputs, it may be wise to exclude “output” from the first indexing pass.

By Content Location

To avoid exposing objects not intended for the typical user, you may wish to restrict the following types of content: unused or archived content, pre-production content, and specialized content such as system reports.

When creating an Index Update Task, it is possible to add a list of folders to exclude from the index. In the Excluded Content section, click Add. Select the packages and folders to be excluded from the Index Update Task.

Set the indexing languages

If your content store or data is multilingual, you should set the indexing languages before creating your first index.

Launch IBM Cognos Administration. On the Index Search tab, click Index, then click General. Under Indexing Locales enter a comma separated list of languages (e.g. en, fr, ja).

Note: Country variants of a language (e.g. en-us) are not supported. Use only the language code (e.g. en).
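If your list of languages comes from existing locale settings that include country variants, it can be reduced to bare language codes before it is entered under Indexing Locales. The following is a minimal sketch; `to_indexing_locales` is a hypothetical helper, and the assumption that variants are written with "-" or "_" separators is ours.

```python
def to_indexing_locales(locales):
    """Reduce locale codes to bare language codes, de-duplicated in order."""
    seen = []
    for code in locales:
        # "en-us" or "en_US" both become "en".
        language = code.strip().replace("_", "-").split("-", 1)[0].lower()
        if language and language not in seen:
            seen.append(language)
    # Comma-separated list in the form shown in the text, e.g. "en, fr, ja".
    return ", ".join(seen)
```

For example, `to_indexing_locales(["en-us", "en-GB", "fr", "ja_JP"])` yields the string `en, fr, ja`.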

Create the initial index

It is recommended that the search index is built with an account that has access to all content in the Public Folders so that all content will be available to users.

Note: Regardless of the user permissions that built the index, by default search results will only show content that the user performing the search has access to. Refer to “Secure Search Results” in the Administration and Security Guide.

To help ensure that building the initial index has limited impact on users, the common practice is to schedule the first index build process to occur during periods of low reporting usage.

Create a new Index Update Task (refer to “Create an Index Update Task” in the Administration and Security Guide). By default, all of the content of Public Folders is included, so you should exclude any content that you do not wish to have indexed (see “By Content Location” under “Refine the index scope”).

You may choose to run the initial index now or at a later date. For the first execution, it is recommended not to set up a repeating schedule for this indexing task.

Select to run the Index Update Task. For the Content Options, select only “Properties and metadata”, and for the Scope, select “All entries”. Press “Run” to start indexing content.

Option - Description

  • Properties and metadata - The properties and metadata of objects within the included content (folders) of the indexing task are indexed. Related objects (e.g. report output) are also indexed. Content types not selected as "Indexable Types" will be ignored.
  • Only entries that have changed - Updates the existing index. Unchanged content will be retained in the index.
  • All entries - Rebuilds the whole index. All previously indexed content will be deleted.

Update the index

The index is not automatically updated when content changes, such as when a report is authored or when an object is removed from Content Manager. You must update the index to capture all changes.

Based on the expectations of your users you will wish to schedule your Index Update Task to incrementally update the contents of the index. This could be every few hours, nightly or even weekly depending on how often your content changes.

Return to the initial Index Update Task that you created and schedule the task to run at your desired interval (refer to “Schedule Management” in the Administration and Security Guide). For the Content Options select only “Properties and metadata”, and for the Scope select “Only entries that have changed”. An incremental update of the index will take less time and resources than the creation of the initial index.

Note: Multiple Index Update Tasks should not be running at the same time. As incremental indexing can add and delete entries, multiple executing tasks can lead to duplicate or missing entries. If you have only one running instance of the Index Update Service, then multiple tasks will queue until the executing task completes.

Include reporting data in your index

Including reporting data in your index will improve the accuracy of search results, enhance the reports generated for “Create and Explore”, as well as provide metadata-to-data relationships in search results. For example, a search for “Canada” would also return reports that include the metadata Country and Sales Region even if the term Canada was not currently in any of the report outputs.

When indexing data, two options are available.

Option - Description

  • Referenced data - Specifies that only data referenced by the expressions encountered in reports, queries, and analyses that are included in the scope of the indexing task is indexed. Model objects in the selected content are ignored.
  • All data - Specifies that all data encountered in the models that are in the scope of the indexing task is indexed, regardless of whether the metadata has been included in a report, query, or analysis.

Indexing “All data” consumes the most resources and can take a considerable amount of time to complete. It is not recommended for large data warehouses.

It is recommended to create multiple Index Update Tasks for related packages and folders. Create a schedule for each Index Update Task that reflects both the business requirements and how often the data is updated. For example, a PowerCube only needs indexing after it has been updated, whereas a relational database may need indexing on a more regular basis.

Indexing Referenced Data

Create a new Index Update Task and include the folders that contain the reports, queries, or analyses that you wish to collect referenced data for. Under “Content Options”, de-select “Properties and metadata” and select “Data values”, making sure that “Referenced data” is selected. Under “Scope”, select “Only entries that have changed” (even for the first time that you run the Index Update Task).

Indexing All Data

Create a new Index Update Task and include the packages that you wish to collect all data for. Under “Content Options”, de-select “Properties and metadata” and select “Data values”, making sure that “All data” is selected. Under “Scope”, select “Only entries that have changed” (even for the first time that you run the Index Update Task).

Note: When data is collected, all of the previous data for the package is refreshed in the index.

Excluding datasets

When indexing data, you may still wish to exclude certain query items, dimensions, or hierarchies in a package from indexing. This may be for performance reasons, if they are very large and contain content of little value (e.g. telephone numbers), or for security reasons.

Launch IBM Cognos Administration. On the Index Search tab, click Index, then Exclusion. Enter values for the package name that contains the metadata, the type to exclude (e.g. hierarchy or dimension), and the path to the object type in the model.

Back Up and Recovery

It is recommended that you back up the index that is stored on the file system; by default this is the directory <installDir>/indexes/csn. At a minimum, back up the index after long index updates. The index can be restored to its original location as long as no Index Update Task is currently executing. It is not necessary to restart the server after restoring the index.
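A file-system copy of the index directory is one way to take such a backup. The sketch below is illustrative only: `back_up_index` and the timestamped folder naming are our own choices, not an IBM-supplied procedure. It also assumes that the copy, like a restore, is taken while no Index Update Task is executing.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_index(index_dir, backup_root):
    """Copy the index directory into a new timestamped folder and return its path."""
    source = Path(index_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"index directory not found: {source}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree refuses to overwrite an existing destination, which protects
    # earlier backups from being clobbered.
    shutil.copytree(source, destination)
    return destination
```

For example, `back_up_index("<installDir>/indexes/csn", "/backups/cognos")`, with `<installDir>` replaced by the actual installation directory and a backup location of your choosing, creates a folder such as `csn-20110502-231500` under the backup location.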

Tuning and Scaling

Indexing puts significant load on the operation of the Batch Report Service to retrieve both metadata and data. The performance of that service has a significant impact on indexing.

During indexing, detailed performance data is generated in the form of “stat_<date>.html” files in the log directory. This information can be used to determine how much time is spent in the various search services versus time spent in the Batch Report Service. Eventually, multiple instances of the Batch Report Service may need to be deployed for optimal performance.

To manage the additional indexing needs, plan and scale the Batch Report Services appropriately.

The total number of high-affinity connections that are available for indexing must be equal to the number of CPUs that are available on the servers that host the index update service.

Indexing Level

You can set the advanced configuration parameter CSN.Indexing.Level to control the CPU and memory use of an indexing job and thereby manage the impact that the indexing job has on available resources. If a server is dedicated to running the Index Update Service, then the value should be set to “high”.

Go to the Index Search, Index, Advanced page and set the advanced parameter to one of the following values (the default is “normal”).

Setting - Description

  • high - 1.5 indexing threads per available processor. Recommended for servers that are dedicated to indexing.
  • normal - 1 indexing thread per available processor. Recommended when other applications are running on the same server.
  • low - 0.5 indexing threads per available processor. Recommended when low system usage is required.

Note: If you select “high”, you will need to make sure that there are enough high-affinity connections available on the Batch Report Services.
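The per-processor ratios above can be turned into an estimated thread count when you size the Batch Report Service connections. This is a rough sketch: the ratios come from the settings described above, but the rounding rule (round up, minimum of one thread) is an assumption, as the product's actual rounding is not stated here. The function name is illustrative.

```python
import math
import os

# Indexing threads per available processor for each CSN.Indexing.Level value.
THREADS_PER_PROCESSOR = {"high": 1.5, "normal": 1.0, "low": 0.5}


def indexing_threads(level, processors=None):
    """Estimate indexing threads for a CSN.Indexing.Level value.

    Rounding is an assumption: fractions are rounded up, with a minimum of one.
    """
    if processors is None:
        # Fall back to the processor count of the machine running this script.
        processors = os.cpu_count() or 1
    factor = THREADS_PER_PROCESSOR[level.lower()]
    return max(1, math.ceil(processors * factor))
```

For example, on a dedicated 4-processor indexing server, `indexing_threads("high", 4)` gives 6, compared with 4 for “normal” and 2 for “low”.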

Index Sharing

To scale search and indexing operations, you can deploy multiple instances of the index data service to different servers. Because searching is CPU-bound, you can achieve load balancing by introducing new servers that share the same index. This configuration is known as index sharing.

Index sharing allows multiple index data services to search and update a single index that is located on the shared file system within the distributed IBM Cognos BI environment. All index data services can search all index files.

For more information on Index Sharing see “Scaling Index Search by Using Index Sharing” in the IBM Cognos Installation and Configuration Guide.

Logging

Enhanced Search provides a sample IPF file called ipfcsnclientconfig.xml (located in the configuration folder). To enable logging, rename the file to ipfclientconfig.xml; no server restart is required. By default, the file produces some basic logging. Extra logging can be turned on by setting the logging levels within the file to "debug". Three log files are generated:

  • csn.log - main logging file
  • csnSearchSummary.log - summary of search requests
  • csnIndexing.log - detailed information for indexing content.
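Enabling the sample logging configuration can be scripted as follows. This is a minimal sketch with choices of our own: it copies the sample file rather than renaming it, so that the sample remains available, and it keeps any existing ipfclientconfig.xml as a `.bak` file. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def enable_search_logging(configuration_dir):
    """Activate the Enhanced Search sample logging configuration.

    Copies ipfcsnclientconfig.xml to ipfclientconfig.xml, keeping any existing
    ipfclientconfig.xml as ipfclientconfig.xml.bak.
    """
    config = Path(configuration_dir)
    sample = config / "ipfcsnclientconfig.xml"
    active = config / "ipfclientconfig.xml"
    if active.exists():
        # Preserve the logging configuration that is currently in use.
        shutil.copy2(active, active.with_name(active.name + ".bak"))
    shutil.copy2(sample, active)
    return active
```

Call it with the path to the configuration folder of the installation that hosts the search services.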

Appendix A

Further Reading

Administration and Security Guide, Chapter 30: Managing Index Search

Installation and Configuration Guide, Chapter 11: Configuration Options, Configuring IBM Cognos Index Search

User Guide, Chapter 3: IBM Cognos Connection, Search for an Entry


publish-date=05022011