Problems and solutions in IBM Content Analytics with Enterprise Search Version 3.0

Release Notes

Abstract

This document describes known issues and workarounds for IBM Content Analytics with Enterprise Search Version 3.0, including Version 3.0 fix packs.

Content

The information is organized according to information type.

Requirements
Deprecated functions
Installation
Administration
Integration
Application development
User applications
Documentation updates
Content Analytics Studio

Requirements

For the most current information about requirements, including requirements for IBM Content Analytics Studio, see System requirements and Supported data sources.

Deprecated functions

This section describes functions that are deprecated in Version 3.0.

SIAPI APIs and Web Services APIs
The SIAPI Administration APIs and Web Services are no longer supported. The deprecation of these APIs was previously announced. To develop custom administration applications, use the provided REST APIs.

The SIAPI Search APIs are being deprecated and will not be supported in future releases. To develop custom applications, use the provided REST APIs.

Information about using the REST APIs is available in the ES_INSTALL_ROOT/docs/api/rest directory. Sample scenarios for using the APIs are available in the ES_INSTALL_ROOT/samples/rest directory.

Installation

This section describes known issues and workarounds for installing the product.

Upgrading to Version 3.0
For complete information about upgrading to Version 3.0, including information about data that is or is not migrated automatically, see Upgrading to IBM Content Analytics with Enterprise Search Version 3.0.

Attention: If you did not use the default installation paths for your IBM Content Analytics V2.2 or OmniFind Enterprise Edition V9.1 installation, see the following critical information before you begin the upgrade process: How to avoid potential loss of previous configuration when upgrading to a new version of Content Analytics with Enterprise Search or OmniFind Enterprise Edition.

Console installation
It might take a few minutes for the installation program to finish installing the software after you press Enter at the end of the installation prompts. Do not end the program or suspend it until user control is returned so that additional commands can be entered. Note that installation by the console method does not require the use of a web browser.

Installing the system to use WebSphere Application Server Network Deployment

Base release of IBM Content Analytics with Enterprise Search Version 3.0:
If you plan to use WebSphere Application Server Network Deployment as the application server, you must run the following command to install the product and run applications in a non-cluster environment. Neither vertical clustering nor horizontal clustering is supported in the base release of IBM Content Analytics with Enterprise Search Version 3.0.

AIX or Linux: ./install.bin -D\$WAS_CLUSTERING_ENABLED\$=false
Windows: install.exe -D$WAS_CLUSTERING_ENABLED$=false

IBM Content Analytics with Enterprise Search Version 3.0 Fix Pack 1:
Beginning with this fix pack, cluster deployment for horizontal clustering is supported, where each clustering node is also configured as a search server. If you specify WebSphere Application Server Network Deployment as the application server when you run the installation program, the applications are deployed on a cluster server by default. If you do not want to install the applications on a cluster server, use the preceding command to run the IBM Content Analytics with Enterprise Search installation program.

Limitations:
   - The backend processes of Content Analytics with Enterprise Search do not run on WebSphere Application Server.
   - The processes do not utilize the clustering features of Network Deployment, such as workload balancing, high availability, or deployment managers.
   - The decision to use the clustering mode of WebSphere Application Server Network Deployment must be specified when you install the base release of IBM Content Analytics with Enterprise Search Version 3.0 and cannot be changed when you install Fix Pack 1.
   - If you install IBM Content Analytics with Enterprise Search as a distributed server system (not all-on-one), the Search and Analytics customizer programs fail with a "not found" error. A workaround is available for this problem. See http://www.ibm.com/support/docview.wss?uid=swg21606987.

High Availability installation with Microsoft Cluster Service (MSCS)
This procedure assumes that MSCS is set up correctly on a Microsoft Windows Server 2008 R2 64-bit cluster server:

1. Shut down cluster 2.
2. Install IBM Content Analytics with Enterprise Search on cluster 1:
   - ES_INSTALL_ROOT should be local storage.
   - ES_NODE_ROOT should be a shared resource that is managed by MSCS.

3. Shut down cluster 1.
4. Start cluster 2.
5. Install IBM Content Analytics with Enterprise Search on cluster 2:
   - ES_INSTALL_ROOT should be local storage.
   - ES_NODE_ROOT should be a shared resource that is managed by MSCS.
   - When the installation program prompts you to overwrite the data directory,
     click Ignore and allow data to be installed in the same directory.

6. Shut down cluster 2.
7. Start cluster 1 and wait a few minutes.
8. Start cluster 2.
9. Run the esharesource.vbs script on cluster 1 to create the "IBM OmniFind Enterprise Edition" service in MSCS.
10. Run the esharesource.vbs script on cluster 2 to add cluster 2 to the owner list of the "IBM OmniFind Enterprise Edition" service.
11. On cluster 1, start IBM Content Analytics with Enterprise Search (by using the First Steps program or by entering the command esadmin system startall).
12. From cluster 1, log in to the IBM Content Analytics with Enterprise Search administration console.
13. On the System tab, add cluster 2 as a backup server in the system topology.

Log file for First Steps
The First Steps program instructs you to see the startstatus.log file for details about out-of-memory errors. The errors are actually written to the ccl.log file.

Silent uninstallation with WebSphere Application Server
If you use the silent method to remove the IBM Content Analytics with Enterprise Search software and security is enabled in WebSphere Application Server, you must specify the following options for the WebSphere Application Server administrative user ID and password in the response file:

WAS_USER_NAME=<value>
WAS_USER_PASSWORD=<value>
WAS_USER_PASSWORD_CONF=<value>
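
As a rough illustration, the three options could be generated by a small script before you run a silent uninstallation. This is a sketch only: the function name and the example credentials are hypothetical, and it assumes that WAS_USER_PASSWORD_CONF is a confirmation that repeats the password, which is not stated explicitly here.

```python
def was_security_options(user_name: str, password: str) -> str:
    """Render the WebSphere Application Server credential options as response-file lines."""
    if not user_name or not password:
        raise ValueError("both the administrative user ID and password are required")
    lines = [
        f"WAS_USER_NAME={user_name}",
        f"WAS_USER_PASSWORD={password}",
        # Assumption: the _CONF option repeats the password as a confirmation.
        f"WAS_USER_PASSWORD_CONF={password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(was_security_options("wasadmin", "example-password"), end="")
```

Because the output contains a password in clear text, restrict access to any file that you write it to.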

Administration

Rules to expand queries and rank documents

The ability to upload custom analyzers and associate analyzers with index fields is disabled by default. To learn how to enable these functions, and to learn about configuring custom rules to automatically expand queries, see Expanding queries and influencing how documents are ranked in the results.

Exporting the LTPA key file
You cannot export an LTPA key with the correct format from the Configure Application Login Settings page of the administration console by clicking the Export button. This means that you cannot import the key file for use on another system, such as WebSphere Application Server.

To configure support for single sign-on (SSO) authentication with other systems, you must export the LTPA key file from the other system, such as WebSphere Application Server, and then import the key into IBM Content Analytics with Enterprise Search by clicking the Import button on the Configure Application Login Settings page. If you need to configure SSO with a system that cannot export the LTPA key (such as Lotus Domino), contact IBM Software Support for assistance.

Internal Server Error with WebSphere Application Server Network Deployment
WebSphere Application Server can be configured with a timeout value for each application. The default value in WebSphere Application Server is 0 (infinite). In WebSphere Application Server Network Deployment, however, the default timeout value is 60 seconds. With heavy load processing, this timeout configuration can cause Internal Server Errors in the IBM Content Analytics with Enterprise Search administration console.

To extend the timeout value:

1. Edit the IHS_DIR/Plugins/config/webserver1/plugin-cfg.xml file and set the ServerIOTimeout attribute for ESSearchServer to 0. For example:

<Server ConnectTimeout="5" ExtendedHandshake="false" MaxConnections="-1"
Name="exampleNode01_ESSearchServer" ServerIOTimeout="0" WaitForContinue="false">
<Transport Hostname="example.server.com" Port="9081" Protocol="http"/>
</Server>

2. Save your changes and either restart IBM HTTP Server and WebSphere Application Server Network Deployment, or enter these commands to restart the IBM Content Analytics with Enterprise Search system:
   esadmin system stopall
   esadmin system startall
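
If you prefer to script the edit in step 1, the following Python sketch shows one possible approach. It assumes that the Server elements in plugin-cfg.xml are not namespace-qualified and that the ESSearchServer entry can be recognized by the end of its Name attribute, as in the example above. The function name and the sample XML are hypothetical. ElementTree discards comments and the XML declaration when it rewrites a document, so back up the original file first.

```python
import xml.etree.ElementTree as ET


def set_server_io_timeout(xml_text: str,
                          name_suffix: str = "ESSearchServer",
                          timeout: str = "0") -> str:
    """Return the XML text with ServerIOTimeout set on every matching <Server>."""
    root = ET.fromstring(xml_text)
    for server in root.iter("Server"):
        # Only touch servers whose Name ends with the expected suffix.
        if server.get("Name", "").endswith(name_suffix):
            server.set("ServerIOTimeout", timeout)
    return ET.tostring(root, encoding="unicode")


# Simplified, hypothetical stand-in for a real plugin-cfg.xml file.
SAMPLE = """<Config>
  <ServerCluster Name="example_cluster">
    <Server ConnectTimeout="5" Name="exampleNode01_ESSearchServer" ServerIOTimeout="60">
      <Transport Hostname="example.server.com" Port="9081" Protocol="http"/>
    </Server>
    <Server ConnectTimeout="5" Name="exampleNode01_server1" ServerIOTimeout="60"/>
  </ServerCluster>
</Config>"""

updated = set_server_io_timeout(SAMPLE)
```

Only the ESSearchServer entry is changed. Other servers in the file keep their existing timeout values.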

Text analytics in enterprise search collections
Enterprise search collections include some text analytics capabilities, such as Named Entity Recognition. However, linguistic analysis for enterprise search collections is optimized for use by enterprise search applications, and linguistic analysis for content analytics collections is optimized for use by the content analytics miner. Therefore, the results of analysis are not always identical for different types of collections. For example, the parts of speech are analyzed in content analytics collections, but linguistic analysis is minimized in enterprise search collections for better document processing performance.

Exporting and importing crawler configurations
Some restrictions apply to the ability to export and import crawlers from other systems. For example, the import function might require the es.cfg file that existed on the old system for some crawlers. Preserve your old system until the import task has completed successfully and you have verified that the imported crawlers are working as expected.

Exporting crawlers:
   Command: esadmin export -cid collection_id

   where:
   collection_id
      Specifies the collection ID for the collection to be exported.
      The exported zip file is created under the ES_NODE_ROOT/dump directory by default.

Importing crawlers:
   Command: esadmin import -crawlers -cid collection_id -fname file_name_to_be_imported -escfg escfg_file_path

   where:
   collection_id
      Specifies the collection ID of the collection to which you want to import crawler configurations.
   file_name_to_be_imported
      Specifies the path to the file that you want to import. If the file name is not absolute,
      then the ES_NODE_ROOT/dump directory is assumed.
   escfg_file_path
      Specifies the es.cfg file from the old system. Always specify the absolute path of the es.cfg file.
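
The export and import commands can also be assembled by a script, for example when several collections must be migrated. The following Python sketch only builds the argument lists from the syntax shown above. It does not run esadmin, and the function names, collection ID, and file paths are hypothetical. It rejects a relative es.cfg path because the escfg_file_path value must always be absolute.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import List


def _is_absolute(path: str) -> bool:
    """Accept absolute paths in either AIX/Linux or Windows form."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def export_command(collection_id: str) -> List[str]:
    """Build the argument list for exporting a collection's crawlers."""
    return ["esadmin", "export", "-cid", collection_id]


def import_crawlers_command(collection_id: str, file_name: str, escfg_path: str) -> List[str]:
    """Build the argument list for importing crawlers into a collection."""
    if not _is_absolute(escfg_path):
        raise ValueError("the es.cfg path must be absolute: %r" % escfg_path)
    return ["esadmin", "import", "-crawlers", "-cid", collection_id,
            "-fname", file_name, "-escfg", escfg_path]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(export_command("col_27716")))
    print(" ".join(import_crawlers_command("col_27716", "crawlers.zip", "/opt/old/es.cfg")))
```

The lists can be passed to a process launcher such as subprocess.run on a server where esadmin is on the path.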

Limitations:
If you import crawlers that you exported from OmniFind Enterprise Edition V9.1 or IBM Content Analytics V2.2:

   - Exporting a SharePoint crawler that has SSL keystore settings is not supported.
   - When a Notes crawler to be imported has the advanced DIIOP option setting, you must specify the path for the V9.1 or V2.2 es.cfg file.

To work around these limitations, you can remove the unsupported settings on the previous system and reconfigure them after the import task is complete.

If you import crawlers that you exported from OmniFind Enterprise Edition V8.5:

   - Importing a Seed list crawler is not supported.
   - When a Notes crawler to be imported has the advanced DIIOP option setting, you must specify the path for the V8.5 es.cfg file.
   - After importing crawlers, you must set the connection credentials for each crawler manually. The connection credentials are created during the import task, but they are not associated with the imported crawlers. These credentials are orphaned and cannot be deleted if you remove one of these crawlers without doing this manual step.

Owner names for category rules

When you configured rules for categories in previous versions, the User column of the Rules page showed the user name of the application user who created the rule. If an administrator created the rule, the User column was blank.

When you create rules in Version 3.0, however, the user name of the rule creator is displayed regardless of the user role. Thus, you might see blank entries in the User column for some rules (rules that were created by an administrator and migrated) and see the administrator ID in the User column for new rules.

Adjusting the Java heap size for the Case Manager and FileNet P8 crawlers
When the Case Manager crawler or FileNet P8 crawler is used to crawl a large number of folders, the Java heap size must be adjusted to avoid out-of-memory errors. For FileNet P8, this requirement applies when the crawler is configured to crawl folder spaces, not class spaces.

To adjust the Java heap size:
1. Stop the IBM Content Analytics with Enterprise Search system: esadmin stop stopccl(.sh)

2. Edit the ES_NODE_ROOT/master_config/collection-ID_config.ini file. Find the crawler sessions and change the max_heap parameter from 512 to 1024.

For example:
session1.collectionid=col_27716
session1.configDir=col_27716.ADAPTER_85355
session1.description=
session1.displayname=Case Manager crawler 1
session1.domain=.
session1.flags=0
session1.id=col_27716.ADAPTER_85355
session1.init_heap=16
session1.max_heap=1024
session1.nodeid=node1
session1.sectiontype=session
session1.subtype=ADAPTER
session1.type=crawler

3. Restart the IBM Content Analytics with Enterprise Search system: esadmin start startccl(.sh)
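
The edit in step 2 can be sketched in Python as follows. The sketch assumes that the file consists of plain key=value lines like those in the example and that crawler sessions are the ones with a sessionN.type=crawler entry. The function name and the second session in the sample are hypothetical. Run a script like this only while the system is stopped, and keep a backup of the original file.

```python
import re


def raise_crawler_heap(ini_text: str, new_heap: int = 1024) -> str:
    """Return the text with max_heap raised to new_heap for crawler sessions only."""
    lines = ini_text.splitlines()

    # First pass: collect the session prefixes (session1, session2, ...) of crawlers.
    crawler_sessions = set()
    for line in lines:
        match = re.fullmatch(r"(session\d+)\.type=crawler\s*", line)
        if match:
            crawler_sessions.add(match.group(1))

    # Second pass: rewrite max_heap for those sessions if it is below new_heap.
    result = []
    for line in lines:
        match = re.fullmatch(r"(session\d+)\.max_heap=(\d+)\s*", line)
        if match and match.group(1) in crawler_sessions and int(match.group(2)) < new_heap:
            line = "%s.max_heap=%d" % (match.group(1), new_heap)
        result.append(line)
    return "\n".join(result) + "\n"


# session1 follows the example above; session2 is a hypothetical non-crawler session.
SAMPLE = (
    "session1.displayname=Case Manager crawler 1\n"
    "session1.init_heap=16\n"
    "session1.max_heap=512\n"
    "session1.type=crawler\n"
    "session2.max_heap=512\n"
    "session2.type=other\n"
)

updated = raise_crawler_heap(SAMPLE)
```

Sessions that are not crawlers, and crawler sessions that already have a larger heap, are left unchanged.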

Deleting a collection that is used for exported flagged documents
The export of flagged documents from an enterprise search application to a content analytics collection appears to remain in a waiting state. If you attempt to delete the collection, errors occur and the collection is not deleted.

To avoid this situation, disable the export of flagged documents to a content analytics collection before you delete the collection. If you experience this situation, the export settings are disabled during the first attempt to delete the collection. Click Delete again to successfully delete the collection.

Links to error messages are broken
In the administration console, a More Information icon is displayed for many error messages. Typically, if you click the icon, an explanation of the problem with a suggested corrective action is displayed. The links from this icon currently do not work. To see details about the error, locate the message number in the Messages section of the information center.

This problem was resolved in IBM Content Analytics with Enterprise Search Version 3.0 Fix Pack 1.

Changes to the SharePoint crawler in Fix Pack 1
With IBM Content Analytics with Enterprise Search Version 3.0 Fix Pack 1, changes were made to how the SharePoint crawler accesses the ACL for pre-filtering documents on SharePoint Server 2007 and 2010 servers. The SharePoint crawler now uses the document-level ACL in the pre-filtering stage instead of the list-level ACL. When re-crawling and re-indexing SharePoint documents, the ACL will be replaced by the document-level ACL automatically, even if the SharePoint crawler was created before the fix pack was applied.

If post-filtering is enabled, the SharePoint crawler verifies the search user's ACL in real time. There should be no change in secure search results regardless of whether the document-level ACL or list-level ACL is crawled when post-filtering is enabled.

If you want to return to the pre-fix pack behavior and use the list-level ACL, change the crawler configuration file, ES_NODE_ROOT/master_config/<collection_id>.<crawler_id>/crawler_config.xml, as follows:

- If the crawler was created before you applied the fix pack, copy the following configuration element under the "crawler" element:
<property name="sp:crawl_document_acl" type="boolean" options="empty_if_missing_at_runtime">false</property>

- If the crawler was created after you applied the fix pack, the "sp:crawl_document_acl" configuration element already exists, so change its value to false.
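
Both cases can be handled by one script that updates the property if it exists and adds it otherwise. The following Python sketch assumes that the elements in crawler_config.xml are not namespace-qualified. The function name and the surrounding sample XML are hypothetical, and only the property element itself is taken from the text above. ElementTree discards comments and the XML declaration when it rewrites a document, so back up the file before you replace it.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "sp:crawl_document_acl"


def use_list_level_acl(xml_text: str) -> str:
    """Return the XML text with sp:crawl_document_acl set to false under <crawler>."""
    root = ET.fromstring(xml_text)
    crawler = root if root.tag == "crawler" else root.find(".//crawler")
    if crawler is None:
        raise ValueError("no crawler element found")

    for prop in crawler.iter("property"):
        if prop.get("name") == PROPERTY_NAME:
            # Crawler created after the fix pack: the element exists, so change it.
            prop.text = "false"
            break
    else:
        # Crawler created before the fix pack: add the element under <crawler>.
        prop = ET.SubElement(crawler, "property", {
            "name": PROPERTY_NAME,
            "type": "boolean",
            "options": "empty_if_missing_at_runtime",
        })
        prop.text = "false"
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified configuration files.
BEFORE_FIX_PACK = "<config><crawler></crawler></config>"
AFTER_FIX_PACK = (
    '<config><crawler><property name="sp:crawl_document_acl" type="boolean" '
    'options="empty_if_missing_at_runtime">true</property></crawler></config>'
)
```

Calling use_list_level_acl on either sample returns a document that contains exactly one sp:crawl_document_acl property with the value false.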

Integration

Running the escrbi.sh script for BigInsights integration

If you install IBM InfoSphere BigInsights Version 1.3 Fix Pack 1 or later, you must run the escrbi.sh script, which is provided with IBM Content Analytics with Enterprise Search Version 3.0 Fix Pack 1. The script updates the classpaths for sessions that require Hadoop libraries from the BigInsights server.

1. Run the following script: ES_INSTALL_ROOT/bin/escrbi.sh
2. Stop the Content Analytics with Enterprise Search system: esadmin system stopall
3. Restart the Content Analytics with Enterprise Search system: esadmin system startall

Adding search servers to a collection managed on a BigInsights server
When you add an additional search server to the system topology for use with a collection that is managed on an IBM InfoSphere BigInsights server, you must rebuild the optional facet index if the BigInsights index was created with partitions.

Indexing appears stalled for a collection managed on a BigInsights server
If indexing activity appears stalled in the administration console, you need to use the BigInsights administration console to check the document processing and indexing status. Until indexing is complete, the status is not relayed to the IBM Content Analytics with Enterprise Search administration console from the BigInsights server.

Manually stopping processes on a BigInsights server
IBM Content Analytics with Enterprise Search uses an orchestrator to manage Jaql processes. When parsing and indexing is stopped, some processes on a BigInsights server might continue to run. These BigInsights processes are orphaned. You must manually stop these processes from the BigInsights administration console or by entering the hadoop command at a command prompt before restarting an index build.

Some time-consuming processes contain the collection ID as a part of the process name, such as "Parser Tokenizer collection ID". Use the process names to help identify which processes need to be manually stopped.
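
If you copy the list of running process names from the BigInsights server, a simple filter can narrow it to the candidates for one collection. The following Python sketch is an illustration only: the function name, the collection IDs, and all process names except the "Parser Tokenizer" pattern are hypothetical. It does not query or stop anything on the BigInsights server.

```python
from typing import Iterable, List


def processes_for_collection(process_names: Iterable[str], collection_id: str) -> List[str]:
    """Return the process names that contain the given collection ID."""
    # Plain substring match: an ID that is a prefix of another ID also matches,
    # so review the result before you stop any process.
    return [name for name in process_names if collection_id in name]


# Hypothetical process names, for illustration only.
RUNNING = [
    "Parser Tokenizer col_27716",
    "Parser Tokenizer col_90001",
    "Unrelated job",
]

candidates = processes_for_collection(RUNNING, "col_27716")
```

Processes that do not include the collection ID in their names are not found by this approach and must be identified in another way.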

Importing CSV files
Data is imported successfully if the content follows the CSV file format regardless of the file extension (.csv, .dat, .text, .txt, and so on). When you run the CSV file import wizard, you can verify that the format of the data is correct by previewing the content before you import it.
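
Before you run the wizard, you can also make a quick local check that a file parses as CSV regardless of its extension. The following Python sketch uses the standard csv module and treats content as well formed when every non-empty row has the same number of columns. That rule and the function names are assumptions for illustration. The checks that the import wizard itself applies are not described here.

```python
import csv
import io
from typing import List


def parse_csv(text: str) -> List[List[str]]:
    """Parse CSV text and return the non-empty rows."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row for row in reader if row]


def looks_like_csv(text: str) -> bool:
    """Return True if there is at least one row and all rows have the same width."""
    rows = parse_csv(text)
    return bool(rows) and len({len(row) for row in rows}) == 1


# The same content is acceptable whether the file is named .csv, .dat, or .txt.
GOOD = 'id,title\n1,"Hello, world"\n'
RAGGED = "id,title\n1,2,3\n"
```

Here looks_like_csv(GOOD) is True because the quoted comma stays inside one field, and looks_like_csv(RAGGED) is False because the second row has an extra column.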

Application development

This section describes known issues and workarounds for using the product's application programming interfaces.

Using /rangeFacet?method=addRangeDef
The values of the rangeName and parentFacetName attributes cannot begin with a number and cannot contain spaces. Ensure that each value begins with a letter and does not contain any spaces.
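
A client application can check these values before it calls the API. The following Python sketch encodes only the two rules stated above. The function name is hypothetical, and the API might enforce additional restrictions that are not covered here.

```python
def is_valid_range_attribute(value: str) -> bool:
    """Return True if the value begins with a letter and contains no spaces."""
    return bool(value) and value[0].isalpha() and " " not in value


if __name__ == "__main__":
    # Hypothetical attribute values, for illustration only.
    for candidate in ("priceLow", "1stRange", "low price"):
        print(candidate, is_valid_range_attribute(candidate))
```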

User applications

This section describes known issues and workarounds for using the user applications that are provided with the product.

Running applications in bidirectional languages
If you set your browser language to a bidirectional language such as Arabic or Hebrew, and open the enterprise search application or content analytics miner, some page controls and text are not rendered correctly as bidirectional output. These browser limitations are beyond the control of IBM Content Analytics with Enterprise Search.

   - In the Trends view and Deviations view, the y-axis of the trends index or deviations index is not always displayed.
   - In the Time series view, the date format on the x-axis might not be correct. For example, the date is shown as YYYYMMDD instead of DDMMYYYY.
   - Pressing the Home or End key does not always move the cursor to the expected position. For example, if you save a query and then press Home or End in the Query box, the cursor does not move to the start of the query terms.
   - In the Trends view, the cursor does not behave as expected in the Filter text box. For example, pressing the arrow keys moves the cursor in the direction opposite to the one that you expect.

Interpreting the Date facet in the facet tree
The format of the Date facet in the facet tree is not intuitive. For example, users might think that the Year element is repeated instead of realizing that the final element represents the Hour. The date facet includes the following levels: Year, Month, Day, and Hour.
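
The following Python sketch shows how a single timestamp breaks down into the four levels, which can help when you read a path in the facet tree. It illustrates the level order only. The function name is hypothetical, and the exact labels that the facet tree displays are not reproduced here.

```python
from datetime import datetime
from typing import List, Tuple

DATE_FACET_LEVELS = ("Year", "Month", "Day", "Hour")


def date_facet_path(timestamp: datetime) -> List[Tuple[str, int]]:
    """Pair each Date facet level with the matching part of the timestamp."""
    parts = (timestamp.year, timestamp.month, timestamp.day, timestamp.hour)
    return list(zip(DATE_FACET_LEVELS, parts))


path = date_facet_path(datetime(2012, 6, 15, 14, 30))
```

For this timestamp, the final element of the path is the Hour level with the value 14, not a second Year element.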

Script error when launching the content analytics miner in Firefox
When you start the content analytics miner, you might see an error message that states "Warning: Unresponsive script," indicating that a script is busy or unresponsive. To get past this dialog, click Continue.

The Firefox browser issues this warning when it detects that currently running JavaScript is taking too long and might have timed out or entered an infinite loop. To disable this warning, you can reconfigure how long Firefox waits for a script to run before determining that it has become unresponsive:

1. Type about:config in the Firefox address bar.
2. Filter down to the values for dom.max_script_run_time and dom.max_chrome_script_run_time.
3. Change the values to something higher than the default (which is 10 seconds).

Setting the value to 0 allows the script to run for as long as it needs. Specifying a large number might be more appropriate, however, to avoid locking the user interface forever as the script runs.

Documentation updates

For information about changes and corrections to the IBM Content Analytics with Enterprise Search Version 3.0 information center, see the Documentation Updates for Version 3.0. You can also see the documentation updates when you browse comments in IBM Knowledge Center.

Content Analytics Studio

For installation instructions and usage guidelines, or for information about known issues and workarounds for IBM Content Analytics Studio, see the Release Notes for the IBM Content Analytics Studio.

Related Information

IBM Knowledge Center

Download Version 3.0 Fix Pack 4
