Crawler setup requirements to support security

To gather the information that is needed to enforce document-level security, the crawlers must have permission to read access control data on the target data source so that this security data can be indexed with the documents.

All crawlers can be configured to associate documents with security tokens so that access control data is stored with the documents in the index. For some crawlers and data source types, additional steps are required to configure a secure environment, especially for sources that can validate a user's credentials against the current access control data that the data source maintains.

Table 1. Crawler setup requirements to support security
Crawler | Requirements
Agent for Windows file systems crawlers
The agent server and the file servers to be crawled must all belong to the same Windows domain or workgroup. The file systems to be crawled must allow the crawler server to read data.  
BoardReader crawlers
Before you configure the crawler, you must obtain a license key from BoardReader. The crawler uses this key to access the social media sources that the license agreement allows you to access.  
Case Manager crawlers
Before you configure the crawler, you must install the IBM® FileNet® Content Engine Client on the crawler server.

You must also run a script (escrfilenet.sh on AIX® or Linux®, or escrfilenet.vbs on Windows) to configure the crawler server.

Content Integrator crawlers
Before you create the crawler, run a script (escrvbr.sh on AIX or Linux, or escrvbr.vbs on Windows) to configure the crawler server.

When you configure the crawler, specify a user ID and password that enables the crawler to access each repository to be crawled. You can specify a different user ID and password, as necessary, for each repository in the crawl space.

DB2 crawlers
Before you create the crawler, run a script (escrdb2.sh on AIX or Linux, or escrdb2.vbs on Windows) to configure the crawler server.

When you configure the crawler to crawl remote, uncataloged databases, specify a user ID and password that enables each database on the target database server to be crawled. You can specify a different user ID and password, as necessary, for each database in the crawl space.
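
For example, before you register a remote, uncataloged database in the crawl space, you might want to confirm that the user ID and password can actually reach it from the crawler server. The following Java sketch is only an illustration of such a check, not part of the product configuration; it assumes the IBM Data Server Driver for JDBC (db2jcc4.jar) is on the classpath, and the host name, port, database name, and credentials are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class Db2CredentialCheck {
        public static void main(String[] args) {
            // Placeholder connection details: replace with the remote database
            // server, port, database name, and the crawler's credentials.
            String url = "jdbc:db2://dbserver.example.com:50000/SALESDB";
            String user = "crawluser";
            String password = "secret";

            try (Connection conn = DriverManager.getConnection(url, user, password)) {
                // A successful connection means the supplied user ID and password
                // are accepted by this database.
                System.out.println("Connected to " + conn.getMetaData().getURL());
            } catch (SQLException e) {
                System.err.println("Connection failed: " + e.getMessage());
            }
        }
    }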

Content Manager crawlers
Before you create the crawler, run a script (escrcm.sh on AIX or Linux, or escrcm.vbs on Windows) to configure the crawler server.

When you configure the crawler, specify a user ID and password that enables the crawler to access each server to be crawled. You can specify a different user ID and password, as necessary, for each server in the crawl space.

Notes and Quickr for Domino crawlers
To crawl Lotus® Domino® servers that use the Notes® remote procedure call (NRPC) protocol:
  • On an AIX system, ensure that the I/O Completion Port module is installed and available on the crawler server.
  • Before you create the crawler, run a script (escrnote.sh on AIX or Linux, or escrnote.vbs on Windows) to configure the crawler server.
  • A Domino server must be installed on the crawler server, and this Domino server must be a member of the Domino domain to be crawled.
  • To validate current user credentials when a user submits a search request, the Domino server to be crawled must be configured as a Lotus Domino Trusted Server.
  • When you configure the crawler, specify the path for a Lotus Notes® user ID file that is authorized to access the server, such as c:\Program Files\lotus\notes\data\name.id or /local/notesdata/name.id, and the password for this ID file.
To crawl Lotus Domino servers that use the Domino Internet Inter-ORB Protocol (DIIOP):
  • On an AIX system, ensure that the I/O Completion Port module is installed and available on the crawler server.
  • Configure the crawler server so that it can use the DIIOP protocol.
  • When you configure the crawler, specify a fully qualified Lotus Notes user ID that is authorized to access the server, such as User Name/Any Town/My Company, and the password for this user ID (a connection check that uses such an ID is sketched after this list).
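
One way to confirm the fully qualified user ID and password before you configure the crawler is to open a remote (DIIOP) session with the Domino Java classes. The following Java sketch is an illustration only; it assumes that NCSO.jar is on the classpath and that the HTTP and DIIOP tasks are running on the target server, and the host name and credentials are placeholders.

    import lotus.domino.NotesFactory;
    import lotus.domino.Session;

    public class DiiopCredentialCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder values: the Domino server host and a fully qualified
            // Lotus Notes user ID with its Internet password.
            String host = "domino.example.com";
            String user = "User Name/Any Town/My Company";
            String password = "secret";

            // Opens a remote session over DIIOP; this fails if the user ID or
            // password is not accepted by the server.
            Session session = NotesFactory.createSession(host, user, password);
            try {
                System.out.println("Authenticated as " + session.getUserName());
            } finally {
                session.recycle();
            }
        }
    }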

To crawl Quickr for Domino servers, you must configure the server to support Local User security or Directory Assistance, depending on the type of security you want to use.

Exchange Server crawlers
When you configure the crawler, specify a user ID that is authorized to access public folders on the Exchange Server to be crawled and the password for this user ID.

For the crawler to use Exchange Server key management and the Secure Sockets Layer (SSL) protocol when crawling data, also specify the fully qualified path to the keystore file and a password that enables the crawler to access this file. The keystore file must exist on the crawler server.
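
Before you enter the keystore path and password in the crawler configuration, it can help to confirm that the file opens with that password on the crawler server. The following Java sketch is a minimal check that assumes a JKS keystore; the path and password are placeholders.

    import java.io.FileInputStream;
    import java.security.KeyStore;

    public class KeystoreCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder values: the keystore file on the crawler server and
            // the password that the crawler will use to open it.
            String keystorePath = "/home/esadmin/keystore/exchange.jks";
            char[] password = "changeit".toCharArray();

            // Loading the keystore fails if the path is wrong or the password
            // does not match.
            KeyStore keyStore = KeyStore.getInstance("JKS");
            try (FileInputStream in = new FileInputStream(keystorePath)) {
                keyStore.load(in, password);
            }
            System.out.println("Keystore opened; entries: " + keyStore.size());
        }
    }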

 
FileNet P8 crawlers
Before you configure the crawler, you must install the IBM FileNet Content Engine Client on the crawler server.

You must also run a script (escrfilenet.sh on AIX or Linux, or escrfilenet.vbs on Windows) to configure the crawler server.

JDBC database crawlers
When you configure the crawler, you can specify a user ID and password that enables tables in the target database to be crawled. You can specify a different user ID and password, as necessary, for each database in the crawl space.  
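
To see which tables the supplied user ID can reach before you add them to the crawl space, you might list them through JDBC metadata. The following Java sketch is an illustration only; it assumes a JDBC 4 driver (here the PostgreSQL driver, as an example) is on the classpath, and the URL and credentials are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class ListCrawlableTables {
        public static void main(String[] args) throws Exception {
            // Placeholder values: the JDBC URL of the target database and the
            // crawler's credentials.
            String url = "jdbc:postgresql://dbserver.example.com:5432/salesdb";
            String user = "crawluser";
            String password = "secret";

            try (Connection conn = DriverManager.getConnection(url, user, password);
                 ResultSet tables = conn.getMetaData()
                         .getTables(null, null, "%", new String[] {"TABLE"})) {
                // Each row describes one table that this user ID can see.
                while (tables.next()) {
                    System.out.println(tables.getString("TABLE_SCHEM") + "."
                            + tables.getString("TABLE_NAME"));
                }
            }
        }
    }
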
NNTP crawlers
The NNTP servers to be crawled must allow the crawler server to read data.  
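
One way to confirm that an NNTP server will respond to the crawler server is to connect from that machine and read the server greeting. The following Java sketch assumes the standard NNTP port 119; the host name is a placeholder.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.Socket;

    public class NntpGreetingCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder host: the NNTP server to be crawled.
            String host = "news.example.com";

            try (Socket socket = new Socket(host, 119);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                // The greeting starts with 200 (posting allowed) or
                // 201 (read-only access); either is enough for crawling.
                String greeting = in.readLine();
                System.out.println("Server greeting: " + greeting);
            }
        }
    }
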
SharePoint crawlers
Before you configure the crawler, you must deploy the provided web services on the SharePoint server.
UNIX file system crawlers
The AIX and Linux subdirectories to be crawled must allow the crawler server to read data.  
Web crawlers
The Web crawler abides by the robots exclusion protocol. If a Web server includes a robots.txt file in the root directory of the server, the crawler analyzes the file and crawls Web sites on that server only if it is allowed to do so. For information about this protocol, see http://www.robotstxt.org/wc/exclusion.html.
When you configure the Web crawler:
  • You must specify a user agent name for the crawler. Rules in the robots.txt files of the servers to be crawled can specify this name to permit or refuse access.
  • Optional: If a Web server uses HTTP basic authentication to restrict access to Web sites, you can specify authentication credentials that enable the Web crawler to access password-protected pages (a sketch that covers the user agent name and basic authentication follows this list).
  • Optional: If a Web server uses HTML forms to restrict access to Web sites, you can specify authentication credentials that enable the Web crawler to access password-protected pages.
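
As an illustration of the first two items, the following Java sketch fetches robots.txt from a server while presenting a crawler user agent name and HTTP basic authentication credentials. It uses the standard java.net.http client (Java 11 or later); the host, user agent name, and credentials are placeholders, not values used by the product.

    import java.net.Authenticator;
    import java.net.PasswordAuthentication;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RobotsTxtCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder credentials for a server that uses HTTP basic
            // authentication to restrict access.
            Authenticator basicAuth = new Authenticator() {
                @Override
                protected PasswordAuthentication getPasswordAuthentication() {
                    return new PasswordAuthentication("crawluser", "secret".toCharArray());
                }
            };

            HttpClient client = HttpClient.newBuilder()
                    .authenticator(basicAuth)
                    .build();

            // The User-Agent header carries the crawler's user agent name, which
            // robots.txt rules on the server can match to permit or refuse access.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://www.example.com/robots.txt"))
                    .header("User-Agent", "my-enterprise-crawler")
                    .GET()
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
            System.out.println(response.body());
        }
    }
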
Seed list crawlers

When you configure the crawler to crawl sources on a WebSphere Portal server, specify a fully qualified distinguished name (DN) that enables the crawler to retrieve pages from the server to be crawled, such as uid=admin,cn=RegularEmployees,ou=Software Group,o=IBM,c=US, and specify the password for this DN. The DN must match a DN that is configured in WebSphere Portal.
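
Because the DN must match one that is configured in WebSphere Portal, a quick syntax check can catch typos before you save the crawler configuration. The following Java sketch only validates the DN structure, not whether the entry exists in the portal's user registry; the DN shown is the example from the text.

    import javax.naming.InvalidNameException;
    import javax.naming.ldap.LdapName;

    public class DnSyntaxCheck {
        public static void main(String[] args) {
            // The fully qualified distinguished name to be used by the crawler.
            String dn = "uid=admin,cn=RegularEmployees,ou=Software Group,o=IBM,c=US";

            try {
                // LdapName parses RFC 2253 distinguished names and rejects
                // malformed ones.
                LdapName name = new LdapName(dn);
                System.out.println("DN is well formed: " + name);
            } catch (InvalidNameException e) {
                System.err.println("DN is not well formed: " + e.getMessage());
            }
        }
    }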

 
Windows file system crawlers
The subdirectories to be crawled must allow the crawler server to read data. When you configure the crawler to crawl remote file systems, specify a user ID that enables the crawler to access the remote data and specify a password for this user ID.
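
To confirm from the crawler server that a remote file system is readable, you can probe a path under the share while logged on with the account that the crawler will use. The following Java sketch is a minimal check with java.nio.file; the UNC path is a placeholder, and it assumes the process already runs under the configured user ID.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ShareReadabilityCheck {
        public static void main(String[] args) {
            // Placeholder path: a subdirectory on the remote Windows server
            // that the crawler is expected to read.
            Path share = Paths.get("\\\\fileserver.example.com\\projects\\docs");

            // Files.isReadable returns true only if the directory exists and the
            // current account has read permission on it.
            if (Files.isReadable(share)) {
                System.out.println("Readable: " + share);
            } else {
                System.out.println("Not readable or not found: " + share);
            }
        }
    }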

To validate current user credentials when a user submits a search request, ensure that domain accounts are correctly configured. Requirements for setting up domain accounts for files that were crawled on the local computer are different from requirements for files that were crawled on a remote Windows server.
