Crawler setup requirements to support security
To gather the information that enables document-level security to be enforced, the crawlers must have permission to read access control data on the target data source so that this security data can be indexed with the documents.
All crawlers can be configured to associate documents with security tokens so that access control data is stored with the documents in the index. For some crawlers and data source types, additional steps are required to configure a secure environment, especially for sources that can validate a user's credentials against the current access control data that the data source maintains.
Crawler | Requirements
---|---
Agent for Windows file systems crawlers | The agent server and the file servers to be crawled must all belong to the same Windows domain or workgroup. The file systems to be crawled must allow the crawler server to read data.
BoardReader crawlers | Before you configure the crawler, you must obtain a license key from BoardReader. The crawler uses this key to access the social media sources that the license agreement allows you to access.
Case Manager crawlers | Before you configure the crawler, you must install the IBM® FileNet® Content Engine Client on the crawler server. You must also run a script (escrfilenet.sh on AIX® or Linux®, or escrfilenet.vbs on Windows) to configure the crawler server.
Content Integrator crawlers | Before you create the crawler, run a script (escrvbr.sh on AIX or Linux, or escrvbr.vbs on Windows) to configure the crawler server. When you configure the crawler, specify a user ID and password that enable the crawler to access each repository to be crawled. You can specify a different user ID and password, as necessary, for each repository in the crawl space.
DB2 crawlers | Before you create the crawler, run a script (escrdb2.sh on AIX or Linux, or escrdb2.vbs on Windows) to configure the crawler server. When you configure the crawler to crawl remote, uncataloged databases, specify a user ID and password that allow each database on the target database server to be crawled. You can specify a different user ID and password, as necessary, for each database in the crawl space.
Content Manager crawlers | Before you create the crawler, run a script (escrcm.sh on AIX or Linux, or escrcm.vbs on Windows) to configure the crawler server. When you configure the crawler, specify a user ID and password that enable the crawler to access each server to be crawled. You can specify a different user ID and password, as necessary, for each server in the crawl space.
Notes and Quickr for Domino crawlers | Additional configuration is required to crawl Lotus® Domino® servers that use the Notes® remote procedure call (NRPC) protocol. To crawl Quickr for Domino servers, you must configure the server to support Local User security or Directory Assistance, depending on the type of security that you want to use.
Exchange Server crawlers | When you configure the crawler, specify a user ID that is authorized to access public folders on the Exchange Server to be crawled, and the password for this user ID. For the crawler to use Exchange Server key management and the Secure Sockets Layer (SSL) protocol when crawling data, also specify the fully qualified path to the keystore file and a password that enables the crawler to access this file. The keystore file must exist on the crawler server.
FileNet P8 crawlers | Before you configure the crawler, you must install the IBM FileNet Content Engine Client on the crawler server. You must also run a script (escrfilenet.sh on AIX or Linux, or escrfilenet.vbs on Windows) to configure the crawler server.
JDBC database crawlers | When you configure the crawler, you can specify a user ID and password that allow tables in the target database to be crawled. You can specify a different user ID and password, as necessary, for each database in the crawl space.
NNTP crawlers | The NNTP servers to be crawled must allow the crawler server to read data.
SharePoint crawlers | Before you configure the crawler, you must deploy the provided web services on the SharePoint server.
UNIX file system crawlers | The AIX and Linux subdirectories to be crawled must allow the crawler server to read data.
Web crawlers | The Web crawler abides by the robots exclusion protocol. If a Web server includes a robots.txt file in the top level of the server directory, the crawler analyzes the file and crawls Web sites on that server only if it is allowed to do so. For information about this protocol, see http://www.robotstxt.org/wc/exclusion.html.
Seed list crawlers | When you configure the crawler to crawl sources on a WebSphere Portal server, specify a fully qualified distinguished name (DN) that enables the crawler to retrieve pages from the server to be crawled, such as uid=admin,cn=RegularEmployees,ou=Software Group,o=IBM,c=US, and specify the password for this DN. The DN must match a DN that is configured in WebSphere Portal.
Windows file system crawlers | The subdirectories to be crawled must allow the crawler server to read data. When you configure the crawler to crawl remote file systems, specify a user ID that enables the crawler to access the remote data, and specify a password for this user ID. To validate current user credentials when a user submits a search request, ensure that domain accounts are correctly configured. The requirements for setting up domain accounts differ for files that were crawled on the local computer and for files that were crawled on a remote Windows server.
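The robots exclusion behavior described for Web crawlers can be sketched with Python's standard urllib.robotparser module. This is an illustration only, not part of the product: the server name and the robots.txt content below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice a crawler retrieves this
# file from the top level of the Web server, for example
# http://example.com/robots.txt.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Pages outside the disallowed path may be crawled; pages under
# /private/ must be skipped.
print(rp.can_fetch("*", "http://example.com/index.html"))           # True
print(rp.can_fetch("*", "http://example.com/private/report.html"))  # False
```

A crawler that identifies itself with a specific user-agent string is matched against the most specific User-agent record in the file; the `*` record applies when no more specific record matches.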