Pre- and post-filtering of search results

There are two distinct approaches to filtering documents to ensure that search results contain only the documents that the user who submitted the search request is authorized to view.

  • The first approach is to replicate the document's source access control lists (ACLs) into the index at crawl time and to rely on the search engine to compare user credentials to the replicated ACLs. Pre-filtering the documents, and thereby controlling which documents are added to the index, gives the best query performance. However, it is difficult to model all of the security policies of the various back-end sources in the index and to implement the comparison logic in a uniform way. This approach is also slow to reflect changes in the source ACLs, because a change is not visible until the document is crawled again.
  • The second approach is to post-filter the documents in the result set by consulting the back-end sources for current security data. This approach allows the contributing back-end sources to be the final arbiters of which documents are returned to the user, and it ensures that the result set reflects current access controls. However, it degrades search performance because it requires live connections to all of the back-end sources at query time. If a source is not accessible, links to its documents must be filtered out of the result set along with the documents that the user is not authorized to view.
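The trade-off between the two approaches can be illustrated with a minimal sketch. All of the classes, field names, and data structures below are purely illustrative assumptions for this example, not any real product API:

```python
# Illustrative sketch of pre-filtering vs. post-filtering (hypothetical names).

class Index:
    """Holds crawled documents; each carries the ACL replicated at crawl time."""
    def __init__(self, docs):
        self.docs = docs                      # list of document dicts

    def search(self, term):
        return [d for d in self.docs if term in d["text"]]

class Source:
    """A back-end repository that can answer live authorization checks."""
    def __init__(self, readers, reachable=True):
        self.readers = readers                # doc id -> set of authorized users
        self.reachable = reachable

    def user_can_read(self, user, doc_id):
        return user in self.readers.get(doc_id, set())

def pre_filter(index, term, user_groups):
    """Approach 1: compare user credentials to the ACL stored in the index.
    Cheap (a set intersection per hit), but the replicated ACL may be stale."""
    return [d for d in index.search(term) if user_groups & d["acl"]]

def post_filter(index, term, sources, user):
    """Approach 2: consult each contributing source for current access data.
    Accurate but slower, and it needs every source to be reachable."""
    hits = []
    for d in index.search(term):
        src = sources.get(d["source"])
        if src is None or not src.reachable:
            continue                          # source down: drop the link
        if src.user_can_read(user, d["id"]):
            hits.append(d)
    return hits
```

Note how `post_filter` silently drops documents whose source is unreachable, which is exactly the behavior the second bullet describes.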

Support for enforcing access controls relies on a combination of these two approaches. The design provides optimum performance while maintaining the precise security policies of the originating document repositories. By storing high-level access control data in the index, the system can produce an interim (potentially smaller) result set, which is then post-filtered to verify current access controls. The underlying assumption is that if the user has access to the repository that owns a document, it is likely that the user also has access to the document itself.

The access control data that is stored in the index varies with the crawler type. For example, the Notes crawler can store database- and server-level access controls, and the Quickr for Domino crawler can store access controls for servers, places, and rooms.
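For illustration, the coarse, repository-level ACL metadata stored for a document might look like the following. The field names and identity formats are assumptions made for this sketch, not a real index schema:

```python
# Hypothetical ACL metadata a crawler might store with each document.

# A Notes crawler can capture server- and database-level access controls:
notes_doc_acl = {
    "server_acl": {"CN=SearchAdmins/O=Acme"},            # server-level entries
    "database_acl": {"CN=Alice/O=Acme", "Group:Finance"} # database-level entries
}

# A Quickr for Domino crawler can capture server, place, and room controls:
quickr_doc_acl = {
    "server_acl": {"Group:Everyone"},
    "place_acl": {"Group:Finance"},       # membership in the Quickr place
    "room_acl": {"CN=Alice/O=Acme"},      # tighter room-level restriction
}
```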

All crawlers and data source types can index source access control data at crawl time. Some crawlers and data source types can also post-filter the result set and verify the user's current credentials, either through data source security mechanisms or through the identity management component.

This two-pronged security design encompasses the following tasks:
  • Extracting ACL information during crawl time.
  • Storing server and database ACL information in the index.
  • Creating the user's security context when the user logs in or when the session is initialized. This task must account for the different identifiers that a single user must use to access the various back-end sources.
  • Processing the search with the user's security context and producing an interim result set that contains only those documents that the user has access to at the repository level.
  • Post-filtering the interim result set by consulting the back-end sources that contributed documents to the result set to obtain current ACL information.
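The tasks above can be sketched end to end. Every function, data structure, and identity format here is an assumption for illustration only; in particular, the security context is modeled as a simple mapping from each back-end source to the identifier the user holds there:

```python
# Hedged sketch of the two-pronged flow: build a security context at login,
# pre-filter against repository-level ACLs in the index, then post-filter
# against the live sources. All names are hypothetical.

def build_security_context(user, identity_manager):
    """At login / session start, resolve the user's per-source identifiers,
    since a single user may use a different identity for each source."""
    return {src: ids.get(user) for src, ids in identity_manager.items()}

def secure_search(index, term, context, sources):
    # Step 1: interim result set from repository-level ACLs stored in the
    # index at crawl time (documents the user can reach at the repo level).
    interim = [d for d in index
               if term in d["text"]
               and context.get(d["source"]) in d["repo_acl"]]

    # Step 2: post-filter the interim set by consulting each contributing
    # back-end source for current document-level access data.
    final = []
    for d in interim:
        identity = context.get(d["source"])
        src = sources.get(d["source"])
        if src and src["reachable"] and \
                identity in src["readers"].get(d["id"], set()):
            final.append(d)
    return final
```

The interim set keeps step 2 cheap: only documents that already passed the repository-level check trigger a round trip to their back-end source.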

Exporting documents

Post-filtering of search results is not supported by the export function. If you export documents after searching a secure collection, documents that are excluded from the search results through post-filtering will be included in the set of documents that are exported. Documents that are excluded from the search results through pre-filtering will not be exported.