There are two distinct approaches to filtering documents
to ensure that search results contain only the documents that the
user who submitted the search request is authorized to view (a minimal
sketch of both approaches follows the list below).
- The first approach is to replicate the documents' source access
control lists (ACLs) into the index at crawl time and to rely on the
search engine to compare user credentials to the replicated document
ACLs. Pre-filtering the documents, and controlling which documents
are added to the index, results in the best performance. However,
it is difficult to model all of the security policies of the various
back-end sources in the index and to implement the comparison logic
in a uniform way. This approach is also slower to reflect changes
that occur in the source ACLs.
- The second approach is to post-filter documents in the result
set by consulting the back-end sources for current security data.
This approach allows the contributing back-end sources to be the final
arbiters of which documents are returned to the user, and it ensures
that the result set reflects current access controls. However, this
approach degrades search performance because it requires live connections
to all of the back-end sources. If a source is not accessible,
links to its documents must be filtered out of the result set along
with the documents that the user is not authorized to view.
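The following minimal Python sketch contrasts the two approaches. The
in-memory index, the token scheme, SOURCE_ACLS, and all function names
are invented stand-ins for a real search engine and its back-end
connectors, not an actual API.

```python
# Each indexed document carries security tokens replicated at crawl time.
INDEX = [
    {"id": "doc1", "text": "q3 budget", "acl_tokens": {"group:finance"}},
    {"id": "doc2", "text": "q3 roadmap", "acl_tokens": {"group:everyone"}},
]

def pre_filter_search(query: str, user_tokens: set[str]) -> list[dict]:
    """Approach 1: compare the user's credentials (as tokens) against
    the document ACLs that were replicated into the index at crawl time."""
    return [
        doc for doc in INDEX
        if query in doc["text"] and doc["acl_tokens"] & user_tokens
    ]

# Toy stand-in for each source's current ACLs; a real connector would
# query the back-end system at search time and might fail to connect.
SOURCE_ACLS = {"doc1": {"alice"}, "doc2": {"alice", "bob"}}

def source_allows(doc_id: str, user: str) -> bool:
    return user in SOURCE_ACLS[doc_id]

def post_filter(results: list[dict], user: str) -> list[dict]:
    """Approach 2: let the contributing sources be the final arbiters.
    Documents from unreachable sources are dropped, as described above."""
    allowed = []
    for doc in results:
        try:
            if source_allows(doc["id"], user):
                allowed.append(doc)
        except ConnectionError:  # source not accessible: filter it out
            continue
    return allowed

# Example: pre-filter by index tokens, then let the sources arbitrate.
print(post_filter(pre_filter_search("q3", {"group:everyone"}), "bob"))
# -> [{'id': 'doc2', 'text': 'q3 roadmap', 'acl_tokens': {'group:everyone'}}]
```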
Support for enforcing access controls relies on a combination of
these two approaches. The design provides optimum performance while
maintaining the precise security policies of the originating document
repositories. By storing high-level access control data in the index,
the system can produce an interim (potentially smaller) result set,
which is then post-filtered to verify current access controls.
The assumption is that if the user has access to the repository that
owns a document, the user most likely also has access to the document.
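A rough sketch of how this hybrid flow could compose, again with
hypothetical helpers: a coarse, repository-level filter against the
index yields the interim result set, which is then post-filtered
against the live sources.

```python
INDEX = [
    {"id": "memo-1", "text": "q3 memo", "repo_tokens": {"repo:hr"}},
    {"id": "spec-7", "text": "q3 spec", "repo_tokens": {"repo:eng"}},
]

def query_index(query: str) -> list[dict]:
    """Hypothetical full-text query against the index."""
    return [doc for doc in INDEX if query in doc["text"]]

def source_allows(doc_id: str, user: str) -> bool:
    """Hypothetical live, document-level check with the owning source."""
    return True  # a real connector consults the source's current ACLs

def search_secure(query: str, user: str, user_repo_tokens: set[str]) -> list[dict]:
    # Pre-filter: the interim (potentially smaller) result set holds only
    # documents from repositories that the user can access.
    interim = [d for d in query_index(query) if d["repo_tokens"] & user_repo_tokens]
    # Post-filter: verify current document-level access with each source.
    return [d for d in interim if source_allows(d["id"], user)]
```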
The access control data that is stored in the index varies with
the crawler type. For example, the Notes crawler can store
database- and server-level access controls, and the Quickr for Domino crawler can store
access controls for servers, places, and rooms.
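For illustration only, the coarse-grained data might be represented as
per-document security fields like the following; these field names and
values are invented, not the actual index format.

```python
# Invented examples of the coarse-grained security data that each crawler
# type might replicate into the index; the real representation will differ.
notes_doc_security = {
    "server": "DominoServer01/Example",   # server-level access control
    "database": "hr/policies.nsf",        # database-level access control
}
quickr_doc_security = {
    "server": "Quickr01/Example",
    "place": "engineering",               # place-level access control
    "room": "design-reviews",             # room-level access control
}
```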
All crawlers and data source types support indexing source access
control data at crawl time. Some crawlers and data source types also
support post-filtering the result set to verify the user's current
credentials. This support is provided through data source security
mechanisms or the identity management component.
This two-pronged security design encompasses the following tasks
(an end-to-end sketch follows the list):
- Extracting ACL information at crawl time.
- Storing server and database ACL information in the index.
- Creating the user's security context when the user logs in or
when the session is initialized. This task must account for the different
identifiers that a single user uses to access the various back-end
sources.
- Processing the search with the user's security context and producing
an interim result set that contains only those documents that the
user has access to at the repository level.
- Post-filtering the interim result set by consulting the back-end
sources that contributed documents to it to obtain current ACL
information.
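The tasks above could be wired together roughly as follows. The
SecurityContext shape and every helper here are assumptions for
illustration, not the product's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class SecurityContext:
    """Built at login or session initialization. Maps each back-end
    source to the identifier this user is known by there, and carries
    the coarse repository-level tokens used for the interim filter."""
    identities: dict[str, str]          # e.g. {"notes": "CN=A User/O=Ex"}
    repo_tokens: set[str] = field(default_factory=set)

def build_security_context(user: str) -> SecurityContext:
    """Hypothetical identity-management lookup that resolves the user's
    per-source identifiers at login or session initialization."""
    return SecurityContext(
        identities={"notes": f"CN={user}/O=Example", "quickr": user},
        repo_tokens={"repo:hr", "repo:eng"},
    )

def query_index(query: str) -> list[dict]:
    """Hypothetical index query; documents carry the ACL data that the
    crawlers extracted and stored at crawl time."""
    return [{"id": "memo-1", "source": "notes", "repo_tokens": {"repo:hr"}}]

def source_allows(doc_id: str, source_user_id: str) -> bool:
    """Hypothetical live ACL check with a contributing source."""
    return True

def secure_search(query: str, user: str) -> list[dict]:
    ctx = build_security_context(user)
    # Search with the security context: the interim result set contains
    # only documents the user can reach at the repository level.
    interim = [d for d in query_index(query) if d["repo_tokens"] & ctx.repo_tokens]
    # Post-filter with the identifier each contributing source knows
    # the user by, so the final set reflects current access controls.
    return [d for d in interim
            if source_allows(d["id"], ctx.identities[d["source"]])]
```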
Exporting documents
Post-filtering
of search results is not supported by the export function. If you
export documents after searching a secure collection, documents that
are excluded from the search results through post-filtering will be
included in the set of documents that are exported. Documents that
are excluded from the search results through pre-filtering will not
be exported.