When you create a dataset, you can enable document-level security for the dataset. Document-level security ensures that the search results contain only documents that the user who submitted the search request is authorized to see.
Pre- and post-filtering of search results
There are two distinct approaches to filtering documents to ensure that search results contain only the documents that the user who submitted the search request is authorized to view.
- The first approach is to replicate the document's source access control lists (ACLs) at crawl time into the index and to rely on the search engine to compare user credentials to the replicated document ACLs. Pre-filtering the documents, and controlling which documents are added to the index, results in the best performance. However, it is difficult to model all of the security policies of the various back-end sources in the index and implement comparison logic in a uniform way. This approach is also not as responsive to any changes that might occur in the source ACLs.
- The second approach is to post-filter documents in the result set by consulting the back-end sources for current security data. This approach allows the contributing back-end sources to be the final arbiters of the documents returned to the user, and ensures that the result set reflects current access controls. However, this approach results in degraded search performance because it requires that connections exist with all of the back-end sources. If a source is not accessible, then links to documents must be filtered out of the result set along with documents that the user is not authorized to view.
Support for enforcing access controls relies on a combination of these two approaches. The design provides optimum performance while maintaining the precise security policies of the originating document repositories. By storing high-level access control data in the index, the system can provide an interim (potentially smaller) result set which can then be post-filtered to verify current access controls. The assumption is that if the user has access to the repository that owns the document, then chances are that the user also has access to the document.
All crawlers and data source types support the ability to index source access control data during crawl time. Some crawlers and data source types also support the ability to post-filter the result set and verify the user's current credentials.
This two-pronged security design encompasses the following tasks.
- Extracting ACL information during crawl time.
- Storing server and database ACL information in the index.
- Creating the user's security context when the user logs in or when the session is initialized. This task must account for the different identifiers that a single user must use to access the various back-end sources.
- Processing the search with the user's security context and producing an interim result set that contains only those documents that the user has access to at the repository level.
- Post-filtering the interim result set by consulting the back-end sources that contributed documents to the result set to obtain current ACL information.
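The two filtering stages above can be sketched as follows. This is an illustrative outline only; the data structures and function names (`pre_filter`, `post_filter`, `REPO_ACL`, and so on) are hypothetical and are not part of the product API:

```python
# Hypothetical in-memory stand-ins for the index and the ACL data.

# Documents replicated into the index at crawl time; each document
# records the repository (source) that owns it.
INDEX = [
    {"id": "doc1", "source": "hr_portal", "text": "benefits policy"},
    {"id": "doc2", "source": "eng_wiki", "text": "benefits of caching"},
]

# High-level access control data stored in the index: which
# repositories each user may access.
REPO_ACL = {"alice": {"hr_portal", "eng_wiki"}, "bob": {"eng_wiki"}}

# Current document-level ACLs held by the back-end sources themselves.
SOURCE_ACL = {
    "hr_portal": {"doc1": {"alice"}},
    "eng_wiki": {"doc2": {"alice", "bob"}},
}

def pre_filter(query, user):
    """Stage 1: query the index with the user's security context,
    keeping only documents whose owning repository the user can
    access (the interim result set)."""
    return [d for d in INDEX
            if query in d["text"] and d["source"] in REPO_ACL.get(user, set())]

def post_filter(interim, user):
    """Stage 2: consult each contributing back-end source for its
    current ACLs and drop documents the user is not (or is no
    longer) authorized to view."""
    return [d for d in interim
            if user in SOURCE_ACL.get(d["source"], {}).get(d["id"], set())]

def secured_search(query, user):
    return post_filter(pre_filter(query, user), user)
```

Here `secured_search("benefits", "bob")` returns only `doc2`: `doc1` never enters the interim set because bob lacks repository-level access to `hr_portal`, so no post-filter call to that source is needed.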
Be aware that post-filtering introduces a possible security exposure in facet counting. Because some documents are dropped from the results during the post-filtering stage, the document and facet counts shown in the search application can differ from (and can be larger than) the total number of documents that the current user is actually allowed to search.
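A minimal illustration of that discrepancy, with hypothetical data (the field names and the dropped document are made up for the example):

```python
# Interim result set (after pre-filtering) with a facet field.
interim = [
    {"id": "d1", "category": "finance"},
    {"id": "d2", "category": "finance"},
    {"id": "d3", "category": "legal"},
]

# Facet counts are computed on the interim set...
facet_counts = {}
for doc in interim:
    facet_counts[doc["category"]] = facet_counts.get(doc["category"], 0) + 1
# facet_counts is {"finance": 2, "legal": 1}

# ...but post-filtering then drops a document the user cannot view,
# so the displayed facet counts overstate what the user can search:
# "finance: 2" is shown although only one finance document remains.
final = [doc for doc in interim if doc["id"] != "d2"]  # d2 dropped by post-filter
```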
Setting up security
You enable document-level security when creating a dataset. This setting cannot be changed later. Any collection created using this dataset will be a secured collection. You can disable pre-filtering or post-filtering of search results in the Edit tab for a secured collection.
You cannot use the export document function for a secured collection.
- Currently, only the following crawlers support document-level security:
    - SharePoint crawler crawling SharePoint Online with the default Azure Active Directory (Azure AD) authentication. Refer to SharePoint crawler - configuration properties for details of SharePoint crawler configuration. You can also find information on how to enable document-level security for SharePoint Online at Configuring Document-level Security for SharePoint Online.
    - SharePoint crawler crawling SharePoint Server (on-premise versions of SharePoint) with SAML authentication. Refer to SharePoint crawler - configuration properties for details of SharePoint crawler configuration.
- Single sign-on (SSO) between SharePoint and Watson™ Explorer is not supported.
When errors related to SSL self-signed certificate validation occur, perform the following steps to upload the certificate file.
- Get the certificate file by running:
  `openssl s_client -showcerts -connect <sharepoint_server>:<port> < /dev/null > cacerts`
- Edit the cacerts file to remove the lines before `-----BEGIN CERTIFICATE-----` and after `-----END CERTIFICATE-----`.
- Copy the cacerts file to the /wexdata/config/certs/jvm directory inside the Docker container.
- Restart Watson Explorer.
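Instead of editing the cacerts file by hand, the trimming step can be done with `sed`. A minimal sketch on a sample dump; the `printf` line here only stands in for real `openssl s_client` output:

```shell
# Sample stand-in for the raw s_client dump (handshake noise around the PEM block).
printf 'handshake noise\n-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\nsession info\n' > cacerts

# Keep only the lines between (and including) the BEGIN/END
# CERTIFICATE markers; everything else is dropped.
sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' cacerts > cacerts.tmp && mv cacerts.tmp cacerts

cat cacerts   # now contains only the certificate block
```

If the server returns a chain, `-showcerts` emits several PEM blocks; this `sed` range keeps all of them.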