Overview

The Watson Explorer Engine can record Access Control List (ACL) information when it crawls many types of resources, and it stores this information internally with each item that is indexed. Your search applications can then use this information to respect the access control settings on the data in your search collection, so that users see only the search results for documents that they are authorized to see.

Two critical concepts for this tutorial are authentication and authorization. Authentication is the requirement that each user be identified as a valid user of a resource, such as a Web site, SMB/CIFS fileshare, database, and so on, by some security mechanism such as generic LDAP (Lightweight Directory Access Protocol) or Microsoft Active Directory. Authorization is the use of authentication information to determine whether an authenticated user should have access to a specific resource, by comparing that authentication information to the list of users and groups who have access to that resource.
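The authorization check described above can be pictured with a minimal Python sketch. It is illustrative only and is not the Watson Explorer Engine API: the names User, Document, is_authorized, and visible_results are hypothetical, and the ACL is simplified to a plain set of allowed user and group names (real ACLs can also carry deny entries, which this sketch ignores).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """An authenticated user plus the groups reported for that user (hypothetical type)."""
    name: str
    groups: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Document:
    """An indexed item with the ACL principals recorded at crawl time (hypothetical type)."""
    url: str
    acl: frozenset  # names of users and groups allowed to read the item


def is_authorized(user: User, doc: Document) -> bool:
    """Return True if the user, or any of the user's groups, appears on the ACL."""
    principals = {user.name} | user.groups
    return not principals.isdisjoint(doc.acl)


def visible_results(user: User, results: list) -> list:
    """Keep only the search results that the user is authorized to see."""
    return [doc for doc in results if is_authorized(user, doc)]
```

In this sketch, a user named alice who belongs to a group named engineering would see a document whose ACL contains engineering, but not a document whose ACL lists only some other user.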

Important: In addition to the per-result security that is discussed in this tutorial, you must secure the query and indexer services in each of your installations to prevent malicious users from submitting requests directly to those services. Unless you secure these services, carefully crafted requests that are submitted directly to them can bypass the result-level security enforcement that is provided by the Watson Explorer Engine. See Securing a Watson Explorer Engine Installation in the Watson Explorer Engine Users Manual or the Watson Explorer Engine Installation and Integration Guide for information about securing those services.

This tutorial uses a simple crawl of a Microsoft Windows network shared directory to illustrate how to crawl a resource that uses ACLs to control file access, how to view the ACL information that is associated with the data that you are indexing, and how to validate that the ACL information that you have retrieved is correct. The tutorial then explains how to incorporate end-user authentication information into your search application, how to retrieve user/group information from the remote resource that you are crawling, and how to use that information to determine the users that are authorized to see potential search results.

Note: The outline provided in the following section is intended as a general checklist, not as a complete list of steps for this task. The complete list of steps is in the tutorial section following the overview.

To proceed to a description of the sample environment used in this tutorial, click The Sample Environment for This Tutorial.