Overview
The Watson™ Explorer Engine can record Access Control List (ACL) information when it crawls many types of resources, and it stores this information internally with each item that is indexed. Your search applications can then use this information to respect the access control settings on the data in your search collection, so that users see only the search results for documents that they are authorized to see.
Two critical concepts for this tutorial are authentication and authorization. Authentication is the requirement that each user be identified as a valid user of a resource, such as a Web site, SMB/CIFS fileshare, or database, by a security mechanism such as generic LDAP (Lightweight Directory Access Protocol) or Microsoft Active Directory. Authorization is the use of authentication information to determine whether an authenticated user should have access to a specific resource, by comparing that information to the list of users and groups who have access to that resource.
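The authorization check described above can be sketched in a few lines of Python. This is only a conceptual illustration, not the Watson Explorer Engine API: the `User` and `Document` classes, the `is_authorized` and `filter_results` functions, and the sample names are all hypothetical, and the sketch assumes a simplified ACL that lists only the principals allowed to read an item (real ACLs can also contain deny entries).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """An already-authenticated user (hypothetical type for illustration)."""
    name: str
    groups: frozenset = frozenset()


@dataclass
class Document:
    """An indexed item with the ACL recorded at crawl time (hypothetical type)."""
    url: str
    acl: set  # users and groups allowed to read this item (allow-only assumption)


def is_authorized(user, doc):
    """Return True if the user, or any group the user belongs to, is on the ACL."""
    principals = {user.name} | user.groups
    return not principals.isdisjoint(doc.acl)


def filter_results(user, results):
    """Keep only the search results that the user is authorized to see."""
    return [doc for doc in results if is_authorized(user, doc)]
```

In this sketch, authentication is assumed to have happened already, which is why `User` simply carries a name and a set of groups. Authorization is then a set comparison between those principals and the ACL stored with each indexed item.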
This tutorial uses a simple crawl of a Microsoft Windows network shared directory to illustrate how to crawl a resource that uses ACLs to control file access, how to view the ACL information that is associated with the data that you are indexing, and how to validate that the ACL information that you have retrieved is correct. The tutorial then explains how to incorporate end-user authentication information into your search application, how to retrieve user and group information from the remote resource that you are crawling, and how to use that information to determine which users are authorized to see potential search results.
To proceed to a description of the sample environment used in this tutorial, click The Sample Environment for This Tutorial.