Process Overview

The next few sections explain the standard process for integrating form-based authentication into a search collection so that the resource associated with that collection can be crawled successfully. The following is a summary of those steps:

  1. Install any necessary extension for logging HTTP transactions in your favorite browser, enable logging/recording, and authenticate to the remote resource that you want to crawl. See Capturing HTTP Transactions for more information.
  2. Examine the record of HTTP transactions between your browser and the remote resource to identify the final form that was submitted in order to authenticate to the resource that you want to crawl. Identify the parameters that were submitted to that form.
  3. In the Watson Explorer Engine administration tool, add the conditional setting for Form-Based Authentication to the search collection that is associated with the remote resource that requires form-based authentication. See Adding Form-Based Authentication Support for more information.
  4. Configure that conditional setting and save your configuration information. See Configuring Form-Based Authentication for more information.
  5. Begin crawling the remote resource. If the crawl of that resource begins successfully (that is, it retrieves pages that can only be accessed after successful authentication), you do not need to make any further changes to the crawler condition. Otherwise, proceed to the next step.
  6. Collect debugging information for an attempted crawl of the specified resource. See Enabling HTTP Debugging for more information.
  7. Purge all data from the collection. Start a crawl of the remote resource to determine if you have handled all of the forms that are required to authenticate to that resource.
  8. Analyze the debugging log on the DE server to identify the first form that the crawler encounters that is not handled by your existing form handlers, and add an appropriate form handler for that form. See Enabling HTTP Debugging for more information.
  9. Close the form element, save the updated crawler condition, and retry crawling the associated resource. If the crawl proceeds past the authentication form, proceed to step 10. Otherwise, return to step 7.
  10. Disable any temporary configuration changes that you made to facilitate debugging. See Restoring a Production Configuration for more information.
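
Step 2 above, identifying the final submitted form and its parameters, can be partly automated if your browser's logging tool can export the recorded session as a HAR (HTTP Archive) file. The following is a minimal sketch under that assumption, not part of Watson Explorer Engine. The function names find_form_posts and final_form_post are hypothetical, and the sketch assumes that recording was stopped shortly after login, so that the last POST in the capture is the authentication form.

```python
import json
from urllib.parse import parse_qsl


def find_form_posts(har: dict) -> list:
    """Return every POST request in a HAR capture with its submitted fields."""
    posts = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if request.get("method", "").upper() != "POST":
            continue
        post_data = request.get("postData") or {}
        params = post_data.get("params")
        if params:
            # HAR lists decoded form fields as name/value pairs.
            fields = {p["name"]: p.get("value", "") for p in params}
        else:
            # Fall back to parsing the raw URL-encoded body text.
            raw = post_data.get("text") or ""
            fields = dict(parse_qsl(raw, keep_blank_values=True))
        posts.append({"url": request.get("url", ""), "fields": fields})
    return posts


def final_form_post(har: dict):
    """Return the last POST in the capture, or None if there is none.

    Assumption: entries appear in chronological order and the capture
    ends right after authentication completes.
    """
    posts = find_form_posts(har)
    return posts[-1] if posts else None
```

To use it, load the exported file with json.load and pass the resulting dictionary to final_form_post; the returned URL and field names are candidates for the values that you enter when configuring the Form-Based Authentication setting. Treat the result as a starting point only: multi-step logins may require more than one form, which is what steps 6 through 9 are designed to uncover.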