The next few sections explain the standard process for successfully integrating form-based
authentication into a search search collection so that the resource that it is associated with
can be crawled successfully. The following is a summary of those steps:
- Install any necessary extension for logging HTTP transactions in your favorite browser,
enable logging/recording, and authenticate to the remote resource that you want to crawl.
See Capturing HTTP Transactions for more information.
- Examine the record of HTTP transactions between your browser and the remote resource to
identify the final form that was submitted in order to authenticate to the resource that you
want to crawl. Identify the parameters that were submitted to that form.
- In the Watson Explorer Engine administration tool, add the conditional setting for
Form-Based Authentication to the search collection that is associated with the remote
resource that requires form-based authentication. See Adding Form-Based Authentication Support for more information.
- Configure that conditional setting and save your configuration information. See Configuring Form-Based Authentication for more information.
- Begin crawling the remote resource. If the crawl of that resource begins successfully
(i.e., retrieves pages that could only be accessed after successful authentication), you do
not need to make any further changes to the crawler condition. Otherwise, proceed to the
next step.
- Collect debugging information for an attempted crawl of the specified resource. See Enabling HTTP Debugging for more information.
- Purge all data from the collection. Start a crawl of the
remote resource to determine if you have handled all of the forms that are required to
authenticate to that resource.
- Analyze the debugging log on the DE server, identifying the first form that it encounters
which is not handled by your existing form handler, and add an appropriate form-handler for
that form. See Enabling HTTP Debugging for more
information.
- Close the form element, save the updated crawler condition, and retry crawling the
associated resource. If the crawl proceeds past the authentication form, proceed to step
10. Otherwise, return to step
7.
- Disable any temporary configuration changes that you made to facilitate
debugging. See Restoring a Production Configuration for more
information.