Crawl complex form-based authentication Web sites using IBM OmniFind Enterprise Edition

Creating a web crawler plug-in to preprocess documents

From the developerWorks archives

Koichi Nishitani

Date archived: January 12, 2017 | First published: July 05, 2007

Because many organizations have Web-based intranet sites, the web crawler is one of the most prominent features of IBM® OmniFind™ Enterprise Edition. Most organizations place some sort of secure front end in front of some or all of the content on these Web sites. One such front end is form-based authentication (FBA), in which the user enters authentication information through an HTML form. Although the web crawler can crawl some sites that use FBA, there is no standard way to implement FBA, and many products and solutions provide their own FBA mechanisms. Some of these mechanisms induce redirects, use non-standard return codes, set multiple cookies, and so on. As a result, the web crawler's built-in FBA settings cannot support every FBA implementation. However, you can write a prefetch web crawler plug-in that negotiates most complex FBA mechanisms. This article shows you how to negotiate certain FBA mechanisms and how to write a web crawler plug-in that does so.
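
To make the "redirects and multiple cookies" problem concrete, here is a minimal sketch of the bookkeeping that an FBA negotiation typically needs: collecting the cookies set at each hop of a login exchange and replaying them on the next request. It is not the OmniFind plug-in API and not the code from the full article. The class name, method names, and cookie names are all hypothetical, and the sketch deliberately ignores cookie attributes such as Path, Domain, and expiry.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; not part of any IBM API.
final class FbaCookieSketch {

    // Treat the common HTTP redirect codes as "follow the Location header".
    // Sites with non-standard return codes would need extra cases here.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Record the name=value pair from each Set-Cookie header value.
    // Attributes after the first ';' are dropped (a simplifying assumption).
    static void collect(Map<String, String> jar, List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // A later hop may overwrite a cookie set by an earlier hop.
                jar.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
    }

    // Build the Cookie request header to send on the next request.
    static String toCookieHeader(Map<String, String> jar) {
        StringJoiner joiner = new StringJoiner("; ");
        jar.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> jar = new LinkedHashMap<>();
        // Hop 1: the login form response sets a session cookie.
        collect(jar, Arrays.asList("SESSIONID=abc123; Path=/; HttpOnly"));
        // Hop 2: the redirect target adds a token and refreshes the session.
        collect(jar, Arrays.asList("AUTH_TOKEN=xyz; Path=/", "SESSIONID=def456; Path=/"));
        System.out.println(toCookieHeader(jar)); // SESSIONID=def456; AUTH_TOKEN=xyz
    }
}
```

In a real negotiation, the caller would post the login form, check `isRedirect` on each response, call `collect` with that response's Set-Cookie values, and send `toCookieHeader` on the follow-up request until a non-redirect response arrives.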

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some steps and illustrations may have changed.


