Level: Introductory
Todd Leyba (tleyba@us.ibm.com), Search Architect, IBM
13 Dec 2006
Learn how you can quickly and easily integrate a freely downloadable search engine into your
Web site. This article describes four methods to do this, using IBM® OmniFind™ Yahoo! Edition
search functionality. The methods range from directly linking to the OmniFind
search results page, to using XSLT to transform the XML returned by the OmniFind search API into the
HTML of your design.
Introduction
IBM and Yahoo! have partnered to offer a free, downloadable search engine that is easy to set up and use.
IBM's OmniFind Yahoo! Edition (referred to as just OmniFind in this article) can crawl and index up to half a million
Web pages or file system documents and make them available for search through a simple-to-use Web interface.
You may have already downloaded OmniFind and discovered how quick and easy it is to set up an index and start searching.
But perhaps you are beyond that point, and are now investigating how to integrate OmniFind's search functionality into your Web site.
In this article I will explore four ways to accomplish this:
- Link directly to the OmniFind search page
- Use your own search box and button
- Present search results as an HTML snippet
- Use XSLT to transform OmniFind XML results into HTML
The methods presented here build upon each other, adding flexibility at the expense of additional effort.
Scenario
The scenario I will follow is to add OmniFind search functionality to a blog site.
If you have ever read a blog, you may have noticed that not all of the blogger's postings are presented on the first page.
Most blog hosting facilities list the most recent postings and provide an archive link to the older postings, typically
organized by month. So if you want to view a posting not listed on the first page, you would need to click through the
previous months to find the article.
In this integration scenario, I will use OmniFind to crawl and index my personal
blog site and then allow users to search my OmniFind blog index to find previous postings. I use Google's Blogger to
host my real blog (Todd Leyba's Perspectives on Search and Discovery). Blogger
already provides such a search capability, but Blogger's search feature cannot be customized. I will demonstrate in this article how to
replace Blogger's search facility with OmniFind and then show how to customize the overall search experience.
Option 1: Link to OmniFind directly
The easiest integration option is to provide a link to the OmniFind search page in a conspicuous place somewhere in the blog.
All changes to my blog site are made through a Blogger template. The template defines the overall structure of my blog and
is written in standard HTML. As such, it is a simple matter for me to insert a link to the OmniFind search page, in this case
right above the archive section. The link would be similar to the following, with the host and port changed to those of your OmniFind installation:
http://omniFindhost:8080/search/
The overall look and feel of the OmniFind search interface (shown below) can then be customized by selecting from several layout
options offered in the OmniFind administrator's console. With no programming involved you can change the banners and images to
those of your company, change the text of various labels and buttons, and even choose which features are to appear or not
(such as summaries, footers, and so on).
Figure 1. OmniFind Search Results Page
But the direct link approach described above is cumbersome to use. It forces the user to click twice to issue a
search: once to get to the OmniFind search page and again to issue the search. Ideally, you would like to have
the search box always present on your site so that users can type their query when needed and then click once to see the results.
Option 2: Add your own search box
The first step is to add your own search box and button to the Web page. I will use standard HTML to add these components and a
small amount of JavaScript to handle the onClick action when the button is pressed.
The following three lines add the search box and button right before the list of "Archives" in the right hand panel of my blog.
<h2 class="sidebar-title">Search entire blog</h2>
<input type="text" name="Query" value="" size="25">
<input type="button" value="Search" onclick="runSearch()">
Now we need to provide a JavaScript function to handle the onClick action when the button is pressed.
Right before the body tag in my Blogger template, I insert the following JavaScript:
<SCRIPT LANGUAGE="JavaScript">
<!--Begin
function runSearch()
{
    var dest = "http://OmniFindHost:8080/search?";
    var params = "index=Default&start=0&results=10&query=";
    // Look up the search box by name so it is found whether or not it sits inside a form
    var query = document.getElementsByName("Query")[0].value;
    var request = dest + params + escape(query);
    // window.open expects the window features as ONE comma-separated string
    var features = "toolbar=1," +      // toolbar provides back/fwd
                   "resizable=1," +    // allow them to resize window
                   "scrollbars=1," +   // and to scroll as well
                   "height=500," +     // and I like smaller windows
                   "width=400," +      // of this size and position
                   "left=80,top=80";
    window.open(request,               // complete search url
                "OmniFindSearchResults", // name of the window (no spaces allowed)
                features);
}
// End -->
</SCRIPT>
Invoke search
The primary job of the JavaScript function is to collect the keywords entered in the search box and
include them in an OmniFind search request. In this first example I will invoke the OmniFind search
page directly with no modifications. The results page will appear as shown in Figure 1 above and is
the same search results page presented as if you had entered the search directly in OmniFind. The only
difference is that we have used our own search box to accept the search expression.
The URL is similar to the direct link described above but is further qualified with a few additional parameters.
In the JavaScript, I broke up the URL into its constituent parts for readability. There are four parameters used.
The "index" parameter names the index to search, which in my case is "Default". The number of results to return is 10, starting with
result zero. If you wanted to return the second page of results for the same query, the "start" parameter would be set to 10.
The "request" variable contains the concatenation of the URL parts and appends the query terms provided by the user
to the end of the "query" parameter. Note that I used the escape function to convert blanks and other special characters
to their escaped representation. The "request" variable containing the fully built OmniFind URL is then passed as the
first parameter to the window.open() function call. The window.open call will submit the request and cause a new window
to be opened with the results of the search. I added a few parameters to the window.open call to control the size, location,
and options of the window. Below is an example search from my blog.
Figure 2. Search box added to blog, and OmniFind search results page
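The paging rule described above (the "start" parameter advances by the number of results per page) can be sketched as a small helper. This is only a sketch under stated assumptions: the function name buildSearchUrl and its arguments are hypothetical and not part of my template, and it uses encodeURIComponent, the standard replacement for the now-deprecated escape function, to encode the query.

```javascript
// Hypothetical helper: builds an OmniFind search URL for a given page of results.
// The host name and parameter names mirror the example in this article.
function buildSearchUrl(baseUrl, index, query, page, resultsPerPage) {
    // Page 0 starts at result 0, page 1 at result 10 (for 10 per page), and so on.
    var start = page * resultsPerPage;
    return baseUrl + "?" +
        "index=" + encodeURIComponent(index) +
        "&start=" + start +
        "&results=" + resultsPerPage +
        "&query=" + encodeURIComponent(query);
}

// Second page of results for a two-word query.
var url = buildSearchUrl("http://OmniFindHost:8080/search", "Default",
                         "search conferences", 1, 10);
console.log(url);
```

The resulting string could be passed to window.open in place of the "request" variable built in runSearch().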
Use the OmniFind REST API
Up to this point you have seen how to successfully use your own search box to submit a search rather than the search box that
appears on the OmniFind search page. Now I'll show you how to change the appearance of the search results more
than what is offered in the OmniFind layout editor. You can accomplish this with the help of the OmniFind search API.
OmniFind's API is REST-based, which means that you use a standard HTTP GET request with parameters to submit the search.
The search results are returned as XML which we will then transform into our custom HTML using XSLT.
Below is an example OmniFind search request:
http://OmnifindHost:8080/api/search?index=Default&results=10&start=0&query=conferences
You may have noticed that the above URL is nearly identical to the URL issued in the previous example with one exception.
The path "/api/search" is used instead of "/search". This instructs OmniFind to return the results as XML instead of
the fully formatted HTML page shown in Figure 1. The XML that is returned conforms to an Atom 1.0 feed. Consequently, you
can test your API search requests using any conventional RSS feed reader that supports Atom 1.0 (I personally use the open
source FeedReader program). The feed reader program will automatically issue
the search and format the results for you. You can also test your API searches with a standard browser which will display the returned
XML natively as shown below.
Figure 3. Search results displayed as XML
Option 3: Results returned as HTML snippets
Before we further discuss the XML that is normally returned, it is important to note that the output of the search
API can also return the results as a snippet of standard HTML. I refer to the output as a snippet because it is not
a complete HTML page (no <HTML><BODY> tags). HTML output is requested with the "output=htmlsnippet" parameter
on the search request, and its effect is shown below:
http://OmnifindHost:8080/api/search?index=Default&results=10&start=0&query=conferences&output=htmlsnippet
Figure 4. HTML snippets results page
Notice that the format of the results is somewhat similar to that of the OmniFind search results
page with the exception of the missing search box and page controls. This approach has value in certain applications
but is somewhat inflexible. If you want to change the HTML formatting you would need to parse the HTML yourself, not an easy task.
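One way to use the snippet output is to request it from script and drop it into a placeholder element on your page. The sketch below is an illustration, not something provided by OmniFind: the function names and the "search-results" element ID are hypothetical, and it assumes the OmniFind host can be reached from the page's own origin (for example through a proxy), since browsers restrict cross-origin requests. It relies on the standard fetch function.

```javascript
// Hypothetical helper: builds the API URL that asks for an HTML snippet.
function buildSnippetUrl(apiUrl, index, query) {
    return apiUrl + "?index=" + encodeURIComponent(index) +
        "&results=10&start=0" +
        "&query=" + encodeURIComponent(query) +
        "&output=htmlsnippet";
}

// Fetches the snippet and places it inside the given element.
// fetchFn defaults to the global fetch; it is a parameter only so the
// function can be exercised without a network connection.
function showSnippet(element, apiUrl, index, query, fetchFn) {
    var doFetch = fetchFn || fetch;
    return doFetch(buildSnippetUrl(apiUrl, index, query))
        .then(function (response) { return response.text(); })
        .then(function (html) {
            element.innerHTML = html;   // the snippet is already HTML
            return html;
        });
}

// Browser usage (assumes <div id="search-results"></div> exists in the page):
// showSnippet(document.getElementById("search-results"),
//             "http://OmnifindHost:8080/api/search", "Default", "conferences");
```

Because the snippet is inserted as-is, any styling still has to come from the style rules of the surrounding page.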
Option 4: Use XSLT to format the search results
Since the results are normally returned as XML, you have the ability to use an XSLT stylesheet to transform the XML into
HTML, formatting the results as desired. The XSLT stylesheet would be prepared by you and contain the appropriate XSL and
XPath directives to process the elements of the Atom feed. In this case I would like the motif of the results page to match
that of my blog using the same color schemes and fonts. Below is the XSLT stylesheet I used which I stored in a file named
"myStyleSheet.xsl". Each line is numbered for easy reference.
01 <?xml version="1.0" encoding="utf-8"?>
02 <xsl:stylesheet version="1.0"
03     xmlns:atom="http://www.w3.org/2005/Atom"
04     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
05     xmlns:dc="http://purl.org/dc/elements/1.1/">
06   <xsl:output method="html"/>
07   <xsl:template match="/">
08     <style>
09       <xsl:comment>
10         .content-area {background-color: #dcedcb;}
11         .description {font-size: .9em; margin: 0 0 10px 0;}
12         // Remaining styles omitted for readability
13       </xsl:comment>
14     </style>
15     <xsl:apply-templates select="/atom:feed"/>
16   </xsl:template>
17   <xsl:template match="/atom:feed">
18     <div class="content-area">
19       <div class="title"><xsl:value-of select="atom:title"/></div>
20       <ol class="list"><xsl:apply-templates select="atom:entry"/></ol>
21     </div>
22   </xsl:template>
23   <xsl:template match="atom:entry">
24     <li>
25       <a href="{atom:link/@href}">
26         <xsl:value-of select="atom:title" disable-output-escaping="yes"/>
27       </a>
28       <div class="list-item-description">
29         <xsl:value-of select="atom:summary" disable-output-escaping="yes"/>
30       </div>
31     </li>
32   </xsl:template>
33 </xsl:stylesheet>
XSLT can be used to transform XML into a variety of formats. Line 6 indicates that the output of this transformation is to be HTML.
I use XSL templates to match the various elements in the XML stream. Line 7 is the main template and matches the root of the
XML document. It is within this template that you specify your styles, inserted between the xsl:comment directives (Lines 9 and 13).
For readability I omitted the majority of my style directives. Within this main template I reference a subordinate template
for each atom:feed element XSLT encounters. In this case there is only one atom:feed element specified in the OmniFind XML results.
The template for an atom:feed element begins on line 17. It creates an outer HTML "div" tag whose style class is "content-area". Note that a
style for each specified "div" class attribute must be defined above in the main template. Again I purposely omitted most of the style
definitions for readability. The atom:feed template creates an inner "div" tag for the title of the results (line 19). The title is
actually pulled from the XML using the xsl:value-of statement with a select on the element named "atom:title". If you want to provide a
different hard coded title, just replace line 19 with your own HTML statement (e.g., <h2>My Title</h2>). Line 20 inserts an HTML
ordered list and applies another sub template for each "atom:entry" element found in the XML.
The last template definition (starting on line 23) provides the HTML transformations to be applied to each search result. In this
template I create an HTML list item tag for the ordered list, a link to the document, and a brief description of the document.
For the link URL I use the XPath directive (atom:link/@href) to extract the value of the href attribute in the atom:link element (line 25). For
the anchor text itself I use the xsl:value-of directive to extract the contents of the atom:title element within the entry element (line 26). The
same technique is used for the result description as well.
OmniFind conveniently highlights any search terms contained in the title and summary of each result. It does this by
bracketing the encountered search terms with HTML <SPAN> tags to indicate the style to be used for the highlighting. These
HTML tags are embedded within the original XML and are normally escaped by the XSLT processor. This causes the HTML tags to be
shown as is when displayed in the browser (an effect we do not want). The XSLT processor does not know that these are valid HTML
tags and dutifully escapes any special characters it encounters during the processing of the xsl:value-of directive. Under these
circumstances we can instruct the XSLT processor not to escape any special characters with the disable-output-escaping="yes" attribute on lines 26 and 29.
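To see why the highlight tags show up literally, it helps to look at what output escaping does to markup. The function below only illustrates the escaping rule; its name, the sample summary text, and the "hl" class name are made up for this example, and it is not code from OmniFind or from any XSLT processor.

```javascript
// Illustration only: mimics how text is escaped when it is written to the output.
function escapeMarkup(text) {
    return text
        .replace(/&/g, "&amp;")   // ampersand first, so later entities are not re-escaped
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// A made-up summary containing a highlight tag like the ones described above.
var summary = 'Upcoming <SPAN class="hl">conferences</SPAN> on search';

// With escaping, the browser displays the tag as literal text:
console.log(escapeMarkup(summary));

// Without escaping (the effect of disable-output-escaping="yes"), the tag is
// written unchanged and the browser treats it as markup:
console.log(summary);
```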
It is important to note that the disable-output-escaping attribute is not honored by all browsers. The W3C XSLT 1.0 specification
makes support for it optional. Microsoft’s Internet Explorer honors the attribute, with the desired effect. Mozilla, on the other hand, ignores
the attribute, which the specification permits. For Mozilla browsers you can achieve the same effect with different
XSLT commands (not shown in the example).
Using the stylesheet parameter
The "stylesheet" parameter is used on the search request to indicate which XSLT stylesheet is to be applied as shown below.
http://OmnifindHost:8080/api/search?query=cameras&index=Default&results=10&start=0&stylesheet=http://myserver.com/myStyleSheet.xsl
The use of the "stylesheet" parameter causes OmniFind to insert an xml-stylesheet entry as the second line of the XML search results as shown below.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://myserver.com/myStyleSheet.xsl"?>
<feed>...</feed>
This will cause the browser to retrieve the stylesheet from the location specified in the HREF and apply it using XSLT. The results are shown below:
Figure 5. Search box added to my blog, and XSLT transformed search results page
Summary
In this article I presented four methods for integrating OmniFind search functionality into your Web site.
The first and simplest approach was to insert links to the OmniFind search page directly. Next was to replace the
direct links with my own search box and button but to keep the OmniFind search results page unchanged. I then switched over to
using the OmniFind search API so as to better control the formatting of the returned search results. I first showed
how the API can return the search results as a snippet of HTML and then ultimately XML. I finally demonstrated how an
XSLT Stylesheet can be applied to the XML to create completely customized search results. Download OmniFind Yahoo! Edition and try these techniques to enhance your own Web applications.
About the author
Todd Leyba is currently serving as an evangelist for Discovery and Search Analytics in IBM's
Information Management Division. He is a key spokesperson, responsible for
engaging with customers, partners, and developers to articulate IBM's Discovery
and Search strategy. In this role, he is also responsible for incorporating
customer and developer feedback as well as market trends into IBM's future
product direction. Mr. Leyba's expertise lies in the architecture of full text
search and retrieval systems and their application in business. He has
previously worked on a variety of search related projects including: IBM's WebSphere
Enterprise Search product (OmniFind), designed to provide superior performance,
scale, and result quality with a broad range of data source support