
IBM Search and Index APIs (SIAPI) for WebSphere Information Integrator OmniFind Edition

Create a search index for your enterprise data

Srinivas Varma Chitiveli (schitive@us.ibm.com), Advisory Software Engineer, IBM
Srinivas Varma Chitiveli is a software engineer in the IBM software group. He has been involved with IBM products that deal with technologies related to issuing digital certificates for secure e-business transactions, content management, and searching information across distributed data sources.

Summary:  WebSphere® Information Integrator OmniFind™ Edition is a stand-alone, full-text search product offered by IBM®, designed to provide superior performance, scale, and result quality with a broad range of data source support. In addition to providing easy-to-use graphical user interfaces (GUIs) for administration and search, OmniFind also provides a robust set of search and index APIs that allow customers to build their own search solutions based on OmniFind technology. In this article, you'll learn the concepts behind OmniFind's API set and explore in-depth examples of how to use it.

Date:  12 Jan 2005
Level:  Intermediate

Introduction

The information found in an enterprise can exist in many shapes and forms. It can be distributed throughout the enterprise and managed by the most appropriate software for the task at hand. The amount of information and its heterogeneous nature make it difficult for enterprise users to find needed documents in a timely fashion. Consequently, more and more enterprises rely on a search engine to help users find the information they need.

The primary goal of an enterprise search engine is to provide quick and relevant responses to queries for documents that users are authorized to see. To meet these performance and relevance requirements, most search engines build an optimized index that represents the content to be searched. Rather than searching the original content, the user actually posts queries to the index, much like searching a card catalog in a library. The index is therefore built from documents that were extracted from the various back-end data sources.

WebSphere Information Integrator OmniFind Edition is one such enterprise solution; it can build and search indexes of information gathered from many types of sources. OmniFind honors the native access controls of the original content, allowing users to search and retrieve only what they have access to. The language and dictionary capabilities incorporated in the product support searching data in different languages.

The product provides a user-friendly GUI to configure crawlers to extract content from a broad set of common enterprise data sources (such as DB2® databases, file systems, content managers, Domino® databases, and Microsoft® Exchange Servers) into a central index. OmniFind also provides a robust set of search and index APIs that allow customers to build their own search solutions based on OmniFind technology. These APIs further expand the reach and flexibility of searchable content.

The IBM Search and Index API (SIAPI) is a set of interfaces published by IBM to implement functions required to build searchable document indexes and to perform subsequent searches against those indexes. This article provides an in-depth review of the SIAPI implementation, which addresses both the administration and searching of indexes hosted by WebSphere Information Integrator OmniFind Edition. The article explains the concepts and usage of the APIs and provides working sample code and code snippets to illustrate the implementation.

Prerequisites

You need the following in order to test the implementation discussed in this article:

  • WebSphere Information Integrator OmniFind Edition (Version 8.3) should be installed on a dedicated server.
  • Contact your OmniFind administrator to obtain the following files and information:
    • es.siapi.toolkit.jar, a client toolkit that contains all the samples, libraries, and Javadoc required to administer and search indexes. This archive is in the lib folder of the product installation directory.
    • OmniFind administrator user ID and password (this information is specified when OmniFind is installed).
  • IBM Java Software Development Kit 1.4.x
  • Apache Ant build tool

Model

OmniFind is an "enterprise" search engine designed and engineered to handle the data volumes typically encountered in a medium- to large-scale company. As a result, OmniFind is based on a client-server architecture in which one or more servers are dedicated to search. In most cases, OmniFind is installed on a dedicated machine that hosts the index and an HTTP-based gateway to search the index. The server launches processes that listen for client requests related to the administration and searching of indexes.

The client package includes a set of implemented APIs for creating and administering indexes, populating indexes with documents, and searching indexes on an OmniFind server. Each OmniFind client application is assigned a unique application ID. This application ID is central to authenticating clients and authorizing their access to OmniFind functions and content, so it appears as a parameter on most SIAPI calls.

Overview

To create a searchable index, a client must register unique credentials (application ID and password) with the server. These credentials are used by the server to authenticate and authorize search and administration requests. From now on, we will use the term application ID to represent a unique client.

In WebSphere Information Integrator OmniFind Edition, an index is represented by an entity called a collection. Think of a collection as a searchable container that holds documents from different sources or types or even languages. From now on, we will use the term collection interchangeably with index. Each collection created in OmniFind must be associated with at least one application ID.

When you create a collection, you must specify (at a minimum) a name, a unique ID, and an algorithm for ranking search results. Optionally, you can specify that the collection is secure; a secure collection enforces document-level access control on the indexed content. The application IDs associated with a collection are used to administer it and to add documents to its index.

Once a collection is created, the parsing processes must be enabled. The parser is a program that intercepts documents as they are added to a collection, extracts information from them, and prepares them for indexing, search, and retrieval by breaking them down into tokens.

The following summarizes the document analysis and parsing that is performed by the language-ware and dictionary-ware:

  • The words in the document are stemmed. (Variant word forms such as "connections," "connectors," and "connected" are converted to the common word form "connect.")
  • The words in the document are lemmatized. (Words such as "trees" and "mice" are converted to more basic forms like "tree" and "mouse").
  • Frequently occurring English terms (stop words such as the articles "the," "a," and "an") are dropped.
  • The structure of XML content is preserved to enable XPath and XML fragment searches.
  • N-gram segmentation is used to break text into words for Asian languages that do not use white space as a word delimiter.
  • Compound words in German or Korean languages are analyzed and decomposed.
  • Generated tokens are lower-cased to support case-insensitive searches.

At the completion of the parsing phase, the original document has been transformed into a set of tokens appropriate to build efficient indexes.
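To make these steps concrete, here is a deliberately tiny analyzer in plain Java (it uses current-JDK features such as Set.of, unlike the 1.4-era listings that follow). It is not OmniFind's language-ware: the stop-word list, the lemma dictionary, the crude suffix-stripping stemmer, and the bigram segmenter are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Toy analyzer illustrating lower-casing, stop-word removal, lemmatization,
 * stemming, and bigram segmentation. Illustrative only; this is NOT how
 * OmniFind's language-ware is implemented.
 */
class ToyAnalyzer {
    // Hypothetical English stop-word list.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "and", "of", "on", "in");
    // Hypothetical lemma dictionary for irregular forms that suffix rules cannot handle.
    static final Map<String, String> LEMMAS = Map.of("mice", "mouse", "geese", "goose");
    // Suffixes tried in order by a deliberately crude stemmer.
    static final String[] SUFFIXES = {"ions", "ion", "ors", "ed", "s"};

    /** Analyzes text in a language that separates words with white space or punctuation. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue; // skip empty splits and stop words
            }
            String lemma = LEMMAS.get(word);
            tokens.add(lemma != null ? lemma : stem(word));
        }
        return tokens;
    }

    /** Strips the first matching suffix, keeping at least three characters. */
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            boolean doubleS = suffix.equals("s") && word.endsWith("ss");
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3 && !doubleS) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Overlapping character bigrams (works on UTF-16 chars, fine for common CJK text). */
    static List<String> bigrams(String run) {
        List<String> grams = new ArrayList<>();
        if (run.length() == 1) {
            grams.add(run); // a single character is its own token
        }
        for (int i = 0; i + 2 <= run.length(); i++) {
            grams.add(run.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The connections, connectors and connected mice"));
        // three CJK characters with no spaces between words
        System.out.println(bigrams("\u6771\u4eac\u90fd"));
    }
}
```

For the English sample, main prints [connect, connect, connect, mouse], mirroring the stemming and lemmatization examples above; a production stemmer uses far more careful rules than a fixed suffix list.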

After you create a collection, you are ready to add documents to it. A document consists of:

  • A unique document ID
  • Searchable content
  • One or more metadata fields
  • Optional security tokens

The unique document ID could be a URL for Web-accessible content, the absolute path of a file on a globally accessible file system, or a proprietary path that can be consumed by a custom fetch application to retrieve the content. Fields form the metadata for the document. End users can search for one or more terms found in the content of the document or search for the value in a specified field.

The developer decides which fields are returned in a result set. These can be fields that matched the query terms or additional metadata fields that did not participate in the match but are associated with the document. One or more security tokens can optionally be associated with an indexed document to enforce access control; when searching, the user must supply a matching token for the document to appear in the results. Security tokens are honored only when documents are added to a secure collection.
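The following plain-Java sketch models the document parts listed above and what the three field flags used in Listing 4 (searchable, parametric, returnable) mean for matching and for results. The record and method names are hypothetical stand-ins, not SIAPI classes, and the case-insensitive field comparison is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a document; not the SIAPI Document or Field classes. */
class ToyDocumentModel {
    /** A metadata field carrying the three flags the SIAPI samples set. */
    record Field(String name, String value, boolean searchable, boolean parametric, boolean returnable) {}

    /** Document ID, searchable content, metadata fields, and optional security tokens. */
    record Document(String id, String content, List<Field> fields, List<String> acl) {}

    /** Field search: only fields flagged searchable can match (comparison assumed case-insensitive). */
    static boolean matchesField(Document doc, String name, String value) {
        for (Field f : doc.fields()) {
            if (f.searchable() && f.name().equals(name) && f.value().equalsIgnoreCase(value)) {
                return true;
            }
        }
        return false;
    }

    /** Numeric range search: only parametric fields take part. */
    static boolean inRange(Document doc, String name, long min, long max) {
        for (Field f : doc.fields()) {
            if (f.parametric() && f.name().equals(name)) {
                long v = Long.parseLong(f.value());
                if (v >= min && v <= max) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Metadata a search result would carry: returnable fields only, in insertion order. */
    static Map<String, String> returnable(Document doc) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Field f : doc.fields()) {
            if (f.returnable()) {
                out.put(f.name(), f.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Document gw = new Document(
            "http://www.whitehouse.gov/history/presidents/gw1.html",
            "George Washington took his oath of office in 1789.",
            List.of(new Field("LastName", "Washington", true, false, true),
                    new Field("StartYear", "1789", true, true, true),
                    new Field("InternalNote", "draft", false, false, false)), // hypothetical field
            List.of());
        System.out.println(matchesField(gw, "LastName", "washington"));
        System.out.println(inRange(gw, "StartYear", 1780, 1800));
        System.out.println(returnable(gw));
    }
}
```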

At this stage, you have added documents that are cleaned up, parsed, analyzed, and ready to be indexed. It is important to note that while documents have been added to the index using the addDocument() API, the actual indexing of those documents has not yet taken place. A build index process must be initiated to create a searchable index out of the generated tokens.
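The distinction between adding and building can be modeled with a tiny in-memory inverted index. StagedIndex is a hypothetical class, not the SIAPI Index: its addDocument() only stages content, and terms become searchable once build() turns the staged content into postings.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "add now, index on build"; hypothetical, not the SIAPI Index class. */
class StagedIndex {
    private final Map<String, String> pending = new HashMap<>();       // docId -> content awaiting build
    private final Map<String, Set<String>> postings = new HashMap<>(); // term -> docIds containing it

    /** Stages a document; it is NOT yet searchable. */
    void addDocument(String docId, String content) {
        pending.put(docId, content);
    }

    /** Converts all staged documents into postings, making them searchable. */
    void build() {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            for (String term : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        pending.clear();
    }

    /** Single-term lookup against built postings only. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }
}
```

For example, after addDocument("gw1", "George Washington took the oath in 1789"), a search for "washington" returns an empty set until build() runs.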

You can build multiple small indexes or reorganize small indexes into one large index. Small indexes take less time to build, but as their number grows, search response time might suffer. A build that reorganizes the index takes longer but provides better search response times. Regardless of the build type, the indexer also detects and drops duplicate documents to reduce disk consumption.
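The article doesn't describe how OmniFind detects duplicates. One common approach for exact duplicates, sketched here as an assumption rather than the product's method, is to hash normalized content and skip any document whose fingerprint has already been seen (HexFormat requires Java 17 or later).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Locale;
import java.util.Set;

/** Exact-duplicate filter keyed on a hash of normalized content; illustrative only. */
class DuplicateFilter {
    private final Set<String> seen = new HashSet<>();

    /** Returns true the first time this content is seen, false for a duplicate. */
    boolean accept(String content) {
        return seen.add(fingerprint(content));
    }

    /** SHA-256 of trimmed, lower-cased content with runs of white space collapsed. */
    static String fingerprint(String content) {
        String normalized = content.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(normalized.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

Exact hashing only catches byte-identical text after normalization; near-duplicate detection needs different techniques.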

Once the index is built, you can enable the search facility, a J2EE™ application that listens on an HTTP port to process search requests. The search results contain a summary of the document content, the document ID, and the fields that make up the metadata.

Figure 1 shows a high-level diagram of the components that work to build a searchable index:


Figure 1. Components for indexing and searching

If you have come this far, you have learned the intricacies of building a searchable index. The following sections will be more programmatic in nature and should be a simple read-through for programmers.


Building an index

The IBM Search and Index API (SIAPI) is a set of interfaces defined by IBM for searching and administering indexes. The SIAPI implementation is available in WebSphere Portal Server (Version 5.1) to create custom search portlets. It is also available in WebSphere Information Integrator OmniFind Edition to create and search indexes. This article introduces the IBM Search and Index API (SIAPI) implementation within OmniFind. I designed the following sections to show small snippets of sample code that are easy to understand, followed by a complete working sample that summarizes the previous sections.

Register the application ID

First things first, a client should register a unique application ID and password with the OmniFind server. This information is used by the OmniFind server to authenticate and authorize a client request. If the specified ID is not unique, you will receive a SiapiException.


Listing 1. Code snippet to register the application ID
				
// instantiate the admin factory
AdminFactory adminFactory = SiapiAdminImpl.createAdminFactory(
                               "com.ibm.es.siapi.admin.AdminFactoryImpl");

// instantiate the admin service 
AdminService adminService = adminFactory.getAdminService(null);

// create instance of application ID
ApplicationInfo clientID = adminFactory.createApplicationInfo("SIAPIClient", "search");

// create instance of OmniFind administrator
// This information has to be given to you by the OmniFind 
// administrator. Following is a sample only.
ApplicationInfo adminID = adminFactory.createApplicationInfo("esadmin", "search");

// create your application ID
adminService.registerApplication(adminID, clientID, -1);

Create a collection

Next, create a unique collection with a name, ID, and a ranking model (at a minimum). The goal is to add documents to this collection. If the specified collection ID is not unique, you receive a SiapiException.


Listing 2. Code snippet to create a collection
				

String collectionID = "siapi";
String collectionName = "SIAPI Client Collection";
String collectionLanguage  = "en";

// choose a rank model
// 0: Rank documents by links
// 1: Rank documents by date
// 2: no static ranking
int rankModel = 0;

// create collection 
Properties config = new Properties();
adminService.createCollection(
    clientID,
    collectionID,
    collectionName,
    rankModel,
    collectionLanguage,
    config);
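The integer rank-model codes are easy to mix up. One option is a small enum that names the three codes from the comment in Listing 2. RankModel is a hypothetical helper, not part of SIAPI; you would still pass the int (for example, RankModel.LINKS.code()) as the rankModel argument.

```java
/** Hypothetical wrapper for the rank-model codes listed in the Listing 2 comments. */
enum RankModel {
    LINKS(0), // rank documents by links
    DATE(1),  // rank documents by date
    NONE(2);  // no static ranking

    private final int code;

    RankModel(int code) {
        this.code = code;
    }

    /** The integer value to pass as the rankModel argument. */
    int code() {
        return code;
    }

    /** Maps a stored integer back to the enum, rejecting unknown values. */
    static RankModel fromCode(int code) {
        for (RankModel m : values()) {
            if (m.code == code) {
                return m;
            }
        }
        throw new IllegalArgumentException("Unknown rank model: " + code);
    }
}
```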

Enable the collection for indexing

Start the processes that will intercept the document before it is added to the index. The processes will analyze and decompose the documents into tokens, and store them in a temporary store.


Listing 3. Code snippet to enable the collection for indexing
				
// Enable the services required to process the documents into tokens
adminService.enableCollectionForIndexing(clientID, collectionID, null);

Add documents

You are now set to add juice to your index. You need to instantiate the documents that you want to index.


Listing 4. Sample code listing to add URL-based documents
				
// instantiate index factory
IndexFactory indexFactory = SiapiIndexImpl.createIndexFactory(
                                         "com.ibm.es.siapi.index.IndexFactoryImpl");

//instantiate index service
IndexService indexService = indexFactory.getIndexService(null);

// get reference to the index associated with your collection
Index index = indexService.getIndex(clientID, collectionID);

// create document instance
// we love our presidents, so I will walk you through adding a 
// URL-based document
String documentURL = "http://www.whitehouse.gov/history/presidents/gw1.html";

String documentContent = "On April 30, 1789, George Washington, standing on the balcony of "
                       + "Federal Hall on Wall Street in New York, took his oath of office "
                       + "as the first President of the United States.";

String documentSource = "Web";
String documentMimeType = "text/html";

// instantiate documents from index factory
Document document1 = indexFactory.createDocument(
    documentURL,
    documentContent,
    documentSource,
    documentMimeType);
document1.setDate(new Date());
document1.setLanguage("en");
document1.setRawContentFormat(documentMimeType);


// create metadata for the document
// instantiate a title field
String fieldName = "Title";
String fieldValue = "Home page of George Washington";
Field title = indexFactory.createField(fieldName, fieldValue);

// field can be part of the query
title.setFieldSearchable(true);
// since field content is not numeric, disable any numeric searches
title.setParametric(false);
// field will be part of the search results
title.setReturnable(true);
// add the field to the document
document1.addField(title);

// likewise I will add more fields to the document

// field to represent the first name of our president
Field  firstName = indexFactory.createField("FirstName", "George");
firstName.setFieldSearchable(true);
firstName.setParametric(false);
firstName.setReturnable(true);
// add field to document
document1.addField(firstName);

// field to represent the lastName of our president
Field lastName = indexFactory.createField("LastName", "Washington");
lastName.setFieldSearchable(true);
lastName.setParametric(false);
lastName.setReturnable(true);
// add field to document
document1.addField(lastName);

// field to represent the start year of our president
Field startYear= indexFactory.createField("StartYear", "1789");
startYear.setFieldSearchable(true);
// since the content of the field is numeric in nature, you can 
// enable numeric searches
startYear.setParametric(true);
startYear.setReturnable(true);
// add field to document
document1.addField(startYear);

// field to represent the end year of our president
Field endYear= indexFactory.createField("EndYear", "1797");
endYear.setFieldSearchable(true);
endYear.setParametric(true);
endYear.setReturnable(true);
// add field to document
document1.addField(endYear);

//now add the document to our index
index.addDocument(document1);

You can follow the above snippet to add more documents to the index.

Build the index

At this time, the analyzing and parsing processes should have intercepted the documents and they are now ready to be indexed. It's a good time to build an index.


Listing 5. Sample code listing to build the index
				
// build an index out of the processed content
index.build();

Enable the collection for search

Almost there! Now you have a searchable index and need to publish this index for searching.


Listing 6. Sample code listing to enable the index for search
				
// enable collection for searching
adminService.enableCollectionForSearch(clientID, collectionID, null);


Search the index

If you have come this far, you have successfully built a searchable index that contains URL-based documents related to our presidents. It's time to develop a search sample that will execute queries and retrieve documents.

Simple search sample

The search API exposes a plethora of options and demands a developerWorks article of its own. I will introduce you to a bare-bones sample that will get you going.


Listing 7. Sample code listing to search the index
				
// instantiate search factory
SearchFactory searchFactory = SiapiSearchImpl.createSearchFactory(
                                         "com.ibm.es.api.search.RemoteSearchFactory"); 

// create instance of your client ID 
ApplicationInfo clientID = searchFactory.createApplicationInfo("SIAPIClient", "search");

// create Properties object which contains the information to access
// the search gateway hosted on the OmniFind server
Properties config = new Properties();
// set the hostname of the OmniFind server (replace with your own)
String omnifindServerName = "yourserver.company.com";
config.setProperty("hostname", omnifindServerName);
// set the HTTP port number 
String httpPort = "80";
config.setProperty("port", httpPort);

// instantiate the search service
SearchService searchService = searchFactory.getSearchService(config);

// instantiate a searchable object to the specified collection ID
String collectionID = "siapi";
Searchable searchable = searchService.getSearchable(
    clientID, collectionID);

// create a new Query object using the specified query string
String  queryString = "presidents";
Query q = searchFactory.createQuery(queryString);

// execute the search by calling the search method on the searchable
// instance.  A SIAPI ResultSet object will be returned
ResultSet rset = searchable.search(q);

// walk through the array of results from the ResultSet
Result r[] = rset.getResults();
if (r != null) {
  // walk the results list and print out the
  // document identifier
  for (int k = 0; k < r.length; k++) {
    System.out.println(
      "Result " + k + ": " + r[k].getDocumentID());
    System.out.println(
      "Title " + k + ": " + r[k].getTitle());
  }
}
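Listing 7 hard-codes the host name and port. The complete sample in Listing 8 reads the host from the es_server_hostname system property instead; this hypothetical helper takes the same approach, adds an equally hypothetical es_server_port property that defaults to 80, and validates both, so you could call searchFactory.getSearchService(SearchConfig.fromSystemProperties()).

```java
import java.util.Properties;

/** Hypothetical helper that builds the Properties passed to getSearchService(). */
final class SearchConfig {
    private SearchConfig() {
    }

    static Properties fromSystemProperties() {
        String host = System.getProperty("es_server_hostname");
        if (host == null || host.isBlank()) {
            throw new IllegalStateException(
                "Pass the OmniFind host with -Des_server_hostname=<host>");
        }
        // es_server_port is a hypothetical property name; 80 matches the listings.
        String port = System.getProperty("es_server_port", "80");
        int portNumber = Integer.parseInt(port.trim()); // throws NumberFormatException if not numeric
        if (portNumber < 1 || portNumber > 65535) {
            throw new IllegalArgumentException("Port out of range: " + portNumber);
        }
        Properties config = new Properties();
        config.setProperty("hostname", host);
        config.setProperty("port", Integer.toString(portNumber));
        return config;
    }
}
```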

Congratulations! You have achieved your objective of building and searching an index.


Complete sample

This section shows a complete sample that combines everything covered so far:


Listing 8. Complete sample code to build and search index
				
import java.util.Date;
import java.util.Properties;

import com.ibm.siapi.SiapiException;
import com.ibm.siapi.admin.AdminFactory;
import com.ibm.siapi.admin.AdminService;
import com.ibm.siapi.admin.SiapiAdminImpl;
import com.ibm.siapi.common.ApplicationInfo;
import com.ibm.siapi.index.Document;
import com.ibm.siapi.index.Field;
import com.ibm.siapi.index.Index;
import com.ibm.siapi.index.IndexFactory;
import com.ibm.siapi.index.IndexService;
import com.ibm.siapi.index.SiapiIndexImpl;
import com.ibm.siapi.search.Query;
import com.ibm.siapi.search.Result;
import com.ibm.siapi.search.ResultSet;
import com.ibm.siapi.search.SearchFactory;
import com.ibm.siapi.search.SearchService;
import com.ibm.siapi.search.Searchable;
import com.ibm.siapi.search.SiapiSearchImpl;

/*
 * The sample code demonstrates the code required to build and  
 * search index
 */
public class DWIndexingAndSearching {
private ApplicationInfo clientID = null;
private String collectionID = "siapi";
private AdminFactory adminFactory = null;
private AdminService adminService = null;
private IndexFactory indexFactory = null;
private IndexService indexService = null;
private Index index = null;
private SearchFactory searchFactory = null;
private SearchService searchService = null;

private void registerClientID() throws SiapiException {
  // instantiate the admin factory
  adminFactory =
  SiapiAdminImpl.createAdminFactory(
      "com.ibm.es.siapi.admin.AdminFactoryImpl");

  // instantiate the admin service 
  adminService = adminFactory.getAdminService(null);

  // create instance of application ID
  clientID = adminFactory.createApplicationInfo(
              "SIAPIClient", "search");

  // create instance of OmniFind administrator
  // This information has to be given to you by the OmniFind 
  // administrator. Following is a sample only.
  ApplicationInfo adminID =
      adminFactory.createApplicationInfo("esadmin", "search");

  // create your application ID
  adminService.registerApplication(adminID, clientID, -1);
}

private void createCollection() throws SiapiException {
  String collectionName = "SIAPIClient Collection";
  String collectionLanguage = "en";

  // choose a rank model
  // 0: Rank documents by links
  // 1: Rank documents by date
  // 2: no static ranking
  int rankModel = 0;

  // create collection 
  Properties config = new Properties();
  // enable security to implement document-level access control
  // if security is enabled, the documents can be associated with
  // an access control token (this could be a group name) and
  // while searching, the client has to pass the same access control
  // token to retrieve the document. This is optional 
  // config.setProperty("EnableCollectionSecurity", "true");

  adminService.createCollection(
      clientID,
      collectionID,
      collectionName,
      rankModel,
      collectionLanguage,
      config);

}

private void enableCollectionForIndexing() throws SiapiException {
  // Enable the services required to process the documents into 
  // tokens
  adminService.enableCollectionForIndexing(
              clientID, collectionID, null);
}

private void addDocuments() throws SiapiException {
  // instantiate index factory
  indexFactory =
    SiapiIndexImpl.createIndexFactory(
      "com.ibm.es.siapi.index.IndexFactoryImpl");
  // instantiate index service
  indexService = indexFactory.getIndexService(null);
  // get reference to the index associated with your collection
  index = indexService.getIndex(clientID, collectionID);

  // create document instance
  // we love our presidents, so I will walk you through adding  
  // URL-based documents
  Document document1 = null;
  String documentURL =
      "http://www.whitehouse.gov/history/presidents/gw1.html";
  String documentContent =
      "On April 30, 1789, George Washington, standing on the balcony of "
      + "Federal Hall on Wall Street in New York, took his oath of office "
      + "as the first President of the United States.";
  String documentSource = "Web";
  String documentMimeType = "text/html";

  // instantiate documents from index factory
  document1 =
    indexFactory.createDocument(
      documentURL,
      documentContent,
      documentSource,
      documentMimeType);

  document1.setDate(new Date());
  document1.setLanguage("en");
  document1.setRawContentFormat(documentMimeType);
  // if the collection is configured to be secure, 
  // you can associate access control tokens to the document
  // one of the tokens have to be set while searching.
  // This is optional
  // document1.setACL(new String[]{"admin,staff"});

  // create metadata for the document 
  // instantiate a title field
  String fieldName = "Title";
  String fieldValue = "Home page of George Washington";
  Field title = indexFactory.createField(fieldName, fieldValue);

  // field can be part of the search query
  title.setFieldSearchable(true);
  // since field content is not numeric, disable any numeric searches
  title.setParametric(false);
  // field will be part of the search results
  title.setReturnable(true);
  // add the field to the document
  document1.addField(title);

  // likewise I will add more fields to the document
  // field to represent the first name of our president
  Field firstName = indexFactory.createField("FirstName", "George");
  firstName.setFieldSearchable(true);
  firstName.setParametric(false);
  firstName.setReturnable(true);
  // add field to document
  document1.addField(firstName);

  // field to represent the lastName of our president
  Field lastName = indexFactory.createField("LastName", "Washington");
  lastName.setFieldSearchable(true);
  lastName.setParametric(false);
  lastName.setReturnable(true);
  // add field to document
  document1.addField(lastName);

  // field to represent the start year of our president
  Field startYear = indexFactory.createField("StartYear", "1789");
  startYear.setFieldSearchable(true);
  // since the content of the field is numeric in nature, you can 
  // enable numeric searches
  startYear.setParametric(true);
  startYear.setReturnable(true);
  // add field to document
  document1.addField(startYear);

  // field to represent the end year of our president
  Field endYear = indexFactory.createField("EndYear", "1797");
  endYear.setFieldSearchable(true);
  endYear.setParametric(true);
  endYear.setReturnable(true);
  // add field to document
  document1.addField(endYear);
  
  // NOW add the document to our index
  index.addDocument(document1);
}

private void buildIndex() throws SiapiException {
  // build an index out of the processed content
  index.build();
}

private void enableCollectionForSearching() throws SiapiException {
  // enable collection for searching
  adminService.enableCollectionForSearch(
    clientID, collectionID, null);
}

private void searchCollection() throws Exception {
  // instantiate search factory
  searchFactory =
    SiapiSearchImpl.createSearchFactory(
      "com.ibm.es.api.search.RemoteSearchFactory");
  // create instance of application ID
  ApplicationInfo searchClientID =      
    searchFactory.createApplicationInfo("SIAPIClient", "search");

  // create Properties object which contains the information to 
  // access the search gateway hosted on the OmniFind server
  Properties config = new Properties();
  // set the hostname of the OmniFind server
  String omnifindServerName = 
    System.getProperty("es_server_hostname", null);
  if(omnifindServerName == null){
    System.out.println(
      "es_server_hostname has to be passed with a -D option.");
    throw new Exception("Could not resolve the server name");
  }
  config.setProperty("hostname", omnifindServerName);
  // set the HTTP port number 
  String httpPort = "80";
  config.setProperty("port", httpPort);

  // instantiate the search service
  searchService = searchFactory.getSearchService(config);

  // instantiate a searchable object to the specified collection ID
  Searchable searchable =
      searchService.getSearchable(searchClientID, collectionID);

  // create a new Query object using the specified query string
  String queryString = "presidents";
  Query q = searchFactory.createQuery(queryString);
    
  // Only when the collection is configured to be secure
  // and the document was associated with an access control list (ACL).
  // This is optional.
  // q.setACLConstraints("staff");

  // execute the search by calling the search method on the 
  // searchable instance.  A SIAPI ResultSet object will be returned
  ResultSet rset = searchable.search(q);

  // walk through the array of results from the ResultSet
  Result r[] = rset.getResults();
  if (r != null) {
    // walk the results list and print out the document ID
    System.out.println("Total search results: " + r.length);
    for (int k = 0; k < r.length; k++) {
      System.out.println(
        "Result " + (k + 1) + ": " + r[k].getDocumentID());
      System.out.println("Title " + (k + 1) + ": " + r[k].getTitle());
    }
  }
}

public static void main(String[] args) {
  DWIndexingAndSearching dw = new DWIndexingAndSearching();
  try {
    dw.registerClientID();
    dw.createCollection();
    dw.enableCollectionForIndexing();
    dw.addDocuments();
    dw.buildIndex();
    dw.enableCollectionForSearching();
    dw.searchCollection();
  } catch (SiapiException e) {
    e.printStackTrace();
    System.out.println(e.getLocalizedMessage());
  } catch (Exception e2) {
    e2.printStackTrace();
    System.out.println(e2.getLocalizedMessage());
  }
}
}


Advanced configuration

So far, the APIs have helped you create searchable indexes with a basic configuration. In this section, I will brief you on a few advanced configurations that you can exploit for a custom solution.

Document-level access and secure collections

Earlier in the overview, I mentioned the capability of returning results to a user based on access controls. If the managers in your organization belong to a group called "Admins" and the employees belong to a group called "Staff," you can associate access control tokens (here, group names) with each document so that search results are filtered by the specified access controls.

To assign access controls to documents, you must create a secure collection by passing an extra property called EnableCollectionSecurity to the createCollection() API. When adding content through an instance of the Index class, associate access tokens with each document by calling the setACL() method of the Document class; multiple access tokens can be given as a comma-separated list. When searching, call the setACLConstraints() method of the Query class with an appropriate access token to retrieve the documents assigned to it. I have included these calls in the complete sample but commented them out.
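The filtering rule can be sketched in plain Java. This is a hypothetical model, not SIAPI code, and it rests on three assumptions: tokens are compared exactly (case-sensitive), a hit is visible when it shares at least one token with the user's tokens, and a document with no tokens is hidden.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy model of secure-collection filtering; hypothetical, not SIAPI code. */
class AclFilter {
    private final Map<String, Set<String>> aclByDoc = new LinkedHashMap<>();

    /** Associates a comma-separated list of access tokens with a document. */
    void setAcl(String docId, String commaSeparatedTokens) {
        Set<String> tokens = Arrays.stream(commaSeparatedTokens.split(","))
            .map(String::trim)
            .filter(t -> !t.isEmpty())
            .collect(Collectors.toSet());
        aclByDoc.put(docId, tokens);
    }

    /** Keeps only hits whose ACL shares at least one token with the user's tokens. */
    List<String> filter(List<String> hits, Set<String> userTokens) {
        List<String> visible = new ArrayList<>();
        for (String docId : hits) {
            Set<String> acl = aclByDoc.getOrDefault(docId, Set.of()); // untagged: hidden (assumption)
            if (acl.stream().anyMatch(userTokens::contains)) {
                visible.add(docId);
            }
        }
        return visible;
    }
}
```

With the groups from the paragraph above, a document tagged "Admins, Staff" is visible to either group, while one tagged only "Admins" is hidden from Staff.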

Remove documents

You can also remove documents that are no longer valid or current. Documents to remove are identified by a unique ID or by a pattern. Once the removeDocument() API is invoked with an ID or pattern, subsequent search queries no longer return the matching documents. The specified documents can also be marked for permanent deletion from the physical index; the changes to the index data structure are committed on a subsequent build.


Listing 9. Sample code listing to remove documents
				
// remove a particular document
index.removeDocument("http://www.whitehouse.gov/history/presidents/gw1.html");

// remove documents that match pattern
index.removeDocument("http://www.whitehouse.gov/*");
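The two-phase behavior described above (hidden at once, purged on the next build) can be modeled as follows. RemovalModel is hypothetical, and its wildcard semantics, where `*` matches any run of characters and everything else is literal, are an assumption based only on the pattern in Listing 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Toy model of document removal; hypothetical, not the SIAPI Index class. */
class RemovalModel {
    private final Set<String> physical = new LinkedHashSet<>(); // IDs still stored in the index
    private final List<Pattern> removals = new ArrayList<>();   // removals awaiting the next build

    void add(String docId) {
        physical.add(docId);
    }

    /** Hides matching documents immediately; the physical purge waits for build(). */
    void removeDocument(String idOrPattern) {
        removals.add(toRegex(idOrPattern));
    }

    /** IDs that searches can still return. */
    List<String> visibleIds() {
        return physical.stream().filter(id -> !isRemoved(id)).collect(Collectors.toList());
    }

    /** Commits pending removals to the stored index. */
    void build() {
        physical.removeIf(this::isRemoved);
        removals.clear();
    }

    int physicalSize() {
        return physical.size();
    }

    private boolean isRemoved(String docId) {
        return removals.stream().anyMatch(p -> p.matcher(docId).matches());
    }

    /** Assumed syntax: '*' matches any run of characters; everything else is literal. */
    static Pattern toRegex(String idOrPattern) {
        String regex = Arrays.stream(idOrPattern.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }
}
```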

Fragmentation count

When the build() API is invoked, a hierarchy of directories and files is created on the hard disk. These files represent a data structure of indexed terms. The large volume of file I/O required to create or merge small indexes is bound to affect the performance of the operation. Faster CPUs or more efficient hard-drive technologies can reduce the cost of this file I/O, but to work within limited hardware resources, the product lets you either build multiple small indexes or reorganize them into one large index.

Repeated calls to the build() API on the Index class create smaller indexes. But as the number of such indexes grows, search response time might suffer. To balance build and search performance, it's advisable to reorganize the smaller indexes into one large index periodically. You can explicitly invoke the reorganize() API on the Index class, or set a fragmentation count that directs the software to merge the smaller indexes when their number reaches a multiple of the fragmentation count. For example, if the fragmentation count is set to 2, smaller indexes are merged into one large index only when 2 (or a multiple of 2) small indexes already exist. The sample in Listing 10 highlights the code changes:


Listing 10. Sample code listing to enable fragmented builds
				

//get reference to the index associated with your collection
Index index = indexService.getIndex(clientInstance, collectionID);

// Set the fragmentation count to 2 so that the smaller indexes are
// reorganized when the number of existing smaller indexes is a
// multiple of 2. If too few indexes exist, a new smaller index
// is built instead.
index.setProperty("BUILD_FRAGMENTATION_COUNT" , "2");

// The following build() call now decides whether to build a
// small index or merge the existing ones
index.build();
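The following Java sketch simulates one possible reading of the BUILD_FRAGMENTATION_COUNT policy. It is an assumption for illustration, not IBM's implementation, and FragmentationSketch is a hypothetical class. It assumes each build() either adds a new small index or, when a positive multiple of the fragmentation count of small indexes already exists, merges them together with the new data into the main index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the fragmentation-count policy described above.
// The trigger rule below is one reading of the text, NOT IBM's actual code.
class FragmentationSketch {
    private final int fragmentationCount;
    private int smallIndexes = 0; // small indexes awaiting a merge
    private final List<String> log = new ArrayList<>();

    FragmentationSketch(int fragmentationCount) {
        if (fragmentationCount < 1) {
            throw new IllegalArgumentException("fragmentation count must be >= 1");
        }
        this.fragmentationCount = fragmentationCount;
    }

    // Assumed rule: merge only when a positive multiple of the count already exists.
    static boolean shouldMerge(int existingSmallIndexes, int fragmentationCount) {
        return existingSmallIndexes > 0 && existingSmallIndexes % fragmentationCount == 0;
    }

    // Models the effect of one build() call: either merge or add a small index.
    void build() {
        if (shouldMerge(smallIndexes, fragmentationCount)) {
            log.add("merge " + smallIndexes + " small indexes + new data into the main index");
            smallIndexes = 0;
        } else {
            smallIndexes++;
            log.add("build small index #" + smallIndexes);
        }
    }

    List<String> log() {
        return log;
    }

    int smallIndexes() {
        return smallIndexes;
    }

    public static void main(String[] args) {
        FragmentationSketch index = new FragmentationSketch(2);
        for (int i = 0; i < 6; i++) {
            index.build();
        }
        index.log().forEach(System.out::println);
    }
}
```

With a count of 2, the six calls log small index #1, small index #2, a merge, and then the same cycle again. Under this model, a larger count means fewer expensive merges but more small indexes for each query to search in between.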


Summary

This article has covered the IBM Search and Index API (SIAPI) implementation for creating and searching indexes hosted by WebSphere Information Integrator OmniFind Edition. The WebSphere Portal search engine and WebSphere II OmniFind Edition are examples of products that provide such implementations. Unlike solutions such as Lucene, WebSphere Information Integrator OmniFind Edition ships with the software required to analyze documents and tokenize them for efficient indexing. The product is designed and packaged as a one-stop solution for most common search applications. On top of the base product, the easy-to-use client APIs provide enough capability to extend search to any unstructured data you might want to index.

