Before you start
What will you learn?
This tutorial demonstrates how to combine the search results delivered by the OmniFind Search and Index API (SIAPI) with the results of structured database queries. The combination of semantic search and SQL queries allows you to build powerful applications that can close the gap between structured and unstructured information.
Who should read this tutorial?
This tutorial is written for people who want to build custom solutions based on IBM OmniFind Enterprise Edition V8.4. To understand this tutorial and get the maximum benefit from it, you should already be familiar with the following topics:
- Unstructured Information Management Architecture (UIMA):
Basic UIMA skills are needed. You should know about UIMA typesystems:
- What is a UIMA typesystem?
- What does it look like?
- What is it needed for?
- DB2:
Again, only basic skills are needed. You should be able to install DB2 and know how to create a database and some sample tables. The database and all tables needed to run this tutorial will be created by a script.
- OmniFind:
You only need basic skills. During this tutorial you will make use of the OmniFind administration GUI to create OmniFind collections and upload a custom annotator and a database mapping file for the text analysis results. Each of these tasks is explained in detail and there is a step-by-step description with screenshots for each of them. However, you should already have a basic understanding of how OmniFind works and how to use it. A look at the Text Analysis Integration book found with the documentation of your OmniFind installation is also highly recommended.
- Cas2Jdbc:
You should be familiar with the features and the functionality of Cas2Jdbc. It is strongly recommended that you complete the first tutorial of this series before continuing with this one. To become familiar with Cas2Jdbc, refer to Part 1 of this series.
- Semantic search and custom index mappings
You should have a basic understanding of what semantic search means and what custom index mappings are needed for. To build your own applications using semantic search, you should have a deeper understanding of custom index mappings. It is recommended to read the short chapter "Index mapping for custom analysis results" of the "Text Analysis Integration" handbook coming with your OmniFind installation before continuing with this tutorial. In case you prefer a more down-to-earth approach, you might find "Semantic search in WebSphere Information Integrator OmniFind Edition: Deploy a semantic search solution" (developerWorks, August 2005) useful.
- SIAPI:
Again, you should already be familiar with the usage of OmniFind's SIAPI. Basic skills are sufficient for this tutorial. To get a better understanding of the source code found with the sample search application in this tutorial, refer to "IBM Search and Index APIs (SIAPI) for WebSphere Information Integrator OmniFind Edition" (developerWorks, January 2006).
- Linux® and Windows®:
This tutorial covers both Linux and Windows platforms. You should have basic skills regarding how to copy files, create directories, run scripts, and change file and directory permissions. In some parts of the tutorial, you have to edit files found in the Download section. It is recommended to use an XML editor of your choice to manipulate XML files and validate them before you upload them to your OmniFind system. The OmniFind system will reject invalid XML files with an error message, but it is more convenient to see these errors in a good XML editor (that can, for example, highlight the affected XML elements).
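If you have no XML editor at hand, a few lines of Java can serve as a rough pre-upload check. The following is a minimal sketch that only tests whether a file is well-formed XML, using the parser that ships with the JDK. It does not validate against any OmniFind schema, and the class name XmlWellFormedCheck is made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the tutorial's resources.
final class XmlWellFormedCheck {

    /** Returns true if the stream parses as well-formed XML. */
    static boolean isWellFormed(InputStream in) throws IOException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(in);
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation ends up here.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }

    /** Convenience overload for XML held in a string. */
    static boolean isWellFormed(String xml) {
        try {
            return isWellFormed(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checks every file path given on the command line. */
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                System.out.println(path + ": " + (isWellFormed(in) ? "well-formed" : "NOT well-formed"));
            }
        }
    }
}
```

You could run it with the paths of the mapping files you edited as arguments. A file that passes this check can still be rejected by OmniFind if its content does not match what the system expects.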
To run this tutorial, you need to have IBM OmniFind Enterprise Edition V8.4 installed and running. Make sure to have the latest fix pack installed. You also need DB2 Version 8.2 or later installed. (Starting with OmniFind 8.4, DB2 and OmniFind can run on one and the same machine.) It is not recommended to run this tutorial in a production environment. The sample document set found in the Download section of this tutorial contains around 20 documents and consumes around 1 MB of space. The sample database on your DB2 machine needs around 20 MB of space.
How long will it take?
The duration of this tutorial depends on your prior knowledge. Usually, it should take you between two and three hours to get the samples running and to look at the source code.
A quick overview about what you will learn
The indexing time flow
You may remember a similar diagram from Part 1 of this series:
Figure 1. The flow during the OmniFind indexing time
- Cas2Jdbc runs as part of the parser component and stores selected text analysis results to an external database.
- Whereas the indexing step in Part 1 was optional and you only worked with the data stored by Cas2Jdbc, you need the OmniFind search index for this tutorial. The indexer stores all data that is needed to search for a document in the OmniFind search index. Furthermore, you will extend the default index mapping to make the indexer also store additional information needed for the semantic search queries of this tutorial.
The runtime (search) flow
The runtime (or search) flow of a search application usually looks as follows:
Figure 2. The flow during the search time
- There is a custom search application that makes use of the OmniFind search index, the text analysis results stored by Cas2Jdbc and perhaps an external datasource or program (for example, this could be another database or a flat file, but also a whole system like an enterprise resource planning [ERP] or customer relationship management [CRM] system.)
The flow for this tutorial's sample search applications is as follows:
- The search application makes use of the OmniFind SIAPI to retrieve search results matching a semantic search query.
- The search application processes the results retrieved from the OmniFind search index. These results contain an identifier that can be used to retrieve the corresponding records from the database that Cas2Jdbc used to store the text analysis results of interest.
- The search application retrieves these records from the database, prepares them and, for example, supplements them with data retrieved from another external application. (Note: This tutorial uses one and the same database. For steps 2 and 3, that means the text analysis results stored in the database are supplemented with sample data coming from the same database.)
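The three steps above can be sketched in plain Java without any OmniFind or DB2 dependency. In this illustration, the search hits and the stored records are simple in-memory objects. The class names, the key format, and the sample values are assumptions made for the sketch and do not come from the sample application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the runtime flow; not the tutorial's actual code.
final class ResultJoinSketch {

    /** One search hit: the document URI plus the ID of the matching annotation. */
    static final class Hit {
        final String documentUri;
        final int annotationId;

        Hit(String documentUri, int annotationId) {
            this.documentUri = documentUri;
            this.annotationId = annotationId;
        }
    }

    /** Builds the lookup key that ties a hit to its stored record (format is an assumption). */
    static String key(String documentUri, int annotationId) {
        return documentUri + "#" + annotationId;
    }

    /** Steps 2 and 3: look up the stored record for each hit and keep the hits that have one. */
    static List<String> join(List<Hit> hits, Map<String, String> recordsByKey) {
        List<String> details = new ArrayList<String>();
        for (Hit hit : hits) {
            String record = recordsByKey.get(key(hit.documentUri, hit.annotationId));
            if (record != null) {
                details.add(hit.documentUri + " -> " + record);
            }
        }
        return details;
    }

    public static void main(String[] args) {
        // Stand-in for the records that Cas2Jdbc stored during indexing.
        Map<String, String> records = new HashMap<String, String>();
        records.put(key("file:///docs/report1.txt", 17), "owner: J. Doe");

        // Stand-in for the hits returned by step 1 (the semantic search query).
        List<Hit> hits = Arrays.asList(
                new Hit("file:///docs/report1.txt", 17),
                new Hit("file:///docs/report2.txt", 4));

        System.out.println(join(hits, records));
    }
}
```

In a real application, the map would be replaced by SQL queries against the database, and the hits would come from SIAPI.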
Set up the tutorial environment
Necessary steps to set up the tutorial environment
Download the ZIP file associated with this tutorial (cas2jdbc_tutorial.zip) from the Download section, and extract it somewhere on your local machine (it creates a directory named cas2jdbc_tutorial that contains all the necessary folders and files for this tutorial).
Prepare the OmniFind system
The following steps must be performed on your OmniFind node
- Copy the subfolder documents to your OmniFind system and make sure the folder and all its files are accessible for the OmniFind system (this means the esadmin user needs to have read permission on the folders and files). On a Linux system, it's a good idea to copy the files to the home directory of the esadmin user (/home/esadmin/cas2jdbc_tutorial/).
If your DB2 server is not identical with the machine your OmniFind installation is running on, the JDBC driver libraries from your DB2 machine must be copied to the OmniFind node.
- The driver libraries can be found in the java directory of your DB2 installation. The three libraries db2jcc.jar, db2jcc_license_cu.jar, and db2jcc_license_cisuz.jar need to be copied to a folder on your OmniFind node.
- Again, make sure the driver library files you copied from your DB2 installation can be accessed by the OmniFind system user (esadmin user needs to have read permission).
Create the sample database
The following steps must be performed on your DB2 machine.
- Create a system user with the username tutorial and the password password. Make sure the created account is not locked or expired. This user will be used to connect to the database and create the necessary schemas and tables. (Do not forget to delete this user after you've finished the tutorial.)
- Copy the subfolder database from the tutorial's zip file to the home directory of the tutorial user you created in the previous step. (On Windows, just copy it to a folder on your DB2 machine.) On Linux, make sure that both the tutorial and db2inst1 users have read and write permissions on all these folders and files.
Create the database needed for the tutorial:
- On Linux, log on as the db2inst1 user. On Windows, open a DB2 command line.
- In the database folder, you'll find a script called setupDB2.ddl that contains the necessary statements to create the database.
Run this script by executing the db2 -f setupDB2.ddl command.
- Make sure that all DB2 commands complete successfully. If you get some errors here, make sure DB2 is installed correctly and that you have done all steps described previously.
Modify the Cas2Jdbc mapping files to fit in your system
The following steps must be performed on your local machine.
- Edit the Cas2Jdbc mapping file placed in the cas2jdbc folder of the tutorial's resources. Note: Although the files are on your local machine, you'll have to modify them to fit in your OmniFind node's environment. (Remember that these files will be uploaded to your OmniFind system later.)
- Open the file cas2jdbc_tutorial/cas2jdbc/cas2jdbc.xml.
Adapt the <connectionUrl> element by replacing the placeholder myHostname with the hostname of your DB2 machine. The database name (tutorial) is already correct. The port (50000) is a common default, but it might be different on your machine, depending on how your DB2 instance is configured. Refer to your DB2 documentation to find out the correct port.
Adapt the path inside the <driverLibrary> elements. Make sure to modify all three <driverLibrary> elements, and check that the specified path is correct and points to the JDBC driver libraries on your OmniFind node. This is either the java folder of your DB2 installation, or the folder to which you copied the JDBC driver libraries from the DB2 machine in a previous step.
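Before you upload the mapping file, it can help to confirm the hostname and port from a small standalone program. The following sketch assumes the standard DB2 type 4 JDBC URL form (jdbc:db2://host:port/database). The class and method names are made up for this illustration, and the connection attempt only works when the DB2 JDBC driver JAR files are on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of the tutorial's resources.
final class Db2UrlSketch {

    /** Builds a DB2 type 4 JDBC URL of the form jdbc:db2://host:port/database. */
    static String jdbcUrl(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:db2://" + host + ":" + port + "/" + database;
    }

    /** Opens and closes a connection; returns "ok" or a short failure description. */
    static String probe(String url, String user, String password) {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return "ok";
        } catch (SQLException e) {
            return "connection failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Values from this tutorial; replace myHostname with your DB2 machine.
        String url = jdbcUrl("myHostname", 50000, "tutorial");
        System.out.println(url);
        System.out.println(probe(url, "tutorial", "password"));
    }
}
```

If the probe reports that no suitable driver was found, the driver JAR files are missing from the classpath. A refused connection usually points to a wrong hostname or port.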
Prepare the sample search application
The following steps must be performed on your local machine.
- On your local machine, locate the folder cas2jdbc_tutorial/searchApp/tutorial_app_war/WEB-INF/lib.
- Copy the previously described JDBC driver libraries to this folder (db2jcc.jar, db2jcc_license_cu.jar and db2jcc_license_cisuz.jar).
- From the lib folder of your OmniFind installation (usually /opt/IBM/es/lib on Linux and C:\Program Files\IBM\es\lib on Windows), copy the following four files to this folder: es.federator.jar, es.oss.jar, esapi.jar, and siapi.jar.
Now that all three JDBC driver library files and the four SIAPI library files are copied to the lib folder, you can run the script that packs the search application. Navigate to the cas2jdbc_tutorial/searchApp folder and run the pack.sh or pack.bat script. The script packs the sample search application and generates a file named cas2jdbc_sample_app.ear that you'll later deploy on the OmniFind machine's WebSphere® Application Server. (Note: The pack script expects the Java jar command, which comes with every JDK installation, to be in the system's PATH.)
Use the following checklist to make sure the previous steps were performed properly and no step was forgotten.
On your OmniFind machine:
- If it's not identical with your DB2 server, you copied the JDBC driver libraries from your DB2 machine to your OmniFind machine in a directory where they are accessible for the esadmin user.
On your DB2 machine:
- You created a system user named "tutorial."
- You created the tutorial database "tutorial" and the creation completed without error messages.
On your local machine:
- You modified the cas2jdbc mapping file in the following way:
  - You adapted the hostname in the <connectionUrl> element.
  - You checked whether the port 50000 is suitable for your DB2 installation or replaced the port number with the one needed to access the tutorial database.
  - You adapted all three <driverLibrary> elements to point to the directory on your OmniFind machine where the JDBC drivers are located.
- You prepared the sample search application in the following way:
  - You copied the three JDBC driver library files to the lib folder of the sample search application.
  - You copied the four SIAPI library files to the lib folder of the sample search application.
  - You ran the pack script, which successfully created the EAR file.
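The library part of this checklist can also be verified programmatically. The following sketch looks for the seven JAR files named in the setup steps inside a given lib folder and reports any that are missing. The class name is made up for this illustration, and the default path assumes that you run the program from the directory that contains cas2jdbc_tutorial.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the tutorial's resources.
final class LibFolderCheck {

    /** The three JDBC driver files and the four SIAPI files named in the setup steps. */
    static final List<String> REQUIRED = Arrays.asList(
            "db2jcc.jar", "db2jcc_license_cu.jar", "db2jcc_license_cisuz.jar",
            "es.federator.jar", "es.oss.jar", "esapi.jar", "siapi.jar");

    /** Returns the names of all required JAR files that are not present in libDir. */
    static List<String> missingJars(File libDir) {
        List<String> missing = new ArrayList<String>();
        for (String name : REQUIRED) {
            if (!new File(libDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String path = args.length > 0
                ? args[0]
                : "cas2jdbc_tutorial/searchApp/tutorial_app_war/WEB-INF/lib";
        List<String> missing = missingJars(new File(path));
        System.out.println(missing.isEmpty() ? "All libraries present." : "Missing: " + missing);
    }
}
```

This only checks file names. It cannot tell whether the files match the versions of your DB2 and OmniFind installations.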
Prepare the tutorial's OmniFind collection
Deploy the custom text analysis engine
To deploy the custom text analysis engine, the following steps must be performed:
- Navigate your browser to the OmniFind administration console (usually http://myHost/ESAdmin) and log in as the Enterprise Search administration user (usually esadmin).
In the top menu, click System:
Figure 3. Click the system menu item
- Click to edit the system's settings.
Navigate to the Parse tab, and click the Configure text analysis engine link:
Figure 4. Configure text analysis engine
- Click to add the tutorial's analysis engine.
Specify tutorial as the analysis engine's name, and locate the PEAR archive found with the tutorial's resources (placed in cas2jdbc_tutorial/annotator). Click OK when you're done to start the upload of the custom analysis engine:
Figure 5. Upload the text analysis engine
You should now see the following success message:
Figure 6. Successfully uploaded text analysis engine
Create the tutorial's collection and crawler
Create the collection
After you successfully uploaded the custom text analysis engine, it's time to create an OmniFind collection to process the sample documents coming with the tutorial's resources. Again, log in to the OmniFind system's administration console and perform the following steps:
- On the main page, click to create a new document collection.
Use the following values as parameters:
- Collection name: tutorial_collection
- Collection security: Do not enable security for the collection
- Document importance: Do not apply any static ranking
- Categorization type: none
- Language to use: English
Figure 7. Create collection
Create the crawler
Now that the collection was created, the next step is to create a crawler for this collection:
In the collection overview, click the button in the Crawl column of your collection:
Figure 8. Create crawler
- Switch to the edit mode.
- Click to create a new crawler.
From the list of available crawlers select Unix file system or Windows file
system according to your machine's operating system. Click Next to continue.
Figure 9. Create crawler continue
On the next page, use tutorial_crawler for the crawler's name, and click Next:
Figure 10. Create crawler continue 2
- You can skip the next page, because you don't want to schedule your crawler. Just click to continue.
On this page, you have to specify which documents the crawler should process. Type the path where you copied the files from the documents folder of the tutorial's resources, and click Search for subdirectories. Add the specified folder to the Subdirectories to crawl box. When you have finished, continue to the next page:
Figure 11. Create crawler continue 3
- As you don't need any additional settings for this crawler, just complete the crawler creation.
You should now see the following success message:
Figure 12. Successfully created crawler
Configure the parser
For this tutorial you must configure the parser to do the following:
- Make use of the custom annotator that detects the entities for your sample scenario.
- Apply a custom index mapping so that the OmniFind search index will be enabled for semantic search and you can find the relevant entities by using semantic search queries.
- Upload the database mapping file so that the relevant entities will also be stored in the database and you can establish a relationship between the search results and the data stored in the database.
To associate the collection with the custom annotator, perform the following steps:
Click the button in the Parse column of your collection:
Figure 13. Parse
- Switch to the edit mode.
- Open the selection of text analysis engines, and then select the tutorial analysis engine from the list of available custom analysis engines:
Figure 15. Select text analysis engine
Upload the custom index mapping:
On the same page, click Select a mapping file in the Map the common analysis structure to the index section and upload the cas2jdbc_tutorial/cas2index/cas2index.xml file shipped with the tutorial resources:
Figure 16. Upload custom index mapping file
Upload the database mapping file:
In the Map the common analysis structure to a relational database section, click to select a mapping file, and upload the databaseMapping.xml file shipped with the tutorial resources.
Note: Make sure not to confuse this section with the Map XML elements to the common analysis structure section.
If you get any error messages during the upload, make sure the database server is running and that you modified the mapping file correctly. (Were the hostname, the port, and the driver libraries specified correctly?)
Run crawler, parser, and indexer
Running the crawler:
Click the button in the tutorial_collection's Crawl column:
Figure 17. Running the crawler
- Click to start the crawler. Note: As you didn't schedule the crawler, you have to start it manually as described in the following steps.
Switch to the Details view:
Figure 18. Crawler details
Click Start a full recrawl to make the crawler start immediately:
Figure 19. Start crawl
- The crawler takes a few seconds to process the sample documents. You can track the progress by refreshing the view and watching the progress bar. When the crawler finishes, continue with the next step.
Running the parser:
Click the button in the tutorial_collection collection's Parse column:
Figure 20. Running the parser
- Click to start the parser. In contrast to the crawler, the parser starts working immediately.
- In the Details perspective, you can track the parser's progress. When all documents are parsed, continue with the next step.
Running the indexer:
Click the button in the tutorial_collection collection's Index column:
Figure 21. Running the index build
Start a main index build and wait until the indexer has finished processing the documents:
Figure 22. Start the main index build
Run a sample search
Navigate your browser to the OmniFind sample search application (usually http://myHost/ESSearchApplication) and type (or copy and paste) the following sample query:
Verify that a list of documents is returned and that each document contains a highlighted license plate.
Deploy the tutorial search application
Note: The figures in this section were taken from WebSphere Application Server Version 6.0.2. What you see on your screen may differ according to the version you are running.
- Navigate your browser to the WebSphere Application Server administration console on your OmniFind machine (usually: http://myHost:9060/ibm/console) and log in.
From the menu on the left side, navigate to Applications > Install New Application, and specify the path to the cas2jdbc_sample_app.ear file you created on your local machine by running the pack script, as described in the setup steps:
Figure 23. Install new application
- On the next page, keep all default values and continue.
- The next page shows installation options. Keep the default values here as well. Make sure the Pre-compile JSP check box is unchecked, and continue.
On the next page, select the application to be deployed on both the server server1 and on webserver1. Click Apply to save your changes, and continue to the next page.
Figure 24. Map modules to servers
On the next page, map the tutorial_app Web module to the virtual host
Figure 25. Map virtual hosts for Web modules
- The next page displays a summary of the previously specified deployment parameters. Click to start the deployment process.
The deployment process takes several minutes, and the following success message is displayed when it is complete:
Figure 26. Application tutorial_appEAR installed successfully
- Click and .
Start the application by navigating to Applications > Enterprise Applications from the sidebar menu.
Select tutorial_appEAR from the list of applications, and click Start:
Figure 27. Start application
- The tutorial sample search application is now deployed on your WebSphere Application Server.
Use the tutorial search application
Configure the tutorial search application
Navigate your browser to http://myHost/tutorial_app (where myHost is the name of your OmniFind machine). The following configuration window opens:
Figure 28. Configure search application
Configure the parameters according to your system, and click Save configuration. You will be redirected to the sample search application.
- If the configuration is not correct, you will see an error page containing information about the error that occurred. Try to understand what went wrong (for example, the database is not accessible because of a wrong port, username, or password), and go back to the configuration page to correct the configuration parameters.
- If you get any unusual exceptions (such as a NoClassDefFoundError), make sure you copied all the necessary JDBC driver files in their correct version, as well as the OmniFind SIAPI libraries, to the lib folder before packing and deploying the application.
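A NoClassDefFoundError usually means that a library is missing from the application's classpath. The following sketch shows one way to test whether a class can be loaded. The class name ClasspathCheck is made up for this illustration. The driver class com.ibm.db2.jcc.DB2Driver is the usual entry point of the DB2 JDBC driver, while the SIAPI class name used in main is an assumption that you should check against your siapi.jar.

```java
// Hypothetical helper, not part of the tutorial's resources.
final class ClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isClassAvailable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "com.ibm.db2.jcc.DB2Driver",          // DB2 JDBC driver (db2jcc.jar)
            "com.ibm.siapi.search.SearchFactory"  // assumed SIAPI class name
        };
        for (String name : candidates) {
            System.out.println(name + ": " + (isClassAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

Run standalone, this only reflects the classpath of the command line. To diagnose the deployed application, the same check would have to run inside the Web application, where a different class loader is in effect.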
Use the tutorial search application
When everything is configured properly, you should see a page that resembles the following:
Figure 29. The sample search application
- The fields at the top of the page allow you to specify search criteria that will be translated into an OmniFind semantic search query. If you don't specify anything, the query returns all the documents containing at least one license plate annotation. When you click Show reports, the query is generated and executed. The final query string can be seen in the Query box at the bottom of the page. A more detailed description of the different elements is given in the following section.
- The field Police Reports contains a list of all documents matching your query.
- The map contains all cities in the country of tutoria. All cities that were mentioned in one of the police reports from the result list are highlighted in red, followed by how often they were mentioned.
- The License Plate section contains information about the license plates and the owners of the particular cars. The information is displayed dynamically when you hover the mouse over one of the cities highlighted in red.
How the tutorial search application works
Figure 30. The tutorial sample search application
- Enter a query, or specify it by selecting values from the drop-down menus.
- The search application creates an OmniFind query string and performs a query against the OmniFind search index.
- OmniFind returns documents matching the query and IDs for the license plates.
- The application uses the documents' URIs and license plate IDs to get the detail information from the database.
The drop-down and text fields on the top of the page are used to build a semantic search query that is issued against the OmniFind search index.
A summary of every search result is displayed in the Police Reports section.
The Query field at the bottom of the page contains the OmniFind XML-fragments query string. The string consists of the following parts:
- The header (@xmlf2::'), which is used to identify an XML fragments query (in contrast to a freetext query).
- The header is followed by several <Entity> XML elements containing text, such as <City>Cluetown</City>, which means, "give me all documents containing a span of type 'CarMake' or 'City' that contains the specified text."
- One of these elements additionally includes a hash sign ("#"): the <#LicensePlate/> element, which could also look like <#LicensePlate>xyz</#LicensePlate> if you specify additional text in the License Plate field. The additional hash sign indicates that you want to get the licensePlate span's feature structure ID. The feature structure ID, together with the document's URI, can be used to uniquely identify the database record belonging to this span. To understand how this is done, refer to the "Retrieving parts of a document that match a semantic search query" section of the "Text Analysis Integration" handbook that comes with your OmniFind installation. Additionally, you can look at the sample search application's source code found in the Download section of this tutorial. The code in DbUtilities.java demonstrates how to retrieve the feature structure ID of the specified licensePlate target elements, and uses it together with the document's URI to retrieve the car owner information for this license plate from the database.
- Furthermore, you might see some query elements like #year::>=1999. Note that these are not XML fragments. Don't be confused by the hash sign. These are so-called fielded search terms that allow you to search for documents containing a field that contains a certain value. The most interesting thing about fielded search is that it allows you to specify a value range that you want the value of the field to be in, for example, "give me all documents that were written after 1999 but before 7/8/2001." The "Query syntax" section of the IBM OmniFind Enterprise Edition information center is a good starting point to become familiar with fielded search and to learn more about the difference between semantic search and fielded search and how they can be combined.
- The last term in the generated query always contains the text you entered in the Freetext field of the search application, and that's exactly how OmniFind will interpret it: "Give me all documents that contain this text."
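To make the parts of the query string more concrete, here is a minimal sketch of how such a string could be assembled. It is not the code of the sample application's SearchServlet. The class and method names are made up, and the exact layout (closing quote after the XML elements, then fielded terms and freetext separated by spaces) as well as the escaping of special characters are assumptions. Compare the result with the string shown in the Query box of the application before relying on it.

```java
import java.util.Map;

// Hypothetical query builder, written only to illustrate the parts described above.
final class XmlFragmentQuerySketch {

    /** Escapes characters that would break the XML fragment (an assumption about what is needed). */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders one element; a target element gets the leading hash sign. */
    static String element(String name, String text, boolean target) {
        String tag = (target ? "#" : "") + name;
        if (text == null || text.isEmpty()) {
            return "<" + tag + "/>";
        }
        return "<" + tag + ">" + escape(text) + "</" + tag + ">";
    }

    /** Assembles header, entity elements, LicensePlate target, fielded term, and freetext. */
    static String build(Map<String, String> entities, String licensePlate,
                        String fieldedTerm, String freetext) {
        StringBuilder query = new StringBuilder("@xmlf2::'");
        for (Map.Entry<String, String> entity : entities.entrySet()) {
            // Entities without text are skipped so that they do not restrict the result.
            if (entity.getValue() != null && !entity.getValue().isEmpty()) {
                query.append(element(entity.getKey(), entity.getValue(), false));
            }
        }
        query.append(element("LicensePlate", licensePlate, true));
        query.append("'");
        if (fieldedTerm != null && !fieldedTerm.isEmpty()) {
            query.append(' ').append(fieldedTerm);
        }
        if (freetext != null && !freetext.isEmpty()) {
            query.append(' ').append(freetext);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
                java.util.Collections.singletonMap("City", "Cluetown"),
                null, "#year::>=1999", "accident"));
    }
}
```

With the values in main, the sketch produces @xmlf2::'<City>Cluetown</City><#LicensePlate/>' #year::>=1999 accident.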
The map contains information that is retrieved from both the OmniFind search index and the database. First, the list of all existing cities is retrieved from the database. Next, all cities that are mentioned in the search results' police reports are flagged. This is done by taking every result's document URI and selecting the cities from the database table that were found within this document.
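The flagging logic can be sketched with plain collections. In this illustration, the list of all cities and the cities found per document stand in for the two database queries. The class name, the method name, and all city names except Cluetown are made up for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the map logic; not the tutorial's actual code.
final class CityMentionSketch {

    /**
     * Counts how often each known city is mentioned in the result documents.
     * Cities that are never mentioned keep the count 0 (not highlighted).
     */
    static Map<String, Integer> countMentions(List<String> allCities,
                                              Map<String, List<String>> citiesPerDocument,
                                              List<String> resultUris) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String city : allCities) {
            counts.put(city, 0);
        }
        for (String uri : resultUris) {
            List<String> found = citiesPerDocument.get(uri);
            if (found == null) {
                continue; // no city annotations stored for this document
            }
            for (String city : found) {
                Integer current = counts.get(city);
                if (current != null) {
                    counts.put(city, current + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> perDocument = new HashMap<String, List<String>>();
        perDocument.put("doc1", Arrays.asList("Cluetown"));
        perDocument.put("doc2", Arrays.asList("Cluetown", "Otherville"));

        System.out.println(countMentions(
                Arrays.asList("Cluetown", "Otherville", "Thirdtown"),
                perDocument,
                Arrays.asList("doc1", "doc2")));
    }
}
```

A city with a count greater than zero would be drawn in red on the map, with the count shown next to it.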
The most interesting feature of this application might be the car owner information that is displayed when you hover the mouse over one of the red marked cities. The feature structure IDs of the previously explained license plate XML target elements are used together with the document's URI to select the license plate and the corresponding owner information from the database. Note: To keep things simple, one and the same database is used here to store the license plate annotations and the owner information. For a real business application, you could imagine that the detail information (like in this case the car owner information) can be retrieved from a different data source such as a different database or also from a complex ERP or CRM system.
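The lookup by document URI and feature structure ID could look roughly like the following JDBC sketch. The table and column names are invented for this illustration. The real schema is created by the setupDB2.ddl script, and the real queries are in the DbUtilities class, so treat this only as an outline of the technique.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical lookup; table and column names are assumptions, not the tutorial's schema.
final class OwnerLookupSketch {

    /** Joins the stored license plate annotation with the owner data. */
    static final String OWNER_SQL =
            "SELECT lp.plate_text, o.owner_name "
            + "FROM license_plate lp JOIN car_owner o ON o.plate_text = lp.plate_text "
            + "WHERE lp.document_uri = ? AND lp.fs_id = ?";

    /** Returns "plate: owner" for the given annotation, or null if no record exists. */
    static String findOwner(Connection connection, String documentUri, int featureStructureId)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(OWNER_SQL)) {
            // The document URI and the feature structure ID together identify the record.
            statement.setString(1, documentUri);
            statement.setInt(2, featureStructureId);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? rows.getString(1) + ": " + rows.getString(2) : null;
            }
        }
    }
}
```

Using a prepared statement with parameter markers keeps the URI, which originates from the search results, out of the SQL text itself.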
Understand the search application's code
The sample search application is provided with its source code so that you can have a look at it and get a better understanding of how to write a custom search application yourself. Note: Some parts of the code may not be "state of the art" but for the sake of the example it was attempted to keep the source as simple as possible.
In the source code you will find three packages:
- The package com.ibm.es.cas2jdbc.tutorial contains the simple entity classes, such as "City" and "PoliceReport." The most interesting class in this package is the SearchProvider, as it demonstrates the usage of the SIAPI.
- The package com.ibm.es.cas2jdbc.tutorial.servlet contains the three servlets that manage the configuration, retrieve the license plate information, and handle the query. The core entry point here is the SearchServlet. It demonstrates how to build the OmniFind query string from the parameters specified in the GUI, and makes use of the utility class DbUtilities, which is responsible for retrieving all information stored in the database.
- The package com.ibm.es.cas2jdbc.tutorial.util contains the previously mentioned DbUtilities class, which is responsible for all database interaction. Have a look at the source code to find out what SQL queries are used.
The three jsp files (configure.jsp, map.jsp, and error.jsp) are of minor interest. They do not contain much logic, but are only responsible for displaying the results or errors.
If you still haven't gotten enough, you can have a look at the source code of the sample annotators found in the Download section of this tutorial. (You can find them in the PEAR archive, which can be extracted like an ordinary ZIP file.)
Another thing you could do is have a look at the recommended tutorials listed in the introduction. They are a great starting point to learn more about OmniFind's search capabilities, how to write your own UIMA annotators, and how to use the SIAPI to build powerful custom search applications.
Anything else to be done?
Yes. After you've completed this tutorial, you should clean up your system. You can keep the collection and the database, but you should definitely remove the system user used for the DB2 connection. (Otherwise, you'll have a user named "tutorial" with the password "password" on your machine, which is a potential security hole.) If you also want to remove the database, you can use the provided dropDB2.ddl script.
I hope you have enjoyed this tutorial and learned what you need to build your own applications using OmniFind, Cas2Jdbc, and SIAPI.
|Helpful files for this tutorial||cas2jdbc_tutorial.zip||311KB|
- "Store selected OmniFind analysis results to a relational database for reporting and data mining" (developerWorks, November 2006): In Part 1 of this tutorial series, learn how to use Cas2Jdbc, the part of OmniFind that allows you to store selected text analysis results in a relational database.
- "Semantic search in WebSphere Information Integrator OmniFind Edition: Deploy a semantic search solution" (developerWorks, August 2005): Configure and deploy a semantic search solution using WebSphere Information Integrator OmniFind Edition 8.2.2.
- "IBM Search and Index APIs (SIAPI) for WebSphere Information Integrator OmniFind Edition" (developerWorks, January 2006): Learn the concepts behind OmniFind's API set and explore examples of how to use them in depth.
- The "Retrieving parts of a document that match a semantic search query" chapter of the IBM OmniFind Enterprise Edition information center: Learn to retrieve just the parts of a document that match the query exactly.
- The "Query syntax" section of the IBM OmniFind Enterprise Edition information center: Refine search results by using specific characters in a query.
- developerWorks Information Management zone: Learn more about DB2. Find technical documentation, how-to articles, education, downloads, product information, and more.