Create a UIMA component Web service, Part 1: Create a UIMA application using Eclipse

Use wizards to simplify component creation

Nicholas Chase (ibmquestions@nicholaschase.com), Freelance writer, Backstop Media
Nicholas Chase has been involved in Web site development for companies such as Lucent Technologies, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. Nick has been a high school physics teacher, a low-level radioactive waste facility manager, an online science fiction magazine editor, a multimedia engineer, an Oracle instructor, and the chief technology officer of an interactive communications company. He is the author of several books, including XML Primer Plus (Sams).

Summary:  Search word processing documents, emails, video, and other unstructured information for specific text or even for concepts using the Unstructured Information Management Architecture (UIMA). Part 1 of this tutorial explains how to install and use the UIMA Eclipse plug-ins to create a simple UIMA application.

Date:  28 Jul 2005
Level:  Intermediate

Create an application

At this point, you leave the realm of creating classes and components for UIMA and begin creating applications that simply use those classes and components.

Create the class

You can create a plain old Java class that loads the Analysis Engine, instructs it to process the document, and then extracts information from the resulting CAS object, outputting it to the command line. That might not sound very impressive, but it is the heart of what any UIMA application does: process the data and examine the results.

Start by creating the new Java class:

  1. In the Package Explorer pane, right-click the src folder and select New > Class.
  2. Choose the same package you used for the ProductNumber class. This is not required; it is merely convenient.
  3. Choose a class name. Because this is the final application, this name is truly arbitrary. I use ProductFinder in these examples.

Now let's add some code.

Add the code

Once Eclipse creates the new class, add the code shown in Listing 8.


Listing 8. Creating the ProductFinder class
                    
package com.backstopmedia.uima.tutorial;

import java.io.File;
import java.io.FileInputStream;
import com.ibm.uima.UIMAFramework;
import com.ibm.uima.analysis_engine.TextAnalysisEngine;
import com.ibm.uima.cas.FSIterator;
import com.ibm.uima.cas.FeatureStructure;
import com.ibm.uima.cas.Type;
import com.ibm.uima.cas.text.TCAS;
import com.ibm.uima.resource.ResourceSpecifier;
import com.ibm.uima.util.XMLInputSource;

public class ProductFinder {

   public static void main(String[] args) {

      try {
         File taeDescriptor = new File("C:\\uima1.2.1\\docs\\examples"
            + "\\descriptors\\ProductNumberAEDescriptor.xml");
         File inputFile = new File("C:\\uima1.2.1\\docs\\examples"
            + "\\data\\October Survey Report.txt");

     }  catch(Exception e) {
         e.printStackTrace();
     }    
  }

}

Just as in the case of the CAS Visual Debugger, you need to specify the Analysis Engine descriptor and the file to be analyzed. Notice that these are absolute locations. Make sure to specify the actual locations in your installation.
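If you would rather not hardcode absolute locations, one option is to read them from the command line. The following is only a sketch of that idea using the standard java.io.File class. The PathArgsSketch class, the resolve() helper, and the relative fallback names are assumptions made for illustration; they are not part of the tutorial code.

```java
import java.io.File;

class PathArgsSketch {

   // Returns the path given at args[index] if present, otherwise the fallback.
   static File resolve(String[] args, int index, String fallback) {
      if (args != null && args.length > index && args[index].length() > 0) {
         return new File(args[index]);
      }
      return new File(fallback);
   }

   public static void main(String[] args) {
      // The fallback names are hypothetical; substitute your own locations.
      File taeDescriptor = resolve(args, 0, "ProductNumberAEDescriptor.xml");
      File inputFile = resolve(args, 1, "October Survey Report.txt");

      // Report a missing file up front rather than failing later on.
      File[] required = { taeDescriptor, inputFile };
      for (int i = 0; i < required.length; i++) {
         if (!required[i].isFile()) {
            System.err.println("Cannot find "
               + required[i].getAbsolutePath());
         }
      }
   }
}
```

Checking the files before you create the Analysis Engine gives you a clearer message than a stack trace from deep inside the framework.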


Create the Analysis Engine

The first step is to actually create the Analysis Engine:


Listing 9. Creating the Analysis Engine
                    
...
      try {
         File taeDescriptor = new File("C:\\uima1.2.1\\docs\\examples"
            + "\\descriptors\\ProductNumberAEDescriptor.xml");
         File inputFile = new File("C:\\uima1.2.1\\docs\\examples"
            + "\\data\\October Survey Report.txt");

         XMLInputSource in = new XMLInputSource(taeDescriptor);
         ResourceSpecifier specifier = 
           UIMAFramework.getXMLParser().parseResourceSpecifier(in);
           
         TextAnalysisEngine tae = UIMAFramework.produceTAE(specifier);
         
         tae.destroy();
     }  catch(Exception e) {
         e.printStackTrace();
     }    
...

First create a new XMLInputSource to represent the descriptor file. From there, you can use the UIMA framework itself to read that file for information on the Analysis Engine you're trying to create. Once you have the specifier for the engine, you can use it to create the actual TextAnalysisEngine object.

Finally, when all is said and done, you should destroy the TextAnalysisEngine to free up the memory it occupied.
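Note that in the listing, an exception thrown before the call to destroy() skips the cleanup. One common way to guard against that is a finally block. The sketch below shows the pattern only: the Engine interface is a hypothetical stand-in for TextAnalysisEngine (just the two calls used here) so that the example compiles without the UIMA JAR files, and runAndRelease() is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

class DestroyInFinallySketch {

   // Hypothetical stand-in for TextAnalysisEngine, not the UIMA interface.
   interface Engine {
      void process(String text) throws Exception;
      void destroy();
   }

   // Calls process() and always calls destroy() afterward.
   // Returns true if processing succeeded, false if it threw an exception.
   static boolean runAndRelease(Engine engine, String text) {
      try {
         engine.process(text);
         return true;
      } catch (Exception e) {
         System.err.println("Processing failed: " + e.getMessage());
         return false;
      } finally {
         // Runs whether or not process() threw an exception.
         engine.destroy();
      }
   }

   public static void main(String[] args) {
      final List<String> calls = new ArrayList<String>();
      Engine failing = new Engine() {
         public void process(String text) throws Exception {
            calls.add("process");
            throw new Exception("simulated failure");
         }
         public void destroy() {
            calls.add("destroy");
         }
      };
      boolean ok = runAndRelease(failing, "some text");
      System.out.println(ok + " " + calls);
   }
}
```

In the tutorial code, the equivalent change would be to declare the engine before the try block and move the destroy() call into a finally block, with a null check in case the engine was never created.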


Process the document

Once you have the Analysis Engine, you can actually process the document, as shown in Listing 10.


Listing 10. Processing the document
                    
...
public class ProductFinder {

   public static void main(String[] args) {

      try {
         File taeDescriptor = new File("C:\\uima1.2.1\\docs\\examples"
            + "\\descriptors\\ProductNumberAEDescriptor.xml");
         File inputFile = new File("C:\\uima1.2.1\\docs\\examples"
            + "\\data\\October Survey Report.txt");

         XMLInputSource in = new XMLInputSource(taeDescriptor);
         ResourceSpecifier specifier = 
           UIMAFramework.getXMLParser().parseResourceSpecifier(in);
           
         TextAnalysisEngine tae = UIMAFramework.produceTAE(specifier);
         TCAS tcas = tae.newTCAS();
               
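         FileInputStream fis = new FileInputStream(inputFile);
         byte[] contents = new byte[(int)inputFile.length()];
         // read() may return fewer bytes than requested, so loop until
         // the buffer is full or the end of the file is reached
         int offset = 0;
         while (offset < contents.length) {
            int count = fis.read(contents, offset, contents.length - offset);
            if (count < 0) break;
            offset += count;
         }
         fis.close();
         // Decodes the bytes using the platform default character encoding
         String document = new String(contents);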
         
         tcas.setDocumentText(document);
         tae.process(tcas);
         
         tae.destroy();
     }  catch(Exception e) {
         e.printStackTrace();
     }    
  }

}

The first step is to obtain a new CAS object from the engine. It is this CAS object that will receive any Annotations discovered for this document. Next, get the contents of the actual file as a string.
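The file-reading step can be written more compactly on newer Java versions. The following sketch assumes Java 7 or later (for java.nio.file.Files) and assumes the document is encoded as UTF-8; neither assumption comes from the tutorial, and the ReadDocumentSketch class and readDocument() method are illustrative names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ReadDocumentSketch {

   // Reads the whole file and decodes it as UTF-8 (an assumed encoding).
   static String readDocument(File file) throws IOException {
      byte[] bytes = Files.readAllBytes(file.toPath());
      return new String(bytes, StandardCharsets.UTF_8);
   }

   public static void main(String[] args) throws IOException {
      // Write a small temporary file so the sketch runs on its own.
      File temp = File.createTempFile("survey", ".txt");
      temp.deleteOnExit();
      Files.write(temp.toPath(),
         "Sample document text.".getBytes(StandardCharsets.UTF_8));

      System.out.println(readDocument(temp));
   }
}
```

Naming the encoding explicitly means the resulting string does not depend on the platform default, which matters when the document contains non-ASCII characters.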

Remember, the CAS object contains not just the Annotations, but the data itself. Set that data in the CAS object using the setDocumentText() method.

Finally, feed the newly populated CAS object to the process() method. This method searches the data and adds any Annotations to the CAS object.

That takes care of getting the data in. Now you have to get it out again.


Get the Annotations

Using the classes provided in the UIMA framework and the classes you generated earlier, you can directly access the information in the newly populated CAS object. See Listing 11.


Listing 11. Retrieving the Annotations
                    
...
public class ProductFinder {

   public static void printProducts(TCAS tcas) {       
          
      Type productType = tcas.getTypeSystem()
          .getType("com.backstopmedia.uima.tutorial.ProductNumber");
      System.out.println("Type is " + productType.getName() + ".");
      System.out.println("It has " + productType.getNumberOfFeatures() 
                                                     + " features.");
          
      FSIterator iter = 
                   tcas.getAnnotationIndex(productType).iterator();
      while (iter.isValid()) {
         FeatureStructure fs = iter.get();

         ProductNumber annot = (ProductNumber)fs;
            
         iter.moveToNext();
      }
   }    

   public static void main(String[] args) {

      try {
...
         tcas.setDocumentText(document);
         tae.process(tcas);
         
         printProducts(tcas);
         
         tae.destroy();
     }  catch(Exception e) {
         e.printStackTrace();
     }    
  }

}

First, the printProducts() method gives you a feel for how things work: it obtains a reference to the definition of the ProductNumber type from the type system of the CAS object. You can then output attributes such as the type's name and number of features to the command line.

But the real task is to see the data that's in the CAS object. To do that, obtain an FSIterator object from the annotation index for the ProductNumber type. You can then loop through the items in the iterator, each time retrieving the current FeatureStructure and casting it to a ProductNumber Annotation.

If you run this application, you should see the following type information:

Type is com.backstopmedia.uima.tutorial.ProductNumber.
It has 4 features.
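The loop in printProducts() follows a cursor pattern: test isValid(), read the current item with get(), then advance with moveToNext(). The sketch below mimics that pattern with plain Java so that you can run it without UIMA. The Cursor class is a hypothetical stand-in that offers only these three calls; it is not the FSIterator interface, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CursorLoopSketch {

   // Hypothetical stand-in for FSIterator with the same three calls.
   static final class Cursor {
      private final List<String> items;
      private int position = 0;

      Cursor(List<String> items) {
         this.items = items;
      }

      boolean isValid() {
         return position < items.size();
      }

      String get() {
         return items.get(position);
      }

      void moveToNext() {
         position++;
      }
   }

   // Walks the cursor from its current position and collects every item.
   static List<String> collect(Cursor cursor) {
      List<String> found = new ArrayList<String>();
      while (cursor.isValid()) {
         found.add(cursor.get());
         // Without this call the loop would never end.
         cursor.moveToNext();
      }
      return found;
   }

   public static void main(String[] args) {
      Cursor cursor = new Cursor(Arrays.asList("first", "second"));
      System.out.println(collect(cursor));
   }
}
```

Unlike java.util.Iterator, where next() both returns the item and advances, this style separates reading from advancing, so forgetting moveToNext() is the mistake to watch for.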


Get the Annotation features

Once you have the Annotations, you can get at their data, as shown in Listing 12.


Listing 12. Extracting the Annotation features
                    
...
      FSIterator iter = tcas.getAnnotationIndex(productType).iterator();
      while (iter.isValid()) {
         FeatureStructure fs = iter.get();

         ProductNumber annot = (ProductNumber)fs;
         String coveredText = annot.getCoveredText();
         System.out.println("The product number is " + coveredText);
         System.out.println("The product line is " + 
                                            annot.getProductLine());
         System.out.println("Annotation found from " + 
                  annot.getStart() + " to " + annot.getEnd() + ".");         
         System.out.println("");
            
         iter.moveToNext();
      }
   }    
...

Remember when you created the ProductNumber class? It had getters and setters for the productLine and other information such as the start and end positions. Now you can make use of those methods to retrieve the actual information. You can also retrieve the data being annotated using the getCoveredText() method.
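To see how the covered text relates to the start and end positions, consider this plain Java sketch. It assumes the end offset is exclusive, as with String.substring(), so the covered text is the part of the document between the two offsets. The Span class is a hypothetical stand-in for an Annotation that holds only the offsets, and the sample sentence and product number are made up for illustration.

```java
class CoveredTextSketch {

   // Hypothetical stand-in for an Annotation: only the two offsets.
   static final class Span {
      final int start;
      final int end;

      Span(int start, int end) {
         this.start = start;
         this.end = end;
      }
   }

   // Returns the part of the document between start (inclusive)
   // and end (exclusive).
   static String coveredText(String document, Span span) {
      return document.substring(span.start, span.end);
   }

   public static void main(String[] args) {
      String document = "Order PN-1234 today.";
      String product = "PN-1234";

      // Locate the sample product number to derive its offsets.
      int start = document.indexOf(product);
      Span span = new Span(start, start + product.length());

      System.out.println("The product number is "
         + coveredText(document, span));
      System.out.println("Annotation found from "
         + span.start + " to " + span.end + ".");
   }
}
```

Because an Annotation stores only offsets into the document, the CAS object has to hold the document text as well; that is what makes a call such as getCoveredText() possible.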

Now let's run it.


Run the application

Running an application in Eclipse is fairly straightforward. Right-click the appropriate .java file -- in this case, ProductFinder.java -- and choose Run As > Java Application.

The results appear in the Console window, which sits below the editors (unless you've moved it, of course). You should see results similar to Figure 18.


Figure 18. The final results

If there are any run-time errors, they also appear in this window.

And that's all there is to it.
