Create an application
At this point you leave the realm of creating classes and components for UIMA and begin creating applications that simply use those classes and components.
You can create a plain old Java class that loads the Analysis Engine, instructs it to process the document, and then extracts information from the resulting CAS object, outputting it to the command line. That might not sound very impressive, but it is the heart of what any UIMA application does: process the data and examine the results.
Start by creating the new Java class:
- In the Package Explorer pane, right-click the src folder and select New > Class.
- Choose the same package you used for the ProductNumber class. This is not required; it is merely convenient.
- Choose a class name. Because this is the final application, this name is truly arbitrary. I use ProductFinder in these examples.
Now let's add some code.
Once Eclipse creates the new class, add the following code. See Listing 8.
Listing 8. Creating the ProductFinder class
package com.backstopmedia.uima.tutorial;

import java.io.File;
import java.io.FileInputStream;
import com.ibm.uima.UIMAFramework;
import com.ibm.uima.analysis_engine.TextAnalysisEngine;
import com.ibm.uima.cas.FSIterator;
import com.ibm.uima.cas.FeatureStructure;
import com.ibm.uima.cas.Type;
import com.ibm.uima.cas.text.TCAS;
import com.ibm.uima.resource.ResourceSpecifier;
import com.ibm.uima.util.XMLInputSource;

public class ProductFinder {
    public static void main(String[] args) {
        try {
            // Absolute locations of the descriptor and the document to analyze
            File taeDescriptor = new File(
                "C:\\uima1.2.1\\docs\\examples\\descriptors\\ProductNumberAEDescriptor.xml");
            File inputFile = new File(
                "C:\\uima1.2.1\\docs\\examples\\data\\October Survey Report.txt");
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}
Just as in the case of the CAS Visual Debugger, you need to specify the Analysis Engine descriptor and the file to be analyzed. Notice that these are absolute locations. Make sure to specify the actual locations in your installation.
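Hard-coded absolute locations stop working as soon as the project moves to another machine. One option is to pass both locations on the command line and fail early with a clear message if either file is missing. The following is a minimal sketch that uses only the standard Java library; the class name PathArguments and the helper requireReadableFile() are hypothetical and are not part of UIMA.

```java
import java.io.File;

public class PathArguments {

    // Hypothetical helper: returns the file if it exists and is readable,
    // otherwise throws with a message that names the offending path.
    public static File requireReadableFile(String path, String description) {
        File file = new File(path);
        if (!file.isFile() || !file.canRead()) {
            throw new IllegalArgumentException(
                description + " not found or not readable: " + file.getAbsolutePath());
        }
        return file;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java PathArguments <descriptor.xml> <input.txt>");
            return;
        }
        File taeDescriptor = requireReadableFile(args[0], "Analysis Engine descriptor");
        File inputFile = requireReadableFile(args[1], "Input document");
        System.out.println("Descriptor: " + taeDescriptor.getAbsolutePath());
        System.out.println("Input file: " + inputFile.getAbsolutePath());
    }
}
```

The two File objects returned here could take the place of the hard-coded taeDescriptor and inputFile values in the listing.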
The first step is to actually create the Analysis Engine:
Listing 9. Creating the Analysis Engine
...
        try {
            File taeDescriptor = new File(
                "C:\\uima1.2.1\\docs\\examples\\descriptors\\ProductNumberAEDescriptor.xml");
            File inputFile = new File(
                "C:\\uima1.2.1\\docs\\examples\\data\\October Survey Report.txt");

            // Parse the descriptor and build the engine it describes
            XMLInputSource in = new XMLInputSource(taeDescriptor);
            ResourceSpecifier specifier =
                UIMAFramework.getXMLParser().parseResourceSpecifier(in);
            TextAnalysisEngine tae = UIMAFramework.produceTAE(specifier);

            // Release the engine when you are finished with it
            tae.destroy();
        } catch(Exception e) {
            e.printStackTrace();
        }
...
First create a new XMLInputSource to represent the descriptor file. From there, you can use the UIMA framework itself to read that file for information on the Analysis Engine you're trying to create. Once you have the specifier for the engine, you can use it to create the actual TextAnalysisEngine object.
Finally, when all is said and done, you should destroy the TextAnalysisEngine to free up the memory it occupied.
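Listing 9 calls destroy() as the last statement of the try block, so an exception thrown earlier would skip it. A try/finally block is the usual Java way to guarantee cleanup on every path. The sketch below illustrates the pattern with a hypothetical StubEngine class standing in for the TextAnalysisEngine; it is not a UIMA class, and only its destroy() method mirrors the call in the listing.

```java
public class CleanupSketch {

    // Hypothetical stand-in for the engine; only destroy() mirrors the listing.
    static class StubEngine {
        boolean destroyed = false;

        void process(String text) {
            if (text == null) {
                throw new IllegalArgumentException("No document text");
            }
        }

        void destroy() {
            destroyed = true;
        }
    }

    // Runs the engine and always releases it, even if processing fails.
    static boolean runAndRelease(StubEngine engine, String text) {
        try {
            engine.process(text);
            return true;
        } catch (IllegalArgumentException e) {
            System.err.println("Processing failed: " + e.getMessage());
            return false;
        } finally {
            engine.destroy(); // executes on both the success and the failure path
        }
    }

    public static void main(String[] args) {
        StubEngine engine = new StubEngine();
        boolean ok = runAndRelease(engine, null);
        System.out.println("ok=" + ok + ", destroyed=" + engine.destroyed);
    }
}
```

In the real application, the same shape would put tae.destroy() in a finally block after the engine has been created.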
Once you have the Analysis Engine, you can actually process the document, as shown in Listing 10.
Listing 10. Processing the document
...
public class ProductFinder {
    public static void main(String[] args) {
        try {
            File taeDescriptor = new File(
                "C:\\uima1.2.1\\docs\\examples\\descriptors\\ProductNumberAEDescriptor.xml");
            File inputFile = new File(
                "C:\\uima1.2.1\\docs\\examples\\data\\October Survey Report.txt");
            XMLInputSource in = new XMLInputSource(taeDescriptor);
            ResourceSpecifier specifier =
                UIMAFramework.getXMLParser().parseResourceSpecifier(in);
            TextAnalysisEngine tae = UIMAFramework.produceTAE(specifier);

            // Obtain an empty CAS from the engine
            TCAS tcas = tae.newTCAS();

            // Read the whole file; a single read() call is not guaranteed
            // to fill the buffer, so loop until it is full or the file ends
            FileInputStream fis = new FileInputStream(inputFile);
            byte[] contents = new byte[(int) inputFile.length()];
            int offset = 0;
            while (offset < contents.length) {
                int count = fis.read(contents, offset, contents.length - offset);
                if (count < 0) {
                    break;
                }
                offset += count;
            }
            fis.close();
            String document = new String(contents, 0, offset);

            // Hand the text to the CAS and run the analysis
            tcas.setDocumentText(document);
            tae.process(tcas);
            tae.destroy();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}
The first step is to obtain a new CAS object from the engine. It is this CAS object that will receive any Annotations discovered for this document. Next, get the contents of the actual file as a string.
Remember, the CAS object contains not just the Annotations, but the data itself. Set that data in the CAS object using the setDocumentText() method.
Finally, feed the newly populated CAS object to the process() method. This method searches the data and adds any Annotations to the CAS object.
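Listing 10 decodes the file with the platform's default character set, which can differ between machines. Assuming Java 7 or later (newer than the JDK this tutorial was written for), the document text could instead be read with an explicit charset, as in the sketch below. The class and method names are hypothetical, and UTF-8 is an assumption about how the documents are stored. The resulting string is what you would pass to setDocumentText().

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DocumentReader {

    // Reads the whole file and decodes it with the given charset.
    public static String readDocument(Path file, Charset charset) throws IOException {
        byte[] contents = Files.readAllBytes(file);
        return new String(contents, charset);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java DocumentReader <input.txt>");
            return;
        }
        // UTF-8 is an assumption; use the encoding your documents are stored in.
        String document = readDocument(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println("Read " + document.length() + " characters.");
    }
}
```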
That takes care of getting the data in. Now you have to get it out again.
Using the classes provided in the UIMA framework and the classes you generated earlier, you can directly access the information in the newly populated CAS object. See Listing 11.
Listing 11. Retrieving the Annotations
...
public class ProductFinder {
    public static void printProducts(TCAS tcas) {
        // Look up the type definition and report on it
        Type productType = tcas.getTypeSystem()
            .getType("com.backstopmedia.uima.tutorial.ProductNumber");
        System.out.println("Type is " + productType.getName() + ".");
        System.out.println("It has " + productType.getNumberOfFeatures()
            + " features.");

        // Walk through every annotation of that type
        FSIterator iter =
            tcas.getAnnotationIndex(productType).iterator();
        while (iter.isValid()) {
            FeatureStructure fs = iter.get();
            ProductNumber annot = (ProductNumber)fs;
            iter.moveToNext();
        }
    }

    public static void main(String[] args) {
        try {
            ...
            tcas.setDocumentText(document);
            tae.process(tcas);
            printProducts(tcas);
            tae.destroy();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}
First, in the printProducts() method, get a feel for how things are working: ask the CAS object's type system for the definition of the ProductNumber type. You can then output attributes such as its name and number of features to the command line.
But the real task is to see the data that's in the CAS object. To do that, obtain an FSIterator object to iterate over the feature structures present. Once you have that, you can loop through the items in the iterator, each time retrieving the current FeatureStructure and casting it to a ProductNumber Annotation.
If you run this application, you should see the following type information:
Type is com.backstopmedia.uima.tutorial.ProductNumber.
It has 4 features.
Once you have the Annotations, you can get at their data, as shown in Listing 12.
Listing 12. Extracting the Annotation features
...
        FSIterator iter = tcas.getAnnotationIndex(productType).iterator();
        while (iter.isValid()) {
            FeatureStructure fs = iter.get();
            ProductNumber annot = (ProductNumber)fs;

            // The annotated text and the annotation's own features
            String coveredText = annot.getCoveredText();
            System.out.println("The product number is " + coveredText);
            System.out.println("The product line is " +
                annot.getProductLine());
            System.out.println("Annotation found from " +
                annot.getStart() + " to " + annot.getEnd() + ".");
            System.out.println("");
            iter.moveToNext();
        }
    }
...
Remember when you created the ProductNumber class? It had getters and setters for the productLine and other information such as the start and end positions. Now you can make use of those methods to retrieve the actual information. You can also retrieve the data being annotated using the getCoveredText() method.
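The start and end values are character offsets into the document text, and the covered text is the span of the document between them. The sketch below illustrates that relationship with plain Java regular expressions rather than UIMA. The sample sentence and the product-number pattern are made up for illustration and are not the ones the ProductNumber annotator uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetSketch {

    // Hypothetical pattern: two capital letters, a hyphen, and three digits.
    private static final Pattern PRODUCT = Pattern.compile("[A-Z]{2}-\\d{3}");

    // Returns one description per match, built from the match offsets alone.
    public static List<String> describeMatches(String document) {
        List<String> lines = new ArrayList<String>();
        Matcher matcher = PRODUCT.matcher(document);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end(); // exclusive, as in String.substring()
            String covered = document.substring(start, end);
            lines.add(covered + " found from " + start + " to " + end);
        }
        return lines;
    }

    public static void main(String[] args) {
        String document = "Customers liked PX-200 more than QR-115.";
        for (String line : describeMatches(document)) {
            System.out.println(line);
        }
    }
}
```

For the sample sentence, the sketch reports PX-200 from 16 to 22 and QR-115 from 33 to 39; taking the substring of the document between those offsets gives back the matched text.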
Now let's run it.
Running an application in Eclipse is fairly straightforward. Right-click the appropriate .java file -- in this case, ProductFinder.java -- and choose Run As > Java Application.
The results appear in the Console window, which sits below the editors (unless you've moved it, of course). You should see results similar to Figure 18.
Figure 18. The final results
If there are any run-time errors, they also appear in this window.
And that's all there is to it.

