IBM Content Analytics with Enterprise Search, Version 3.0.0

Custom text analysis integration

After you build your custom analysis by using the Unstructured Information Management Architecture (UIMA), you can integrate the analysis logic by using the IBM® Content Analytics with Enterprise Search administration console.

UIMA is an open platform that identifies components for each conceptually distinct analysis function, and it ensures that these components can be easily reused and combined.

Advanced linguistic analysis can combine many different analysis tasks. The analysis begins with language detection and segmentation, continues with part-of-speech recognition, and is followed by deep grammatical parsing. The final tasks might include, for example, identifying the relation between certain chemical substances and the appearance of particular symptoms. Each step in the analysis process depends on the results of the previous step.

The analysis logic for each step is contained in an annotator. Annotators combine to form a processing chain that iterates over each document in the collection to discover new information and store this information for downstream processing.
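The idea of a processing chain can be illustrated with a minimal plain-Java sketch. The types shown here (Doc, Annotator, LanguageAnnotator, TokenAnnotator, runChain) are hypothetical stand-ins invented for this illustration; they are not the Apache UIMA API, and the "language detection" is a toy rule. The sketch only shows how each annotator adds results to a document and how a later step can depend on an earlier one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Apache UIMA API.
class AnnotatorChainSketch {

    /** A document plus the analysis results that annotators have stored so far. */
    static final class Doc {
        final String text;
        final Map<String, Object> results = new LinkedHashMap<>();
        Doc(String text) { this.text = text; }
    }

    /** One analysis step: it can read earlier results and stores its own. */
    interface Annotator {
        void process(Doc doc);
    }

    /** Toy language detection: reports "en" if the text contains the word "the". */
    static final class LanguageAnnotator implements Annotator {
        public void process(Doc doc) {
            String padded = " " + doc.text.toLowerCase(Locale.ROOT) + " ";
            doc.results.put("language", padded.contains(" the ") ? "en" : "unknown");
        }
    }

    /** Toy segmentation: refuses to run unless language detection ran first. */
    static final class TokenAnnotator implements Annotator {
        public void process(Doc doc) {
            if (!doc.results.containsKey("language")) {
                throw new IllegalStateException("language detection must run first");
            }
            List<String> tokens = new ArrayList<>();
            for (String t : doc.text.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            doc.results.put("tokens", tokens);
        }
    }

    /** Runs every annotator, in order, over every document in the collection. */
    static void runChain(List<Annotator> chain, List<Doc> collection) {
        for (Doc doc : collection) {
            for (Annotator annotator : chain) {
                annotator.process(doc);
            }
        }
    }

    public static void main(String[] args) {
        List<Annotator> chain = Arrays.asList(new LanguageAnnotator(), new TokenAnnotator());
        List<Doc> collection = Arrays.asList(new Doc("The patient took the drug"), new Doc("Aspirin"));
        runChain(chain, collection);
        for (Doc doc : collection) {
            System.out.println(doc.results);
        }
    }
}
```

Reversing the order of the two annotators in the chain makes TokenAnnotator fail, which mirrors the statement that each step depends on the results of the previous step.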

The annotators that are responsible for discovering and representing analysis content in text documents are contained in an analysis engine, a central concept in UIMA. An analysis engine might contain a single annotator or it might be a composite of many engines, each in turn containing annotators.
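The nesting described above, where an engine holds either a single annotator or further engines, can be sketched in plain Java as a composite structure. The names Engine, PrimitiveEngine, AggregateEngine, and run are hypothetical and chosen only for this illustration; they are not classes from the UIMA SDK, and each "annotator" here merely records its own name instead of analyzing text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; these are NOT classes from the UIMA SDK.
class AnalysisEngineSketch {

    /** Common contract: an engine processes text and records what it found. */
    interface Engine {
        void process(String text, List<String> findings);
    }

    /** An engine that contains a single annotator (here it only records its name). */
    static final class PrimitiveEngine implements Engine {
        private final String annotatorName;
        PrimitiveEngine(String annotatorName) { this.annotatorName = annotatorName; }
        public void process(String text, List<String> findings) {
            findings.add(annotatorName);
        }
    }

    /** An engine that is a composite of other engines, run in the order given. */
    static final class AggregateEngine implements Engine {
        private final List<Engine> delegates;
        AggregateEngine(Engine... delegates) {
            this.delegates = Arrays.asList(delegates);
        }
        public void process(String text, List<String> findings) {
            for (Engine delegate : delegates) {
                delegate.process(text, findings);
            }
        }
    }

    /** Convenience helper: runs an engine and returns the collected findings. */
    static List<String> run(Engine engine, String text) {
        List<String> findings = new ArrayList<>();
        engine.process(text, findings);
        return findings;
    }

    public static void main(String[] args) {
        Engine linguistic = new AggregateEngine(
                new PrimitiveEngine("language"), new PrimitiveEngine("tokens"));
        // A composite can itself contain composites.
        Engine top = new AggregateEngine(linguistic, new PrimitiveEngine("relations"));
        System.out.println(run(top, "some text")); // prints [language, tokens, relations]
    }
}
```

Because both kinds of engine share one interface, the caller treats a single-annotator engine and a deeply nested composite the same way.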

UIMA provides only the basic building blocks that you need to create, test, and deploy your own analysis engines. It does not include pre-configured analysis engines that supply linguistic analysis functionality for your UIMA environment. However, the linguistic processing that IBM Content Analytics with Enterprise Search applies is available as a set of annotators that you can work with in UIMA.

To work with UIMA, you can install the Apache UIMA Software Development Kit (SDK), Version 2.3. The SDK is available on the Apache UIMA site. The UIMA SDK includes a Java implementation of the UIMA framework for the implementation, description, composition, and deployment of UIMA components.

The UIMA SDK also provides a set of tools and utilities, delivered as Eclipse plug-ins, for working with UIMA in an Eclipse-based development environment. For information about Eclipse and to download the software, see the Eclipse site. For instructions on how to install the UIMA SDK in the Eclipse integrated development environment (IDE), see the UIMA documentation.

As an alternative to manually developing annotators with the UIMA SDK, you can use Content Analytics Studio to easily develop and deploy custom text analytics for IBM Content Analytics with Enterprise Search applications. Content Analytics Studio is a separately installable component of IBM Content Analytics with Enterprise Search.



Last updated: May 2012

© Copyright IBM Corporation 2004, 2012.