UIMA concepts

InfoSphere™ Warehouse provides text-analysis functions that are based on the Unstructured Information Management Architecture (UIMA). In InfoSphere Warehouse, you can use these functions, or you can use Apache UIMA-compliant text-analysis components from third parties, for example, from IBM® business partners or academic institutions.

Unstructured information is the largest, most current, and fastest-growing source of information that is available to businesses and governments. Enterprises hold large amounts of this information across different media, for example, text, voice, or video. With an unstructured information management (UIM) application, you can analyze large volumes of unstructured information to discover, organize, and deliver relevant knowledge to decision makers.

Unstructured data must be analyzed to interpret, detect, and locate concepts of interest that are not explicitly tagged or annotated in the original document. For example, documents might include the following domain-specific information:
Named entities
Named entities can be persons, organizations, locations, facilities, or products.
Opinions
Opinions can be complaints, threats, or facts.
Relations
Relations can be, for example, "located in", "finances", "supports", "purchases", or "repairs".

The results of the analysis must be put into structured form so that search engines, database engines, Online Analytical Processing (OLAP) tools, and data-mining engines can efficiently find the concepts that you need, when you need them.
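The step from unstructured text to structured results can be illustrated with a small sketch. The following Java example is only an illustration under stated assumptions: the class `AnnotationSketch`, the `Annotation` type, and the toy dictionary are hypothetical and are not part of the UIMA SDK or of InfoSphere Warehouse. The example finds known terms in a text and records each hit as a typed span (type, begin offset, end offset, covered text), which is the kind of structured record that a database or search engine can index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the UIMA SDK.
class AnnotationSketch {

    // One structured result: a typed span over the original text.
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final String coveredText;

        Annotation(String type, int begin, int end, String coveredText) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }

        @Override
        public String toString() {
            return type + " [" + begin + "," + end + ") \"" + coveredText + "\"";
        }
    }

    // Toy dictionary that maps surface strings to concept types (made-up data).
    static Map<String, String> dictionary() {
        Map<String, String> dict = new LinkedHashMap<>();
        dict.put("IBM", "Organization");
        dict.put("Paris", "Location");
        dict.put("complaint", "Opinion");
        return dict;
    }

    // Finds every occurrence of every dictionary term and returns the hits
    // as annotations, sorted by their begin offset.
    static List<Annotation> annotate(String text, Map<String, String> dict) {
        List<Annotation> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : dict.entrySet()) {
            String term = entry.getKey();
            int from = 0;
            int begin;
            while ((begin = text.indexOf(term, from)) >= 0) {
                int end = begin + term.length();
                result.add(new Annotation(entry.getValue(), begin, end,
                        text.substring(begin, end)));
                from = end;
            }
        }
        result.sort(Comparator.comparingInt(a -> a.begin));
        return result;
    }

    public static void main(String[] args) {
        String text = "A customer in Paris filed a complaint about an IBM product.";
        for (Annotation a : annotate(text, dictionary())) {
            System.out.println(a);
        }
    }
}
```

Real text-analysis components use far more robust techniques than exact string lookup. The sketch only shows the shape of the output: concepts that were implicit in the text become explicit records with a type and a position.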

To analyze unstructured content, UIM applications use a variety of analysis technologies. These technologies are developed independently by highly specialized scientists and engineers who use different techniques, interfaces, and platforms.

The bridge from the unstructured world to the structured world is built through the composition and deployment of these analysis capabilities. The Unstructured Information Management Architecture (UIMA) is an architecture and software framework that helps you build that bridge. It supports creating, discovering, composing, and deploying a broad range of analysis capabilities and linking them to structured information services.

Figure 1. UIMA helps you to build the bridge between the unstructured world and the structured world

UIMA specifies component interfaces, data representations, design patterns, and development roles for creating, describing, discovering, composing, and deploying analysis capabilities.

The UIMA framework provides a run-time environment in which developers can plug in their UIMA component implementations and with which they can build and deploy UIM applications. The framework is not specific to any IDE or platform.
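The plug-in idea can be sketched in plain Java. The following example is a simplified illustration, not the UIMA API: the `Analyzer` interface, the `Document` class, and the two components are hypothetical names that were chosen for this sketch. It shows independently written components that share one component interface and one data representation, and a small run-time loop that executes them in sequence so that a later component can build on the results of an earlier one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these types are illustrative and are not UIMA classes.
class PipelineSketch {

    // Shared data representation that is passed from component to component.
    static final class Document {
        final String text;
        final List<String> findings = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }
    }

    // Common component interface that every analysis component implements.
    interface Analyzer {
        void process(Document doc);
    }

    // First component: splits the text on whitespace and records each token.
    static final class Tokenizer implements Analyzer {
        @Override
        public void process(Document doc) {
            for (String token : doc.text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    doc.findings.add("Token:" + token);
                }
            }
        }
    }

    // Second component: counts the tokens that an earlier component recorded.
    static final class TokenCounter implements Analyzer {
        @Override
        public void process(Document doc) {
            long count = doc.findings.stream()
                    .filter(f -> f.startsWith("Token:"))
                    .count();
            doc.findings.add("TokenCount:" + count);
        }
    }

    // Minimal "run time": runs the plugged-in components in order.
    static Document run(String text, List<Analyzer> pipeline) {
        Document doc = new Document(text);
        for (Analyzer analyzer : pipeline) {
            analyzer.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Analyzer> pipeline = Arrays.asList(new Tokenizer(), new TokenCounter());
        Document doc = run("UIMA composes analysis components", pipeline);
        doc.findings.forEach(System.out::println);
    }
}
```

Because every component depends only on the shared interface and data representation, you can add, remove, or reorder components without changing the loop that runs them. The component interfaces and data representations that UIMA specifies are considerably richer than this sketch.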

InfoSphere Warehouse uses the UIMA Software Development Kit (SDK) that is available at the following website:
http://incubator.apache.org/uima/
The UIMA SDK is a Java implementation of the UIMA framework. You can load your own UIMA-compliant text-analysis modules and run them inside InfoSphere Warehouse.

