Text is usually classified as unstructured data, together with video tapes, images, audio tapes, and other information that is stored with little or no subdivision into fields. Although you can also apply the concepts of the Unstructured Information Management Architecture (UIMA) to audio tapes or video tapes, unstructured information in the InfoSphere™ Warehouse typically refers to text strings such as call-center notes, customer-satisfaction surveys, or problem reports.
With existing data warehousing tools, you cannot efficiently use this information to gain insight. Structured data already yields considerable insight, but the insight that is contained in unstructured data is often lost because it is difficult to identify within the vast amount of unstructured data that organizations use to run their business.
With information retrieval, you can locate individual documents that deal with a specific problem. With business intelligence, you can aggregate information to detect patterns and trends.
The goal of text analysis is to transform unstructured information into a structure that can be analyzed in the InfoSphere Warehouse together with existing structured information by using data warehousing tools, for example, reporting tools, tools for multidimensional analysis, or data mining tools. You can create this structure by determining concepts that are included in the text, and by extracting these concepts into relational tables.
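The following minimal sketch illustrates the idea of extracting concepts from text into a relational table. It is not the InfoSphere Warehouse implementation: the concept dictionary `CONCEPT_TERMS`, the table name `note_concepts`, and the sample notes are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import re
import sqlite3

# Hypothetical concept dictionary: concept name -> trigger terms (illustrative only).
CONCEPT_TERMS = {
    "billing_problem": ["invoice", "overcharged", "billing"],
    "cancellation_threat": ["cancel", "switch provider"],
}

def extract_concepts(note_id, text):
    """Return one (note_id, concept) row per concept whose trigger term occurs in the text."""
    lowered = text.lower()
    rows = []
    for concept, terms in CONCEPT_TERMS.items():
        # \b anchors the match at the start of a word, so "cancel" also matches "cancelled".
        if any(re.search(r"\b" + re.escape(term), lowered) for term in terms):
            rows.append((note_id, concept))
    return rows

def load_concepts(notes):
    """Store the extracted concepts in a relational table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE note_concepts (note_id INTEGER, concept TEXT)")
    for note_id, text in notes:
        conn.executemany(
            "INSERT INTO note_concepts VALUES (?, ?)",
            extract_concepts(note_id, text),
        )
    return conn

# Hypothetical call-center notes.
notes = [
    (1, "Customer says she was overcharged and wants to cancel."),
    (2, "Asked about opening hours."),
]
conn = load_concepts(notes)
rows = conn.execute(
    "SELECT note_id, concept FROM note_concepts ORDER BY note_id, concept"
).fetchall()
print(rows)
```

After the concepts are stored as rows, you can join them with existing structured tables by using the note or customer key, which is what makes them available to reporting and data mining tools.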
You can apply text analysis or a combined analysis of structured and unstructured data in all industries and across many business functions. The following examples illustrate the concepts:
Winning new customers is expensive, so it is important to keep existing customers from moving to competitors. With the data-mining technique of predictive modeling, you can predict each customer's propensity to cancel a contract.
Predictive modeling is based on the available data about each customer and on historical cases of customers who have left your company.
In a traditional data-mining model, only structured data about customers is used. For example:
With text analysis, you can extract the most important concepts that are mentioned by customers during a call to the call center. For example, the following concepts might be recorded by call-center staff in their notes:
By using the information that is extracted from call-center notes as additional input to the data-mining prediction model, you can considerably improve the predictive power of the model.
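As a rough sketch of what "additional input" can mean in practice, the following code adds one 0/1 indicator column per extracted concept to the structured attributes of each customer. The attribute names, concept names, and customer IDs are hypothetical, and the sketch only prepares the input table; it does not train a prediction model.

```python
def build_model_input(structured, concepts_by_customer, concept_names):
    """Merge structured attributes with 0/1 flags for concepts found in call-center notes."""
    table = {}
    for customer_id, attributes in structured.items():
        row = dict(attributes)  # copy so that the structured source is not modified
        mentioned = concepts_by_customer.get(customer_id, set())
        for name in concept_names:
            row["mentions_" + name] = int(name in mentioned)
        table[customer_id] = row
    return table

# Hypothetical structured data, keyed by customer ID.
structured = {
    101: {"tenure_months": 26, "monthly_fee": 39.0},
    102: {"tenure_months": 4, "monthly_fee": 19.0},
}
# Hypothetical concepts that text analysis extracted from each customer's notes.
concepts_by_customer = {101: {"cancellation_threat"}}
concept_names = ["billing_problem", "cancellation_threat"]

model_input = build_model_input(structured, concepts_by_customer, concept_names)
print(model_input[101])
```

Customers without any notes receive 0 for every concept flag, so the resulting table has the same columns for every row and can be passed to a mining tool like any other structured input.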
Repair shops that are accredited dealers for a specific brand might provide most of the repair reports in a structured form, for example, part types or standard codes for standard services. This information is already used today to find sequential patterns of part failures. However, nonstandard cases are reported in free-text fields. Extracting part types and problem types from these free-text fields and including them in the causal and sequence analysis improves the manufacturer's ability to detect quality problems earlier, react to them, and avoid costly product recalls.
With text analysis, the following keywords might be detected in the records of the patients:
smoker, physical inactivity, alcoholism, obesity
You can considerably improve association models and classification models by including in these models the keywords that are retrieved by text analysis. For example, based on the analysis of structured and unstructured data, 40% of the patients might be eligible to be exempted from further intensive and expensive medical supervision and control. This result cannot be achieved if you use structured information only.
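To make the effect on an association model concrete, the following toy sketch computes the standard support and confidence measures of an association rule, once with structured items only and once with a text keyword added. The records, the structured items `diagnosis_code_A` and `readmitted`, and the `kw:` prefix for keywords are invented for illustration and do not represent real patient data or results.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)

def confidence(records, antecedent, consequent):
    """Share of records with the antecedent that also contain the consequent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule cannot be evaluated
    return support(records, set(antecedent) | set(consequent)) / base

# Toy records: structured items plus keywords found by text analysis ("kw:" prefix).
patients = [
    {"diagnosis_code_A", "kw:smoker", "readmitted"},
    {"diagnosis_code_A", "kw:smoker", "readmitted"},
    {"diagnosis_code_A"},
    {"diagnosis_code_A", "kw:obesity"},
]

structured_only = confidence(patients, {"diagnosis_code_A"}, {"readmitted"})
with_keyword = confidence(patients, {"diagnosis_code_A", "kw:smoker"}, {"readmitted"})
print(structured_only, with_keyword)
```

In this toy data, the rule that uses only the structured item holds for half of the matching records, whereas the rule that also uses the keyword holds for all of them. This mirrors how keywords can separate patients that look identical in the structured data alone.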