Advancing Unstructured Data
Advanced technologies for managing unstructured data have been available in the software industry for quite some time. Over the past year, however, the topic has become more visible as unstructured data handling capabilities show up in mainstream products. Take the newest version of DB2, code-named Viper, which has the best support for storing, searching, and retrieving unstructured data of any commercial DBMS. If you need to manage XML efficiently in your database projects, DB2 Viper is your best choice.
Beyond XML, which many would call semi-structured data, unstructured data comes in many forms inside the enterprise, including call center logs, web forms, health records, documents, emails, and the text fields of business applications. This data can be written in various languages and can also be stored as audio and even video. Many analysts cite 80% as the share of data generated and stored in the enterprise that is unstructured.
During visits with our customers over the past year, many businesses indicated that they are struggling with unstructured data. Several said they were generally unaware of some of the technologies available for handling it. In one situation, a customer had four users manually sifting through call center logs to identify potential common issues in their customer interactions. Managing this process by hand is both expensive and time-consuming. That's the bad news. The good news is that technologies have advanced enough that this process could easily be automated, lowering the cost while being more responsive to the needs of the business and its customers.
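To give a rough sense of what automating that kind of log review might look like, here is a minimal Python sketch. It is not UIMA and not any product's API; it simply counts two-word phrases that recur across calls. The sample log entries, the stop-word list, and the function names `tokenize` and `common_issues` are all invented for illustration, and a real deployment would need far richer text analysis.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {
    "the", "a", "an", "to", "is", "was", "and", "of", "on", "in", "for",
    "my", "i", "it", "not", "after", "again", "customer", "cannot",
}


def tokenize(text):
    """Lowercase the text and return its words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def common_issues(log_entries, min_calls=2):
    """Return (phrase, call_count) pairs for two-word phrases that appear
    in at least `min_calls` separate log entries."""
    counts = Counter()
    for entry in log_entries:
        words = tokenize(entry)
        # Pairs are formed after stop words are dropped, so the two words
        # need not be adjacent in the original text. A set ensures each
        # phrase is counted at most once per call.
        phrases = {" ".join(pair) for pair in zip(words, words[1:])}
        counts.update(phrases)
    recurring = [(p, n) for p, n in counts.items() if n >= min_calls]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(recurring, key=lambda item: (-item[1], item[0]))


# Invented sample data standing in for real call center logs.
logs = [
    "Customer cannot reset password after the update",
    "Password reset email never arrived",
    "Billing statement shows a duplicate charge",
    "Duplicate charge on the billing statement again",
    "Reset password link is broken",
]

issues = common_issues(logs)
for phrase, calls in issues:
    print(f"{phrase}: {calls} calls")
```

Even something this crude surfaces the recurring themes in the sample data (password resets and duplicate billing charges) without anyone reading each entry. It also misses a lot: the second entry says "password reset" rather than "reset password", so it is not counted with the others. Closing gaps like that is where purpose-built text analytics earns its keep.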
IBM has invested significantly in the area of unstructured data over many years. One project that is showing great promise is UIMA, short for Unstructured Information Management Architecture. UIMA is now an open source software framework that helps both organizations and software companies tackle a wide variety of unstructured data problems. Already, many of IBM's partners have announced support for UIMA, including Attensity Software, Clear Forest, SPSS, SAS, and others. Having the solution in open source will only serve to accelerate its adoption. In addition, technologists and researchers who have a passion for this topic will have a place to exercise their skills and expertise.
Over time, I predict that UIMA will see significant adoption, as this framework has the potential to break down the barriers businesses face in discovering the proverbial "needle in the haystack" in their ever-expanding volume of unstructured data.