IBM® Watson Content Analytics Version 3.5 introduces many new features for planners, administrators, users, and application developers.
The product was renamed from IBM Content Analytics with Enterprise Search to IBM Watson Content Analytics.
IBM Content Analytics Studio was renamed IBM Watson Content Analytics Studio. In the product documentation and user interfaces, this component is labeled ICA Studio.
User applications that you created in IBM Content Analytics with Enterprise Search Version 3.0 are supported in Version 3.5, but they are not migrated to the new application framework. The Version 3.0 content analytics miner and enterprise search application are deprecated and will not be supported in future releases. Applications that are based on the Version 3.0 framework cannot use the new features that are available in Version 3.5, such as allowing users to customize the layout of the user interface, create rule-based alerts, or exclude unimportant text from the search results.
Version 3.5 enhancement | Description |
---|---|
Architectural enhancements | |
64-bit crawler processes | All crawlers now run as 64-bit processes on all supported operating systems. For some crawlers, restrictions apply: |
Scalability and multiple-node indexing | For high scalability and failover support, your Watson Content Analytics system can now include multiple index servers and search servers. Collections can be shared across multiple partitions. A collection that includes multiple partitions processes the index concurrently across multiple servers. Distributed search servers process search requests over multiple servers and then federate the search results from the other partitions. To provide greater flexibility when you add servers to your system topology, you can select a role that combines functions: you can configure servers to support both document processing and search, or to support both indexing and search. |
IBM BigInsights™ | Enhancements extend indexing support to the Hadoop Distributed File System (HDFS). HDFS folders can be scanned, and references to files in the folders can be imported to the index. Unlike a crawler, this scan does not detect file modification dates, so all files are refreshed in the index with each scan. This function provides a plug-in interface that runs in MapReduce processes on Hadoop. The ability to store data in HDFS or HBase is not provided, but you can write a plug-in to implement this capability. |
Social analytics and social search | |
New social media crawler | If you have a BoardReader license, you can configure a BoardReader crawler to collect content from blogs, message board sites and forums, news sites, reviews, and videos. Because all information on social media sites is presumed to be public, secure search is not supported. |
Enhanced social media crawlers | The FileNet P8 crawler, SharePoint crawler, and Seed list crawler for IBM Connections can collect social data. The enhancements allow users to explore relationships among the people, recommendations, and tags that are associated with documents that these crawlers add to a collection. |
Social search | If social search is enabled for a collection, the system can aggregate information from various social networking sources and extract relationships between documents, people, and tags. For example, users can discover people who are relevant to a document; see recommendations for other documents and people that might be of interest; and drill down through a tag cloud to explore related information. |
User experience | |
Redesigned user interfaces | The content analytics miner and enterprise search applications were redesigned to provide a common user experience that matches other IBM products. The redesign includes performance enhancements, functional enhancements, and usability enhancements. |
Layout customization | You can easily change the appearance and widgets of the application interfaces by selecting a predefined layout option. For example, in the enterprise search application you can select a faceted search layout that includes time series analysis and correlation analysis. You can also create your own layouts based on templates such as a three-column page design or a two-row page design, and specify the location and size of each widget pane that you want to view in the interface. For example, you can show a facet tree in the upper right pane and a time series chart in the lower left pane. After you add a widget, you can drag it to another location as you refine your design. You can also configure the default settings for each widget, such as the maximum number of facets to show in the facet tree. You can share customized layouts by exporting and importing the customization layout file. |
Mobile devices | You can open the enterprise search application and the Dashboard view in the content analytics miner on an Apple iPad device (iOS 7 is required). Navigation features include touch scrolling, large tappable targets, and orientation awareness. |
Collection modeling | |
Solution templates and packages | To help you get started with searching and analyzing content, you can create collections that are based on predefined configuration settings and resource definitions. For example, a solution template might include settings for extracting information from text (such as dictionaries) and settings for organizing the data for retrieval (such as field mappings and categories). A solution package can include one or more solution templates plus other configuration settings, such as whether security is enabled. A package can also specify a layout definition that controls how information is displayed when users query the collection. When you create a collection, you select the solution package that you want to use. If the package includes sample data, parsing and indexing begin as soon as the collection is created. You can also create and distribute solution packages that are based on existing collections. For example, you can convert a collection to a solution template so that you can reuse resource definitions in other collections. You can also export solution packages and import them on other Watson Content Analytics servers. |
Analytics and search | |
RDF triplestore | Watson Content Analytics supports the Resource Description Framework (RDF), a W3C standard that allows structured and semi-structured data to be mixed and shared across applications and the web. Support is provided in the following ways: |
Include PEAR files in ICA Studio pipelines | An ICA Studio pipeline can include annotators that are packaged as UIMA PEAR files, including PEAR files that are created outside ICA Studio. The PEAR files are run at the appropriate point in the UIMA pipeline. |
Export ICA Studio pipeline to multiple collections | You can select multiple collections when you export an ICA Studio pipeline to Watson Content Analytics. The collections must be of the same type and on the same server. With this enhancement, the pipeline is downloaded and installed one time, and the same field and facet information is configured on all selected collections. |
Rule-based alerts | To better automate analysis and integrate it into your business processes, you can configure alerts that cause actions to be taken when specified conditions are met. For example, an alert might be triggered when the number of documents in the search results exceeds a threshold. As another example, an alert might be triggered when the trends index for a query exceeds, by a specified percentage, the trends index from the last time that the query was run. When an alert is triggered, you can choose to receive an email notification, save the results of the alert in an XML file, or implement a custom publishing policy. For example, your custom plug-in might cause a new case to be created in IBM Case Manager. |
Compound document support | If a document contains multiple parts, such as attachments or content elements, you can configure the following crawlers to search and return all parts of the document as a single document in the search results: |
Natural language processing and search enhancements | |
Overlay index for excluding text | To improve search quality, administrators and content analytics miner users can identify unimportant phrases and specify that the search processes ignore that text. For example, text that appears throughout a set of documents, such as "IBM Press Release", is meaningless as a search term because a query for it could return nearly every document. Excluded text is stored in an overlay index that an administrator applies to the main index. A query for an excluded term returns no documents and no facets. If another query returns a document that includes the excluded text, the excluded text is shown as light gray text in the document summaries. |
Search quality management | Enhanced natural language query processing enables Watson Content Analytics to extract concepts from content more effectively. By analyzing queries in context, the system can suggest more intelligent query modifications and return more relevant results. For example, documents in which a word is used as a noun might be ranked higher than documents in which the word is used as a verb. A new dashboard in the administration console lets you configure options for managing search quality. You can configure global settings for searching content and ranking results, and configure settings for specific queries and groups of queries. For example, you can associate custom dictionaries at the server level or with specific queries, refine results by applying pattern matching rules, and refine results by applying a system text analysis engine. |
Sentiment analysis | For English and Japanese, deep parsing processes more precisely identify sentiment expressions by analyzing the grammatical structure of entire sentences, including the ability to parse predicates and arguments in conversational context. Additional enhancements extend support for shallow sentiment analysis to the following languages: Chinese, Czech, Dutch, Hebrew, Russian, Spanish, and Turkish. |
Named entity recognition for Chinese | The Named Entity Recognition (NER) annotator includes enhancements for the Chinese language. Improvements include better performance, enhanced part-of-speech analysis, and the ability to add and block entities by configuring the annotator in the administration console. |
Additional language support | Content analytics collections and ICA Studio now support Korean and Turkish. Enterprise search collections now support dictionary-based segmentation for Thai and Turkish. |