What's new in Version 3.5

IBM® Watson Content Analytics Version 3.5 introduces many new features for planners, administrators, users, and application developers.

Terminology

The product was renamed from IBM Content Analytics with Enterprise Search to IBM Watson Content Analytics.

IBM Content Analytics Studio was renamed to IBM Watson Content Analytics Studio. In the product documentation and user interfaces, this component is labeled ICA Studio.

Deprecated user applications

User applications that you created in Watson Content Analytics Version 3.0 are supported in Version 3.5, but they are not migrated to the new application framework. The Version 3.0 content analytics miner and enterprise search application are deprecated and will not be supported in future releases. Applications that are based on the Version 3.0 framework cannot use the new features that are available in Version 3.5, such as allowing users to customize the layout of the user interface, create rule-based alerts, or exclude unimportant text from the search results.

Deprecated crawlers

The following crawlers are deprecated. You cannot create these types of crawlers in Watson Content Analytics Version 3.5:
  • Domino Document Manager
  • Exchange Server 2000 and 2003
  • Web Content Management
  • WebSphere® Portal
To collect content from these types of sources, use the following crawlers:
  • Use the Exchange Server crawler to collect content from Microsoft Exchange Server sources. See the system requirements for supported Exchange Server versions.
  • Use the Seed list crawler to collect content from IBM Connections, IBM Quickr® for WebSphere Portal, IBM Web Content Manager, and IBM WebSphere Portal sources.

Summary of enhancements

Many new functions are available for planning, administering, and using the Watson Content Analytics Version 3.5 system.
Watson Content Analytics Version 3.5 Enhancements
Architectural enhancements
64-bit crawler processes
All crawlers now run as 64-bit processes on all supported operating systems. For some crawlers, restrictions apply:
  • For IBM Content Manager, the Content Manager crawler supports only Type 4 JDBC drivers. Type 2 JDBC drivers are no longer supported.
  • For IBM DB2®, you must install the 64-bit DB2 Client module on the crawler server.
  • For Lotus Notes®, if a server that you plan to crawl uses the Notes® remote procedure call (NRPC) protocol, you must install a 64-bit Domino® server on the crawler server.
Scalability and multiple-node indexing
For high scalability and failover support, your Watson Content Analytics system can now include multiple index servers and search servers. Collections can be shared across multiple partitions. A collection that includes multiple partitions is indexed concurrently across multiple servers. Distributed search servers process search requests over multiple servers, and then federate the search results from the other partitions.
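
As an illustration of how search results from several partitions might be federated, the following Python sketch merges per-partition result lists that are already sorted by descending score. The `Hit` type and the `federate` function are hypothetical names chosen for this example. They are not part of the Watson Content Analytics API, and the product's actual ranking and merge logic might differ.

```python
import heapq
import itertools
from typing import NamedTuple


class Hit(NamedTuple):
    """One search result (hypothetical structure for this sketch)."""
    score: float
    doc_id: str


def federate(partition_results, top_n):
    """Merge result lists from several partitions into one ranked list.

    Each input list must already be sorted by descending score.
    """
    # heapq.merge expects ascending order, so merge on the negated score.
    merged = heapq.merge(*partition_results, key=lambda hit: -hit.score)
    return list(itertools.islice(merged, top_n))


partition_a = [Hit(0.9, "a"), Hit(0.4, "b")]
partition_b = [Hit(0.7, "c"), Hit(0.1, "d")]
top_hits = federate([partition_a, partition_b], top_n=3)
```

Because each partition returns its own results in ranked order, a lazy merge avoids re-sorting the combined list.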

To provide greater flexibility when you add servers to your system topology, you can select a role that combines functions. You can configure servers to support both document processing and search, and configure servers to support both indexing and search.

IBM BigInsights™
Enhancements extend indexing support to the Hadoop Distributed File System (HDFS). HDFS folders can be scanned, and references to files in the folders can be imported to the index. Unlike a crawler, this scan does not detect file modification dates. All files are renewed in the index with each scan.
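
The difference between a crawler that checks modification dates and the HDFS scan can be sketched in Python as follows. The function names and the in-memory dictionaries are assumptions made for this example; the product's crawlers and the HDFS scan are configured in the administration console, not called this way.

```python
def crawler_pass(files, indexed_mtimes):
    """Return only the paths that are new or modified since they were indexed."""
    return sorted(
        path
        for path, mtime in files.items()
        if path not in indexed_mtimes or mtime > indexed_mtimes[path]
    )


def hdfs_scan_pass(files):
    """Return every path: modification dates are ignored, so all files are renewed."""
    return sorted(files)


# Hypothetical file listing: path -> last modification time.
files = {"/data/a.txt": 100, "/data/b.txt": 250}
# Modification times recorded when the files were last indexed.
indexed_mtimes = {"/data/a.txt": 100, "/data/b.txt": 200}

changed_only = crawler_pass(files, indexed_mtimes)
everything = hdfs_scan_pass(files)
```

In this sketch, the crawler-style pass selects only `/data/b.txt`, whereas the scan-style pass selects both files on every run.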

This function provides a plug-in interface that runs in MapReduce processes on Hadoop. The ability to store data in HDFS or HBase is not provided, but you can write a plug-in to implement this capability.

Social analytics and social search
New social media crawler
If you have a BoardReader license, you can configure a BoardReader crawler to collect content from blogs, message board sites and forums, news sites, reviews, and videos. Because all information on social media sites is presumed to be public, secure search is not supported.
Enhanced social media crawlers
The FileNet P8 crawler, SharePoint crawler, and Seed list crawler for IBM Connections can collect social data. The enhancements allow users to explore relationships between people, recommendations, and tags that are associated with documents added to a collection by these crawlers.
Social search
If social search is enabled for a collection, the system can aggregate information from various social networking sources and extract relationships between documents, people, and tags. For example, users can discover people who are relevant to a document; see recommendations for other documents and people that might be of interest; and drill down through a tag cloud to explore related information.
User experience
Redesigned user interfaces
The content analytics miner and enterprise search applications were redesigned to provide a common user experience that matches other IBM products. The redesign includes performance enhancements, functional enhancements, and usability enhancements.
Layout customization
You can easily change the appearance and widgets in the application interfaces by selecting a predefined layout option. For example, in the enterprise search application you can select a faceted search layout that includes time series analysis and correlation analysis. You can also create your own layouts based on templates such as a three-column page design or a two-row page design.

You can also customize the layout by specifying the location and size of each widget pane that you want to view in the interface. For example, you can show a facet tree in the upper right pane and a time series chart in the lower left pane. After you add a widget, you can drag it to another location as you refine your design. You can also configure the default settings for each widget, such as the maximum number of facets to show in the facet tree. You can easily share customized layouts by exporting and importing the customization layout file.

Mobile devices
You can open the enterprise search application and the Dashboard view in the content analytics miner on an Apple iPad device (iOS 7 is required). Navigation features include touch scrolling, large tappable targets, and orientation awareness.
Collection modeling
Solution templates and packages
To help you get started with searching and analyzing content, you can create collections that are based on predefined configuration settings and resource definitions. For example, a solution template might include settings for extracting information from text (such as dictionaries) and settings for organizing the data for retrieval (such as field mappings and categories).

A solution package can include one or more solution templates plus other configuration settings, such as whether security is enabled. A package can also specify a layout definition to control how the information is to be displayed when users query the collection. When you create a collection, you select the solution package that you want to use. If the package includes sample data, parsing and indexing begin as soon as the collection is created.

You can also create and distribute solution packages that are based on existing collections. For example, you can convert a collection to a solution template so that you can reuse resource definitions in other collections. You can also export solution packages and import them on other Watson Content Analytics servers.

Analytics and search
RDF Triplestore
Watson Content Analytics supports the Resource Description Framework (RDF), which is an Internet standard that allows structured and semi-structured data to be mixed and shared across applications and the web. Support is provided in the following ways:
  • You can build dictionaries and category trees from RDF files.
  • You can extract triples from document text and store them in an embedded triplestore database or an IBM DB2 database that you configure to support triples.
  • You can query the triplestore data by using the SPARQL query language, and analyze the statistical weight of the results in an RDF graph.
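
To make the idea of triples and pattern-based queries concrete, the following Python sketch stores a few subject-predicate-object triples in a list and matches them against a pattern in which `None` acts as a wildcard, much like a variable in a SPARQL query. The `match` function, the `ex:` and `doc:` prefixes, and the sample data are all hypothetical; the sketch does not use the product's triplestore or a real SPARQL engine.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None matches any value."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Hypothetical triples extracted from document text.
triples = [
    ("doc:42", "ex:mentions", "ex:IBM"),
    ("doc:42", "ex:author", "ex:alice"),
    ("doc:77", "ex:mentions", "ex:IBM"),
]

# Roughly corresponds to the SPARQL pattern:
#   SELECT ?doc WHERE { ?doc ex:mentions ex:IBM }
mentions_ibm = match(triples, predicate="ex:mentions", obj="ex:IBM")
documents = [subject for subject, _, _ in mentions_ibm]
```
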
Include PEAR files in ICA Studio pipelines
An ICA Studio pipeline can include annotators that are packaged as UIMA PEAR files, including PEAR files that are created outside ICA Studio. The PEAR files are run at the appropriate point in the UIMA pipeline.
Export ICA Studio pipeline to multiple collections
You can select multiple collections when you export an ICA Studio pipeline to Watson Content Analytics. The collections must be of the same type and on the same server. With this enhancement, the pipeline is downloaded and installed one time, and the same field and facet information is configured on all selected collections.
Rule-based alerts
To better automate analysis and integrate it into your business processes, you can configure alerts that cause actions to be taken when the specified conditions are met. For example, an alert might be triggered when the number of documents in the search results exceeds a threshold. As another example, an alert might be triggered when the trends index for a query exceeds, by a specified percentage, the trends index from the last time the query was run.

You can choose to receive an email notification when the alert is triggered, save the results of the alert in an XML file, or implement a custom publishing policy. For example, your custom plug-in might cause a new case to be created in IBM Case Manager.
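
The two example alert conditions described above can be sketched in Python as simple predicates. The function names and parameters are assumptions made for illustration; in the product, alerts are configured rather than coded, and the exact comparison rules might differ.

```python
def count_alert(result_count, threshold):
    """Trigger when the number of matching documents exceeds the threshold."""
    return result_count > threshold


def trend_alert(current_index, previous_index, percent):
    """Trigger when the trends index rose by more than `percent` percent
    compared with the value from the previous run of the same query."""
    if previous_index <= 0:
        # No meaningful baseline to compare against (assumption of this sketch).
        return False
    increase = (current_index - previous_index) / previous_index * 100
    return increase > percent
```

For example, `count_alert(120, 100)` evaluates to `True`, and `trend_alert(1.5, 1.0, 25)` evaluates to `True` because the index rose by 50 percent.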

Compound document support
If a document contains multiple parts, such as attachments or content elements, you can configure the following crawlers so that all parts of the document are searched and returned as a single document in the search results:
  • Exchange Server
  • FileNet P8
  • Notes
  • SharePoint
If support for compound documents is not enabled in the crawler configuration, the parent and child documents are searched separately and returned as separate documents in the search results.
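
The effect of compound document support on search results can be pictured with the following Python sketch, which collapses matching parts under their parent document. The `collapse_parts` function and the identifier format are hypothetical and only illustrate the grouping; they do not reflect how the crawlers or the index store compound documents.

```python
def collapse_parts(hits):
    """Group matching parts under their parent so each compound document
    appears once. `hits` is a list of (doc_id, parent_id) pairs, where
    parent_id is None for a top-level document."""
    grouped = {}
    for doc_id, parent_id in hits:
        root = parent_id if parent_id is not None else doc_id
        grouped.setdefault(root, []).append(doc_id)
    return grouped


# Hypothetical hits: an email with two attachments, and a second email.
hits = [
    ("mail-1", None),
    ("mail-1/att-1", "mail-1"),
    ("mail-2", None),
    ("mail-1/att-2", "mail-1"),
]
collapsed = collapse_parts(hits)
```

Without the grouping step, the four hits would be returned as four separate documents.
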
Natural language processing and search enhancements
Overlay index for excluding text
To improve search quality, administrators and content analytics miner users can identify unimportant phrases and specify that the text is to be ignored by the search processes. For example, a phrase that appears throughout a set of documents, such as "IBM Press Release", is not a useful search term because a query for it can potentially return every document.

Excluded text is stored in an overlay index that an administrator applies to the main index. A query for an excluded term returns no documents and no facets. If another query returns a document that includes the excluded text, the excluded text is shown as light gray text in the document summaries.
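
The query behavior described above can be sketched in Python with a toy inverted index and a set of excluded phrases that plays the role of the overlay. The `search` function and both data structures are assumptions for this example; facets and the gray highlighting in document summaries are not modeled.

```python
def search(main_index, overlay_terms, term):
    """Look up a term, returning no documents when the overlay excludes it."""
    key = term.lower()
    if key in overlay_terms:
        # Excluded text is ignored by search, so nothing is returned.
        return set()
    return main_index.get(key, set())


# Hypothetical main index: phrase -> IDs of documents that contain it.
main_index = {
    "ibm press release": {"doc1", "doc2", "doc3"},
    "analytics": {"doc2"},
}
# Phrases that an administrator applied as an overlay.
overlay_terms = {"ibm press release"}

excluded_result = search(main_index, overlay_terms, "IBM Press Release")
normal_result = search(main_index, overlay_terms, "analytics")
```

Here `excluded_result` is empty even though three documents contain the phrase, while `normal_result` still contains `doc2`.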

Search quality management
Through the use of enhanced natural language query processing, Watson Content Analytics more effectively extracts concepts from content. By analyzing queries in context, more intelligent query modifications can be suggested and more relevant results can be obtained. For example, documents in which a word is used as a noun might be ranked higher than documents in which the word is used as a verb.

A new dashboard in the administration console lets you configure options for managing search quality. You can configure global settings for searching content and ranking results, and configure settings for specific queries and groups of queries. For example, you can associate custom dictionaries at the server level or with specific queries, refine results by applying pattern matching rules, and refine results by applying a system text analysis engine.

Sentiment analysis
For English and Japanese, deep parsing processes more precisely identify sentiment expressions by analyzing the grammatical structure of entire sentences, including the ability to parse predicates and arguments in conversational context. Additional enhancements extend support for shallow sentiment analysis to the following languages: Chinese, Czech, Dutch, Hebrew, Russian, Spanish, and Turkish.
Named entity recognition for Chinese
The Named Entity Recognition (NER) annotator includes enhancements for the Chinese language. Improvements include better performance, enhanced part-of-speech analysis, and the ability to add and block entities by configuring the annotator in the administration console.
Additional language support
Enhancements were made to support Korean and Turkish in content analytics collections. ICA Studio was also enhanced to support these languages.

In enterprise search collections, enhancements were made to support Thai and Turkish dictionary-based segmentation.