IBM Content Analytics with Enterprise Search, Version 3.0.0                  

Example: Exploring the intersection of two facets

In this scenario, you want to investigate the correlation between car models and incidents of burning.

The Facet Pairs view helps you to identify a high correlation of facet values from the selected facets. The content analytics miner requires two sets of search results to calculate a correlation. Accordingly, you select two facets that represent the two search result sets of the document set.
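As a rough illustration of what "two sets of search results" means here, the following Python sketch treats each facet value as the set of documents that carry it and counts the overlap for every pair. The document IDs, the sample facet contents, and the function name `pair_frequency` are hypothetical. This is not the product's API, only a sketch of the idea.

```python
# Hypothetical data: facet value -> set of IDs of the documents that carry it.
model_facet = {
    "Expedition": {1, 2, 3, 5},
    "Explorer": {4, 6, 7},
}
burn_facet = {
    "flame": {2, 3, 6, 8},
    "smoke": {1, 7},
}


def pair_frequency(rows, columns):
    """Count the documents shared by every (row value, column value) pair."""
    return {
        (row_value, col_value): len(row_docs & col_docs)
        for row_value, row_docs in rows.items()
        for col_value, col_docs in columns.items()
    }


frequencies = pair_frequency(model_facet, burn_facet)
print(frequencies[("Expedition", "flame")])  # documents 2 and 3 -> 2
```

Each entry in `frequencies` corresponds to one cell of a two-dimensional table, with one facet on the rows and the other on the columns.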

The following scenario is based on public data from the National Highway Traffic Safety Administration (NHTSA). An IBM® Content Analytics with Enterprise Search administrator created a crawler to add this data to a content analytics collection and defined facets to classify the data for quick retrieval.

To explore statistics for a pair of facets:

  1. Click the Facet Pairs tab to open the Facet Pairs view.
  2. In the Facet Navigation area, select Model as the facet to explore as rows in a two-dimensional view. To specify the second facet, select Burn as the facet to explore as columns.

    The Facet Pairs view shows how these two facets correlate. The degree of correlation is color-coded as shown in the Correlation Amount key. Yellow indicates the least amount of correlation. Orange indicates a moderate amount of correlation. Red indicates the highest amount of correlation. Focus on the intersections that have the highest correlation values.

  3. To expand the area that shows statistics, click the arrow on the border between the facet pairs table and the facet navigation tree. When you view the data for a facet pair, you have a choice of three grid- and table-based views.
  4. Click the Table view icon to view the data as a table. By default, the facets are sorted by frequency. Click the column headers to sort by correlation value or by facet values.
  5. To filter rows in the Table view, type the filter criteria in the Filter Rows field. For example, to see data for just Ford Explorers in the table, type expl in the Filter Rows field. The view is dynamically updated to show only the rows with facet values that contain expl.
  6. To further refine the filter, specify filter criteria for the columns in the Table view by typing characters in the Filter Columns field. For example, to explore the Explorer data that also contains the terms flame or fire, type f. The table is dynamically redrawn to show only the facet values that meet both the Filter Rows and Filter Columns criteria.
    Tip: To remove a filter, clear the field that corresponds to the filter that you want to remove.
  7. To quickly identify the highly correlated intersections among all of the data, click the Bird's eye view icon. Click and drag the blue square in the upper left corner of the view to highlight different data. For example, drag the blue square until you locate areas that contain orange and red cells, which indicate a moderate to high degree of correlation. Hover over a cell to see the frequency and correlation statistics for that intersection, such as statistics for the intersection of Expedition and flame.
  8. To view the area that you selected in greater detail, click the Grid icon. Only the data in the area that you highlighted in the Bird's eye view is displayed in the Grid view. By default, the Grid view shows only a 15 x 15 cell area of the possible 100 x 100 table at a time.

    You can see the comparison values in table form by row and column, one facet for each dimension. For example, you can see the intersection of Expedition and Burn, including all of the facet values that are associated with Burn, such as fire, smoke, flame, and fire hazard. The intersections of highly correlated facet values are highlighted in shades of orange and red.

    In the cell where Expedition and flame intersect, you can see two numbers. The top number is the frequency, which is the number of documents that contain both facet values (Expedition and flame). The second number is the correlation value. You can also see the frequency of each facet value (the number of documents that are returned by the facet value). For example, you might see the number 3318 displayed under the flame facet value and the number 4568 displayed under the Expedition facet value.
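To make the relationship between the two numbers in a cell more concrete, the sketch below computes one common measure of correlation: the ratio of the observed co-occurrence rate to the rate that would be expected if the two facet values were independent. This formula is an assumption for illustration. The product's exact calculation is not described here and may differ, for example by adjusting for small sample sizes. The per-value frequencies 3318 and 4568 come from the example above, whereas the pair frequency and the collection size are hypothetical.

```python
def correlation_ratio(pair_freq, row_freq, col_freq, total_docs):
    """Observed co-occurrence rate divided by the rate expected under independence.

    A value near 1.0 suggests no association; larger values suggest that the
    two facet values occur together more often than chance would predict.
    """
    if min(row_freq, col_freq, total_docs) <= 0:
        raise ValueError("frequencies and total_docs must be positive")
    observed = pair_freq / total_docs
    expected = (row_freq / total_docs) * (col_freq / total_docs)
    return observed / expected


# 4568 (Expedition) and 3318 (flame) are the frequencies from the example.
# The pair frequency and the collection size below are hypothetical.
ratio = correlation_ratio(pair_freq=120, row_freq=4568, col_freq=3318,
                          total_docs=500_000)
print(round(ratio, 2))
```

Under this assumed measure, a pair with a small frequency can still score a high correlation if both facet values are rare in the collection, which is why the view reports the frequency and the correlation value side by side.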

After you discover highly correlated pairs of facet values, you can go to the Documents view and look at the textual data to glean additional insight.
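The row and column filters described in steps 5 and 6 behave like substring matches on facet values. The following is a minimal sketch of that behavior, assuming case-insensitive matching (the steps do not state whether matching is case sensitive). The table contents and the helper `filter_table` are hypothetical.

```python
def filter_table(table, row_filter="", col_filter=""):
    """Keep only the cells whose row and column labels contain the filter text.

    `table` maps (row value, column value) -> frequency. An empty filter
    matches everything, which mirrors clearing a filter field.
    """
    row_text = row_filter.lower()
    col_text = col_filter.lower()
    return {
        (row, col): freq
        for (row, col), freq in table.items()
        if row_text in row.lower() and col_text in col.lower()
    }


# Hypothetical frequencies for illustration only.
table = {
    ("Explorer", "fire"): 40,
    ("Explorer", "flame"): 25,
    ("Explorer", "smoke"): 12,
    ("Expedition", "flame"): 18,
}

print(filter_table(table, row_filter="expl", col_filter="f"))
```

With these sample values, only the Explorer rows that intersect the fire and flame columns remain, because both filters must match for a cell to be kept.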


Last updated: May 2012

© Copyright IBM Corporation 2004, 2012.