Relationships

When you specify data for an exploration, relationship visualizations are displayed first.

Overview

IBM® Cognos Analytics provides a quick overview of the relationships among pairs of fields, focused on a single field of interest. The visualization comprises multiple tabs, one for each field of interest. This overview orients you to the many relevant relationships in the data, which you can explore further as needed.

Algorithms

Although the initial field of interest is determined by semantic data analysis, you can specify a different field of interest. Each tab provides a network graph in which fields are nodes and the links between pairs of nodes represent the relative strength of the relationship between them. Links from the field of interest dominate the graph, but other pairs of fields with strong relationships are displayed as well. You can adjust a slider to view a larger or smaller number of nodes in the network.

Details

Data for analysis

Relationships use non-summarized data to compute the strength of the relationship between all pairs of fields considered. To standardize the measure of relationship strength and make it comparable across all pairs of fields, all numeric fields are binned as the first step, so that all fields in the data can be treated as categorical. The binning applied is equal frequency binning, which generates five bins. More details are available in the section on data preparation for numeric fields.
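The binning step can be illustrated with a minimal Python sketch. It is not the Cognos Analytics implementation: the function name equal_frequency_bins and the rule for placing cut points are assumptions made for illustration. The sketch takes cut points from the sorted values so that each of the five bins receives roughly the same number of rows, and equal values always fall into the same bin.

```python
from bisect import bisect_right

def equal_frequency_bins(values, n_bins=5):
    """Return a bin index (0 .. n_bins - 1) for each numeric value.

    Hypothetical helper: cut points are read from the sorted data so that
    the bins hold roughly equal numbers of rows.
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    # Cut points sit at the 1/n_bins, 2/n_bins, ... positions of the sorted data.
    cuts = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    # A value's bin is the number of cut points less than or equal to it.
    return [bisect_right(cuts, v) for v in values]

bins = equal_frequency_bins([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
```

With many tied values the bins can end up uneven, because ties are never split across bins in this sketch.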

Relationship strength

Data for each pair of categorical fields is first tabulated for all combinations of field categories that are found in the data. Based on the tabulated data, IBM Cognos Analytics applies the chi-square test of independence to assess whether the fields are independent. If the departure from independence is statistically significant, Cognos Analytics computes the effect size based on the chi-square statistic. This effect size is Cramér’s V, which is widely used as a measure of association between two categorical fields. The values of this measure are in the range 0 to 1, and Cognos Analytics reports the relationship strength as a percentage. Relationships with a strength of less than 10% are not reported because they are considered too weak to be of practical value.
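The computation described above can be sketched in plain Python. This is an illustrative sketch rather than the product's code: the function names cramers_v and reported_strength are hypothetical, the standard formula V = sqrt(chi-square / (n * (min(rows, columns) - 1))) is assumed, and the significance test is omitted because the Python standard library has no chi-square distribution (a statistics package would be needed for the p-value).

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of category labels."""
    n = len(xs)
    x_counts = Counter(xs)            # row totals
    y_counts = Counter(ys)            # column totals
    pair_counts = Counter(zip(xs, ys))  # observed cell counts
    k = min(len(x_counts), len(y_counts)) - 1
    if n == 0 or k <= 0:
        return 0.0  # a field with a single category carries no association
    chi2 = 0.0
    for x, x_total in x_counts.items():
        for y, y_total in y_counts.items():
            expected = x_total * y_total / n  # count expected under independence
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))

def reported_strength(v, threshold_pct=10.0):
    """Return the strength as a percentage, or None if it is below the threshold."""
    pct = 100.0 * v
    return pct if pct >= threshold_pct else None
```

For example, two fields whose categories always occur together give a value of 1 (reported as 100%), while two fields whose categories are evenly crossed give 0 and are not reported.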

Performance limitations

Computing relationship strength between all pairs of fields is prohibitively expensive for large data sets. To provide a quick answer, Cognos Analytics limits the number of processed fields to 100. However, these fields are selected by a separate process, and the possible loss of relevant relationships is minimized. If the data contains more than 10,000 rows, Cognos Analytics takes a random sample of 10,000 rows for performance reasons. This sample size ensures minimal loss in accuracy of the relationship strength estimate.
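The row limit can be pictured with a short Python sketch. The function name sample_rows and the use of simple random sampling without replacement are assumptions for illustration; the sampling method that Cognos Analytics uses is not described here.

```python
import random

def sample_rows(rows, max_rows=10_000, seed=None):
    """Return all rows if there are at most max_rows, else a random sample.

    Hypothetical helper: draws max_rows rows without replacement.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # pass a seed for a repeatable sample
    return rng.sample(rows, max_rows)

subset = sample_rows(list(range(25_000)), seed=1)
```

Data with 10,000 rows or fewer passes through unchanged, so the relationship strength is then computed on the full data.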