Relationships
Relationships visualizations are the first thing displayed in an exploration after you specify the data to explore.
Overview
IBM® Cognos Analytics provides a quick overview of the relationships among pairs of fields, focusing on a single field of interest. The visualization comprises multiple tabs, one for each field of interest. This overview orients you to the many relevant relationships in the data, which you can then explore further as needed.
Algorithms
Although the initial field of interest is determined by semantic analysis of the data, you can specify a different field of interest. Each tab provides a network graph in which fields are nodes and each link between a pair of nodes represents the relative strength of the relationship between them. Links from the field of interest dominate the graph, but other pairs of fields with strong relationships are displayed as well. You can adjust a slider to view a larger or smaller number of nodes in the network.
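The following Python sketch illustrates one way such a graph could be assembled from pairwise strengths. It is not the product's implementation: the function name `select_links`, the `max_nodes` parameter standing in for the slider, and the rule of keeping the fields most strongly related to the field of interest are all assumptions made for illustration.

```python
def select_links(strengths, focus, max_nodes):
    """Pick the links to draw for one tab (hypothetical selection rule).

    strengths: dict mapping a (field_a, field_b) tuple to a strength in [0, 1].
    focus:     the field of interest for this tab.
    max_nodes: stand-in for the slider; total number of nodes to show.
    """
    # Strength of each other field's relationship with the field of interest.
    to_focus = {}
    for (a, b), s in strengths.items():
        if focus in (a, b):
            other = b if a == focus else a
            to_focus[other] = s

    # Keep the field of interest plus the fields most strongly related to it.
    ranked = sorted(to_focus, key=to_focus.get, reverse=True)
    kept = {focus} | set(ranked[: max_nodes - 1])

    # Draw every link whose two endpoints were both kept, strongest first.
    # This also includes links between pairs that do not involve the focus.
    links = [(a, b, s) for (a, b), s in strengths.items() if a in kept and b in kept]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Made-up strengths for four fields.
example = {
    ("Revenue", "Region"): 0.6,
    ("Revenue", "Product"): 0.4,
    ("Region", "Product"): 0.5,
    ("Revenue", "Channel"): 0.2,
}
links = select_links(example, focus="Revenue", max_nodes=3)
```

Raising `max_nodes` in this sketch brings weaker fields such as `Channel` into the graph, which mirrors the effect of the slider described above.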
Details
- Data for analysis
Relationships use non-summarized data to compute the strength of the relationship between all pairs of fields considered. To standardize the measure of relationship strength and make it comparable across all pairs of fields, all numeric fields are binned as the first step, so that every field in the data is treated as categorical. The binning applied is equal-frequency binning into five bins. More details are available in the section on data preparation for numeric fields.
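As a rough illustration of equal-frequency binning into five bins, the Python sketch below derives four cut points from the sorted values and assigns each value a bin index. The function names `quantile_edges` and `assign_bins`, the cut-point rule, and the handling of ties are assumptions; the product's own data preparation may differ in detail.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins=5):
    """Return n_bins - 1 cut points that split the sorted values into equal-count groups."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut point is the first value belonging to bin k.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_bins(values, edges):
    """Map each value to a bin index from 0 to len(edges); equal values share a bin."""
    return [bisect_right(edges, v) for v in values]


# Made-up numeric field with ten rows: each of the five bins receives two rows.
ages = [23, 35, 41, 29, 52, 47, 31, 60, 38, 26]
edges = quantile_edges(ages)
age_bins = assign_bins(ages, edges)
```

Once binned this way, a numeric field can be tabulated against any other field exactly like a categorical one. With heavily tied data the bins cannot be exactly equal in size, because equal values always land in the same bin.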
- Relationship strength
Data for each pair of categorical fields is first tabulated for all combinations of field categories that are found in the data. Based on the tabulated data, IBM Cognos Analytics applies the chi-square test of independence to assess whether the fields are independent. If the departure from independence is statistically significant, Cognos Analytics computes the effect size from the chi-square statistic. This effect size is Cramér’s V, which is widely used as a measure of association between two categorical fields. Its values range from 0 to 1, and Cognos Analytics reports the relationship strength as a percentage. Relationships with a strength of less than 10% are not reported because they are considered too weak to be of practical value.
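The computation can be sketched in Python as follows, using the standard definition V = sqrt(chi-square / (n * min(r - 1, c - 1))), where n is the number of rows and r and c are the numbers of categories in the two fields. The function names `cramers_v` and `strength_percent` are hypothetical, and this sketch deliberately omits the significance test: computing a p-value needs the chi-square distribution, which a statistics library such as SciPy (`scipy.stats.chi2_contingency`) can supply.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Return (chi_square, cramers_v) for two equal-length categorical sequences."""
    n = len(xs)
    observed = Counter(zip(xs, ys))  # counts for category combinations found in the data
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return chi2, 0.0  # a field with a single category carries no association
    return chi2, sqrt(chi2 / (n * k))


def strength_percent(xs, ys, threshold=10):
    """Return the strength as a whole percentage, or None when below the threshold."""
    _, v = cramers_v(xs, ys)
    pct = round(100 * v)
    return pct if pct >= threshold else None


# Made-up data: region fully determines channel, so the association is perfect.
region = ["north", "north", "south", "south"]
channel = ["web", "web", "store", "store"]
chi2, v = cramers_v(region, channel)
```

Because V is normalized by the sample size and the table dimensions, values computed for different pairs of fields are on the same 0 to 1 scale, which is what makes the reported percentages comparable.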
- Performance limitations
Computing the relationship strength between all pairs of fields in a data set is prohibitively expensive for large data. To provide a quick answer, Cognos Analytics limits the number of processed fields to 100. These fields are selected by a separate process, which minimizes the possible loss of relevant relationships. If the data contains more than 10,000 rows, Cognos Analytics draws a random sample of 10,000 rows for performance reasons. A sample of this size ensures minimal loss of accuracy in the relationship strength estimate.
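To give a sense of these limits, the Python sketch below counts the field pairs that must be evaluated and draws a row sample when the data exceeds the stated size. The names `pair_count` and `sample_rows` are hypothetical, simple random sampling without replacement is an assumption, and the process that chooses which 100 fields to keep is not described in this topic, so it is not modeled here.

```python
import random

MAX_FIELDS = 100   # field limit stated in the text
MAX_ROWS = 10_000  # sample size stated in the text


def pair_count(n_fields):
    """Number of distinct unordered field pairs among n_fields fields."""
    return n_fields * (n_fields - 1) // 2


def sample_rows(rows, max_rows=MAX_ROWS, seed=None):
    """Return all rows if there are few enough, else a simple random sample."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(rows, max_rows)


# Made-up data set of 25,000 rows, reduced to 10,000 before any tabulation.
rows = list(range(25_000))
sampled = sample_rows(rows, seed=0)
pairs_at_limit = pair_count(MAX_FIELDS)
```

The pair count grows quadratically with the number of fields, so capping the fields bounds the number of tables to build, while capping the rows bounds the cost of building each one.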