Insights in visualizations for two continuous fields
Insights for two continuous fields are available when a visualization involves two continuous fields and an optional categorical group or points field.
Overview
Use a visualization such as a scatter plot for the two continuous fields, optionally sliced by the categories of the group field. The main goal is to detect any relationship between the two continuous fields, taking the categorical group field into account when one is present. The results contain the predictive strength of the discovered relationship, a description of the relationship in the form of fit lines, and any points that deviate substantially from the fit lines, which are reported as meaningful differences.
Algorithms
IBM® Cognos Analytics computes multiple regression models that use one of the continuous fields as the response and the other continuous field as an explanatory field. The optional categorical group field is used as a model factor. In addition to the additive model contribution of the explanatory field, Cognos Analytics considers the square of the explanatory field and any interaction terms that include the factor. From these candidate models, the regression model that provides the best fit for the data is selected. The corresponding fit line is derived from a linear or quadratic model. When the optional categorical group field is supplied, the model can produce a different line or quadratic curve for each category of the factor. To avoid overloading the visualization, only factors with up to three categories are currently considered.
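The candidate terms described above can be sketched as the entries of one design-matrix row. This is only an illustration: the function name `model_terms`, the use of treatment (dummy) coding, and the choice of the first category as the reference level are assumptions, because the text does not say how Cognos Analytics encodes the factor.

```python
def model_terms(x, category, categories):
    """Return the regression terms for one observation.

    x          -- value of the continuous explanatory field
    category   -- the observation's category in the group field
    categories -- ordered list of categories; the first is the reference level
    """
    # Treatment (dummy) coding: one 0/1 indicator per non-reference category.
    dummies = [1.0 if category == c else 0.0 for c in categories[1:]]
    linear = [x * d for d in dummies]          # factor-by-x interaction terms
    quadratic = [x * x * d for d in dummies]   # factor-by-x^2 interaction terms
    # Intercept, x, x^2, then the factor and its interactions.
    return [1.0, x, x * x] + dummies + linear + quadratic


row = model_terms(2.0, "b", ["a", "b", "c"])
print(row)
```

A linear model without the factor would use only the first two entries, and a quadratic model without the factor the first three. The richer candidates add subsets of the remaining entries.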
Each point in a visualization represents a number of rows in the data and is defined by the Points field. The corresponding row counts, which are based on the response field, define frequency weights that are used for building the regression models. Regression weights are applied independently of the frequency weights when Cognos Analytics computes the regression models.
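A minimal sketch of how frequency weights behave in a least-squares line follows. It assumes that each plotted point stands for several identical rows, and the helper `weighted_line` and the sample numbers are hypothetical. Under that assumption, fitting with the row counts as weights gives the same line as fitting the fully expanded rows.

```python
def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Three plotted points that stand for 1, 3, and 2 underlying rows (made-up data).
x, y, counts = [1.0, 2.0, 3.0], [2.0, 3.0, 7.0], [1, 3, 2]
weighted = weighted_line(x, y, counts)

# The same data with every point repeated once per row that it represents.
x_rows = [xi for xi, c in zip(x, counts) for _ in range(c)]
y_rows = [yi for yi, c in zip(y, counts) for _ in range(c)]
expanded = weighted_line(x_rows, y_rows, [1] * len(x_rows))

print(weighted, expanded)
```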
Details
- Two continuous fields
When Cognos Analytics applies multiple linear regression to two continuous fields, one is chosen as the response and the other as the explanatory field in the model. Cognos Analytics considers both linear and quadratic model terms. If the quadratic model is significant according to the F test and its relative improvement in predictive strength over the linear model is more than 10%, Cognos Analytics reports its predictive strength and displays the quadratic curve based on the computed model. This curve displays the predicted values of the response for the corresponding values of the explanatory field. Otherwise, the linear model is considered. If it is significant and its predictive strength is larger than 10%, Cognos Analytics reports its predictive strength and displays a line that represents the predicted values of the response field for the corresponding explanatory values. If the linear model does not qualify, the mean is reported as the fit line and no relationship is reported between the two continuous fields.
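The decision cascade in the preceding paragraph can be sketched as a small function. Several points are assumptions made for illustration: predictive strength is treated as an R-squared-like value between 0 and 1, the "relative improvement" is read as (quadratic - linear) / linear, and the F test results are passed in as booleans rather than computed. The name `choose_fit` is hypothetical.

```python
def choose_fit(r2_linear, r2_quadratic, linear_significant, quadratic_significant,
               threshold=0.10):
    """Return "quadratic", "linear", or "mean" following the cascade in the text."""
    if r2_linear > 0:
        relative_gain = (r2_quadratic - r2_linear) / r2_linear
    else:
        # No linear signal at all: any quadratic signal counts as an improvement.
        relative_gain = float("inf") if r2_quadratic > 0 else 0.0

    if quadratic_significant and relative_gain > threshold:
        return "quadratic"
    if linear_significant and r2_linear > threshold:
        return "linear"
    # Neither model qualifies, so the mean serves as the fit line.
    return "mean"


print(choose_fit(0.50, 0.70, True, True))
print(choose_fit(0.50, 0.52, True, True))
print(choose_fit(0.05, 0.052, True, True))
```

In the three calls, the first has a 40% relative gain and selects the quadratic curve, the second has only a 4% gain and falls back to the line, and the third falls through to the mean because the linear predictive strength is below 10%.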
When a linear or quadratic relationship is detected, Cognos Analytics also inspects the differences between the predicted and observed values of the response field. These differences are called residuals, and Cognos Analytics conducts a studentized residuals test to detect outliers. Points that depart substantially from the discovered relationship are displayed under meaningful differences in the corresponding chart.
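As a rough illustration of a studentized residuals check, the sketch below fits an unweighted line and scales each residual by its estimated standard deviation. The text does not say which variant or cutoff Cognos Analytics uses, so the internally studentized form, the cutoff of 2.0, and the names `studentized_residuals` and `flag_outliers` are all assumptions.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals of a least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage of each point in a simple linear regression.
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)]


def flag_outliers(x, y, cutoff=2.0):
    """Indices whose studentized residual exceeds the (hypothetical) cutoff."""
    return [i for i, r in enumerate(studentized_residuals(x, y)) if abs(r) > cutoff]


x = [float(v) for v in range(1, 11)]
y = [2.0 * v for v in x]
y[5] = 30.0  # the point at x = 6 sits far above the line y = 2x
print(flag_outliers(x, y))
```

Dividing by the leverage term matters because points at the edge of the x range pull the line toward themselves, so their raw residuals understate how unusual they are.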
- Restrictions
The following table describes the conditions that determine whether insights are suggested for this algorithm.
| Response | Explanatory | Group | Weight | Points | Insight |
| --- | --- | --- | --- | --- | --- |
| Exactly 1; summarization level = any; continuous | Exactly 1; continuous | N/A | Optional; continuous | Optional; any | Predictive strength; fit line; meaningful differences |
- Categorical group field
When a categorical group field is specified in addition to two continuous fields, it is used as a factor in the multiple linear regression, where one of the two continuous fields is chosen as the response field and the other as the explanatory field. Cognos Analytics considers linear and quadratic model terms for the continuous explanatory field, combined with contributions from the factor. If the quadratic or linear model that includes the factor is significant according to the F test and its relative improvement in predictive strength over the linear model with only the continuous explanatory field is more than 10%, Cognos Analytics generates four extra models. These models include all possible interactions of the continuous explanatory field and the factor. The significant model with the maximum adjusted R-squared is selected as the final model and is used to create a fit line for each category of the factor. Otherwise, the linear model with only the continuous explanatory field is tested for significance and reported if its predictive strength is greater than 10%. If the linear model does not qualify, no reliable relationship among the fields is established and the overall mean is reported as the fit line.
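The final selection step (the significant model with the maximum adjusted R-squared) can be sketched as follows. The adjusted R-squared formula is the standard one. The candidate names, R-squared values, term counts, and significance flags are made-up inputs for illustration, and `pick_final_model` is a hypothetical helper, not a Cognos Analytics function.

```python
def adjusted_r2(r2, n_rows, n_terms):
    """Adjusted R-squared for a model with n_terms predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_terms - 1)


def pick_final_model(candidates, n_rows):
    """candidates: list of (name, r2, n_terms, significant) tuples.

    Returns the name of the significant candidate with the highest adjusted
    R-squared, or None when no candidate is significant.
    """
    significant = [c for c in candidates if c[3]]
    if not significant:
        return None
    best = max(significant, key=lambda c: adjusted_r2(c[1], n_rows, c[2]))
    return best[0]


# Illustrative candidates for 30 rows and a three-category factor.
candidates = [
    ("x + group", 0.60, 3, True),
    ("x + group + x:group", 0.62, 5, True),
    ("x + x^2 + group + all interactions", 0.63, 8, False),
]
print(pick_final_model(candidates, n_rows=30))
```

Because adjusted R-squared penalizes extra terms, the simplest candidate is chosen in this example even though the richer models have a slightly higher raw R-squared.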
When a reliable relationship is detected, Cognos Analytics also checks the differences between the predicted and observed values of the response field. Cognos Analytics conducts a studentized residuals test to detect outliers and displays them under meaningful differences in the corresponding chart.
- Restrictions
The following table describes the conditions that determine whether insights are suggested for this algorithm.
| Response | Explanatory | Group | Weight | Points | Insight |
| --- | --- | --- | --- | --- | --- |
| Exactly 1; summarization level = any; continuous | Exactly 1; continuous | Exactly 1; categorical | Optional; any | Optional; any | Predictive strength; fit line; meaningful differences |
- Regression weights field
An optional continuous field can be used to specify regression weights for the model. For each available weight value, the regression weight corresponds to the influence of that observation on the computed model parameters.
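One way to read "influence" here is through the standard weighted least-squares criterion, shown below as a sketch. That Cognos Analytics minimizes exactly this objective is an assumption, because the text does not give a formula. The symbols are introduced for illustration only: $w_i$ is the regression weight of observation $i$, $y_i$ is its response value, and $\hat{y}_i(\beta)$ is the value predicted by the model with parameters $\beta$.

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \,\bigl(y_i - \hat{y}_i(\beta)\bigr)^2
```

Under this criterion, an observation with a larger $w_i$ contributes more to the sum, so the fit is pulled closer to it, and an observation with $w_i = 0$ has no effect on the estimated parameters.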